Vega-Lite + Altair

Visualizing Bigfoot Sightings

2023.04.10


This week, I’ll be making some modifications to the visualizations I made as part of the last homework assignment. Check out the original visualizations here!

Data Cleaning

Before getting started, the first step is to clean the data:

# Import pandas, load the dataset, keep only the columns we need, and drop any NaNs
import pandas as pd

URL = "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_bcubcg_fall2022/main/data/bfro_reports_fall2022.csv"
df = pd.read_csv(URL)

df = df[['state', 'latitude', 'longitude', 'classification', 'season']].dropna()

# Drop rows where latitude or longitude is 0 (0 stands in for a missing value here)
df = df[(df['latitude'] != 0) & (df['longitude'] != 0)]

df.head()

Here, I kept only the columns I needed for the visualizations and dropped rows with NA values. I also dropped rows with a 0 in latitude or longitude, the only quantitative columns I used, since this dataset uses 0 in place of a missing value. The data is also linked at the bottom of this page.

This step is exactly the same as it was in Homework 9 since I am plotting the same variables in a slightly different way.

Plot 1

This plot shows the location of each reported sighting as well as its classification.

Plot 2

This second plot shows the total number of sightings in each state, broken down by the recorded season of each report. You can also hover over each season segment to see the total count for that segment.

Linking the Plots

To put the plots together (and make the visualization a little more interesting), I added an altair.selection_interval brush, which lets you drag out a rectangular region on the first plot. The second plot then updates to show the sighting counts and seasons for just that region.

Brushing over different regions of this dashboard surfaces some oddities in the data. Some sightings appear to have been logged with incorrect coordinates: for example, there is a report attributed to Washington that sits in the middle of the Pacific Ocean.

In Conclusion…

Sources

Explore the data on your own here!