Chapter 6: Exploring Your Data

Now that we have our newly minted dataset, we should explore it. One way to do so is simply to open the raw data in a spreadsheet program such as Excel, but this is rarely practical: scanning rows by eye does not scale, and it makes patterns hard to spot. For large datasets, we will instead explore the data using both aggregations and data visualizations. In this chapter, we will do both.

For both methods of data exploration, we will use Jupyter notebooks. Jupyter notebooks let us write and run Python code interactively, which is useful for both data exploration and data visualization. Jupyter is easy to install and easy to use, and the next two sections cover both steps. In addition, we will use the Python library Pandas to manipulate data and Matplotlib to visualize it. Helpful cheatsheets for both Pandas and Matplotlib can be found here and here.
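To make "aggregation" concrete, here is a minimal sketch of the kind of summary Pandas can produce. The data, the column names (`category`, `value`), and the variable names are made up for illustration; they are not the dataset built in earlier chapters.

```python
import pandas as pd

# Hypothetical stand-in data; the column names are invented for illustration.
df = pd.DataFrame(
    {
        "category": ["a", "b", "a", "b", "a"],
        "value": [10, 20, 30, 40, 50],
    }
)

# Aggregation: collapse many rows into one summary row per category.
summary = df.groupby("category")["value"].agg(["count", "mean", "max"])
print(summary)
```

Each row of `summary` describes one category, which is usually far easier to read than the raw rows themselves.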

Installing Jupyter

On the command line, install Jupyter using pip.

pip install jupyter

Easy as that!
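If you want to confirm that everything this chapter relies on is available, a quick check along the following lines may help. It is only a sketch: the helper name `is_installed` is our own, and the list of modules is an assumption about what you will need. Note that Pandas and Matplotlib are separate packages from Jupyter, so if they show up as missing they can be installed with pip in the same way.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Modules this chapter is assumed to rely on.
for name in ("notebook", "pandas", "matplotlib"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```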

Using Jupyter notebooks

Jupyter notebooks can be created, viewed, and edited in a couple of different ways. As with normal Python files, you could use an editor such as VS Code to run and edit notebooks, though this is usually not the simplest way to get started.

The better way is to run the following on the command line:

jupyter notebook

This will open a Jupyter environment in a web browser. The file browser only shows the directory you launched Jupyter from and its subdirectories, so start it from a directory high enough in your file tree to contain all the files you need. Usually the browser window will open automatically, but if it does not, simply copy and paste the URL from your terminal. It should look something like the following:

http://localhost:8888/?token=###
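The parts of that URL each mean something: `localhost` says the server is running on your own machine, `8888` is the default port Jupyter listens on, and the token is what authorizes your browser to connect. As a small illustration, Python's standard library can pull those parts out. The token value below is a placeholder, not a real one.

```python
from urllib.parse import parse_qs, urlparse

# Placeholder token; a real one is a long random string printed by Jupyter.
url = "http://localhost:8888/?token=abc123"

parts = urlparse(url)
query = parse_qs(parts.query)

print(parts.hostname)     # machine the server runs on
print(parts.port)         # port the server listens on
print(query["token"][0])  # token used to authorize the browser session
```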

Once you are in the Jupyter environment, you should see what looks like a file manager. You can create a new notebook by clicking “New” in the top right corner and selecting “Python 3.” This will create a brand new notebook you can use to explore data and create data visualizations.
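To check that the new notebook works, you could paste something like the following into its first cell and run it with Shift+Enter. This is only a sketch with made-up numbers; the labels, values, and chart title are invented for illustration.

```python
import matplotlib.pyplot as plt

# Made-up numbers purely for illustration.
labels = ["a", "b", "c"]
values = [3, 7, 5]

# Create a figure with a single set of axes and draw a bar chart on it.
fig, ax = plt.subplots()
bars = ax.bar(labels, values)
ax.set_xlabel("category")
ax.set_ylabel("value")
ax.set_title("Example bar chart")
fig.tight_layout()
```

In a notebook, the chart should appear directly beneath the cell once it finishes running. Outside a notebook, you would typically add a call to `plt.show()` at the end.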