Exploring Data with pandas
Exploratory Data Analysis (EDA), as you already know, is a critical step when beginning your analysis work. As with the EDA work we did in spreadsheets, we will use Python and pandas to accomplish the following:
- Form hypotheses about the underlying forces affecting your data.
- Challenge previous assumptions that may have been made when discussing the business issue.
- Guide you on what tools and techniques you should use when working with that dataset.
pandas and other libraries like NumPy are not part of the Python standard library. This means that we need to install any external package, and then import it into our workspace, before we can use its functionality.
Jupyter Notebooks
Before moving forward, take a look at this section on Jupyter Notebooks.
pandas
The pandas library is incredibly powerful and was built specifically for data analysis work. It comes with many useful tools and data structures that we will cover in more depth in the upcoming readings.
We will use pandas to create, manipulate, and view data structures based on certain conditions. We will also cover some of the most common pandas functions used during data exploration.
Remember to install pandas within your virtual environment! To activate the environment, run the following command from within your data-analysis-projects directory. If you would like to revisit the material, refer back to the section on virtual environments: Virtual Environments
source venv/bin/activate
To install pandas, you will need to run the following command within your virtual environment:
pip install pandas
When you install pandas, pip also installs NumPy as a dependency. You can check your pandas and NumPy versions with the following commands:
pip show pandas
pip show numpy
If the above commands do not work, you may need to use pip3 instead of pip in the command.
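The same version check can also be done from inside Python, which is handy in a notebook. The following is a minimal sketch that assumes Python 3.8 or later (for importlib.metadata). The helper name installed_versions is our own and is not part of pandas or NumPy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


# Report on the two packages used in this chapter.
for name, ver in installed_versions(["pandas", "numpy"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

If either package is reported as not installed, double-check that your virtual environment is activated before running pip install again.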
Once pandas is installed, it can be imported into your workspace in the following way:
import pandas as pd
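To give a feel for what the pd alias is used for, here is a small sketch that creates a data structure, views it, and selects rows based on a condition. The sales table and its column names are hypothetical sample data made up for illustration. The upcoming readings cover these tools in more depth.

```python
import pandas as pd

# Hypothetical sample data, made up purely for illustration.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "East", "West"],
        "units": [120, 85, 150, 60],
    }
)

# View the first rows and some basic summary statistics.
print(sales.head())
print(sales.describe())

# Keep only the rows that meet a condition (boolean indexing).
high_volume = sales[sales["units"] > 100]
print(high_volume)
```

Here high_volume contains only the North and East rows, since those are the only regions with more than 100 units.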
NumPy
The NumPy library will be used in conjunction with pandas so that we can perform mathematical operations on some of our datasets. As we explore our data and, in later chapters, begin cleaning and manipulating it, we will use the tools NumPy provides to make our lives easier.
Once NumPy is installed, it can be imported into your workspace in the following way:
import numpy as np
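As a quick illustration of the two libraries working together, the sketch below applies NumPy functions to a pandas Series. The temperature readings are hypothetical values made up for this example. It is one simple way to combine the two, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical temperature readings in Celsius, made up purely for illustration.
temps_c = pd.Series([18.5, 21.0, 19.5, 25.0], name="temp_c")

# NumPy functions accept a pandas Series directly.
mean_temp = np.mean(temps_c)

# np.where picks a label for each element based on a condition.
labels = np.where(temps_c > 20, "warm", "cool")

print(mean_temp)
print(labels)
```

In this sketch, np.where returns a NumPy array with one label per reading, which could later be assigned to a new column of a DataFrame.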