14.1. Exploratory Data Analysis

Read the following articles and follow along where instructed.

Note

You do not need to install pandas; it comes with Anaconda.

Tip

For Medium articles: if you run out of free articles, open the page in an incognito window.

14.1.1. Exploring Data with Python

Code along with this article.

  1. Exploratory data analysis in Python.

  2. Stop at Step #8 “Detecting Outliers”.

14.1.2. Get to Know Your Data

  1. Getting to know your data.

  2. Data Types in Statistics.

14.1.3. Python pandas

Code along with this article.

  1. Python Pandas Tutorial: A Complete Introduction for Beginners

  2. Stop at “Handling Duplicates”.

14.1.4. Statistics in pandas

  1. Basic statistics in pandas DataFrame.

14.1.5. What is a DataFrame?

A pandas DataFrame is similar to a Python dictionary: the column names act like keys, and the value for each key is the data in that column.

Diagram of a pandas DataFrame.

For more information about pandas DataFrames and the diagram above, visit w3resource.
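The dictionary comparison can be sketched in a few lines. This is a minimal illustration rather than an excerpt from the articles above; the variable names and the fruit counts are hypothetical example data.

```python
import pandas as pd

# Hypothetical example data: each dictionary key becomes a column name,
# and each list holds the values for that column.
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

# Build the DataFrame directly from the dictionary.
fruit = pd.DataFrame(data)

print(fruit)

# Like a dictionary, the column names can be listed...
print(list(fruit.columns))

# ...and a column's data can be looked up by its name.
print(fruit["apples"].tolist())
```

Looking up a column with square brackets mirrors looking up a dictionary value by its key.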

The values in each column form a pandas Series. Here is how pandas Series are used to build a DataFrame.

Diagram of how pandas Series build a DataFrame.

For more information about pandas Series and the diagram above, visit w3resource.
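As a rough sketch of the same idea in code, two Series can be placed side by side to form a DataFrame. The Series names and values here are hypothetical, and pd.concat is only one of several ways to combine Series.

```python
import pandas as pd

# Two hypothetical Series; the name of each Series becomes its column name.
apples = pd.Series([3, 2, 0, 1], name="apples")
oranges = pd.Series([0, 3, 7, 2], name="oranges")

# Place the Series side by side (axis=1) to build a DataFrame.
fruit = pd.concat([apples, oranges], axis=1)
print(fruit)

# Selecting a single column hands back a Series again.
column = fruit["apples"]
print(type(column).__name__)
```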

14.1.6. Check Your Understanding

Question

What pandas attribute returns the number of rows and columns in a DataFrame?

Question

True or false: Column names cannot be changed in a DataFrame.

  1. True

  2. False

Question

What can knowing the data types present in a data set tell us about the data?

Question

What pandas function reads a CSV file?

Question

Visualized below is the “purchases” DataFrame. What is the pandas syntax to select Robert’s data?

DataFrame showing name of person and if they purchased apples and/or oranges.

Question

How do we view only the first 13 rows of a DataFrame?

Question

True or false: A DataFrame column is a Series.

  1. True

  2. False

Question

Which pandas method reports the number of records, the three quartiles, the mean, the standard deviation, and the minimum and maximum values for a DataFrame?

  1. .describe()

  2. .index()

  3. .statistics()

  4. .head()