15.1. EDA With Python Part 2

Read the following articles:

15.1.1. `pandas` Function

20 Must-Know Pandas Function for Exploratory Data Analysis.
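
As a warm-up before reading, here is a minimal sketch of a few `pandas` calls commonly used in exploratory data analysis. The DataFrame and its column names (`age`, `sex`, `chol`) are hypothetical toy data, not taken from the article or the course repo.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [63, 37, 41, 56, 57],
    "sex": ["M", "M", "F", "M", "F"],
    "chol": [233, 250, 204, 236, 354],
})

print(df.head(3))   # first three rows
print(df.shape)     # (number of rows, number of columns)
df.info()           # column dtypes and non-null counts

summary = df.describe()                 # count, mean, std, quartiles for numeric columns
sex_counts = df["sex"].value_counts()   # frequency of each category
n_unique = df.nunique()                 # distinct values per column
mean_chol_by_sex = df.groupby("sex")["chol"].mean()  # group-wise average

print(summary)
print(sex_counts)
print(n_unique)
print(mean_chol_by_sex)
```

Running these calls first on a small frame like this makes it easier to interpret their output on a larger dataset.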

15.1.2. Outlier Detection

A Quick Guide to the Different Types of Outliers.
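
One common way to flag global outliers is the 1.5 × IQR rule. The sketch below applies it to made-up daily visitor counts. The rule and the data are assumptions chosen for illustration and are not necessarily the method used in the article. Contextual and collective outliers generally need extra information (such as dates or neighboring points) that this rule does not consider.

```python
import pandas as pd

# Hypothetical daily visitor counts; one day is unusually high.
visitors = pd.Series([5200, 4900, 5100, 5300, 5000, 4800, 21000, 5150])

# Interquartile range (IQR) of the values.
q1 = visitors.quantile(0.25)
q3 = visitors.quantile(0.75)
iqr = q3 - q1

# Values more than 1.5 * IQR beyond the quartiles are flagged.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = visitors[(visitors < lower) | (visitors > upper)]

print(outliers)
```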

15.1.3. Missing Data
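
A minimal sketch of some common ways to inspect and handle missing values with `pandas`, assuming a hypothetical DataFrame with `age` and `chol` columns. Which approach is appropriate depends on the dataset and the question being asked, so treat these as options rather than a recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps; the column names are illustrative only.
df = pd.DataFrame({
    "age": [63, np.nan, 41, 56],
    "chol": [233, 250, np.nan, np.nan],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# Option 1: keep the information by adding a binary indicator column.
df["chol_missing"] = df["chol"].isna().astype(int)

# Option 2: drop rows that are missing a value in either original column.
rows_dropped = df.dropna(subset=["age", "chol"])

# Option 3: fill the gaps with a summary statistic such as the median.
df["chol_filled"] = df["chol"].fillna(df["chol"].median())

print(missing_per_column)
print(df)
print(rows_dropped)
```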

15.1.4. Univariate and Multivariate Charts

Code along with the article below using the DataCleaning-Heart-Data repo:

Understand the Data With Univariate And Multivariate Charts and Plots in Python.
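
Before coding along, it may help to see the distinction in a small sketch: a univariate chart looks at one variable at a time, while a multivariate chart relates two or more. The example assumes `matplotlib` is installed and uses hypothetical toy data. The column names are not taken from the DataCleaning-Heart-Data repo.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 44, 52, 48],
    "chol": [233, 250, 204, 236, 354, 263, 199, 275],
})

fig, (ax_uni, ax_multi) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate: the distribution of a single variable.
ax_uni.hist(df["chol"], bins=5)
ax_uni.set_title("Univariate: chol")
ax_uni.set_xlabel("chol")
ax_uni.set_ylabel("count")

# Multivariate (here bivariate): the relationship between two variables.
ax_multi.scatter(df["age"], df["chol"])
ax_multi.set_title("Multivariate: age vs. chol")
ax_multi.set_xlabel("age")
ax_multi.set_ylabel("chol")

fig.tight_layout()
```

In a notebook, the default backend can be left in place (omit the `matplotlib.use("Agg")` line) and the figure displayed with `plt.show()`.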

15.1.5. Check Your Understanding

Question

The following plot shows the length (in milliseconds) of a user’s Spotify-recommended songs against each song’s energy score (a perceptual measure of intensity and activity between 0 and 1).

Plot graph with dots representing length of song and energy score.

True or False: This graph contains no outliers.

True
False

Question

The National Park Service (NPS) records the number of visitors to the Gateway Arch in St. Louis every day. Occasionally, concerts are held on the park grounds and the number of visitors soars. On concert days, what type of anomaly does the NPS see in its visitor data?

Global anomaly
Contextual anomaly
Collective anomaly

Question

If a dataset is missing values, it is absolutely useless.

True
False

Question

How can data analysts leverage the presence of null values in a data set?

We can create an additional binary column indicating whether a value is missing in the column in question.
We can clean the data by removing any missing entries.
We can clean the data by removing any columns with missing entries.
We can’t. Null values serve no purpose in analysis.

Question

Data sets with missing values have been improperly assembled and are rarely encountered in professional data analysis.

True
False

Question

There is no best method for addressing missing data.

True
False
