15.1. EDA With Python Part 2

15.1.1. Readings

Read the following articles and follow along where instructed:

  • Key takeaway: Reference list of common Pandas methods for EDA (some are review from last week).
  • Key takeaway: Defines global, contextual, and collective outliers. Article and 5 min video.
  • Note: You do not need to know the details of the techniques described here for handling missing data. Read it as an introduction to some advanced methods for handling missing values systematically.
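
As a warm-up for the readings, here is a minimal sketch of a few common Pandas calls for a first look at a dataset, including a count of missing values. The DataFrame and its column names (duration_ms, energy) are made up for illustration and do not come from any of the articles.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are invented for this example.
songs = pd.DataFrame(
    {
        "duration_ms": [210000, 185000, np.nan, 240000, 1250000],
        "energy": [0.71, 0.64, 0.58, np.nan, 0.12],
    }
)

# Structural overview: number of rows and columns, and the type of each column.
print(songs.shape)
print(songs.dtypes)

# Summary statistics for the numeric columns (NaN values are skipped).
summary = songs.describe()
print(summary)

# Number of missing values in each column.
missing_counts = songs.isna().sum()
print(missing_counts)

# Record which rows were missing an energy score in a new 0/1 column,
# so that information is still available if the gaps are later filled or dropped.
songs["energy_missing"] = songs["energy"].isna().astype(int)
print(songs)
```

Running describe() and isna().sum() before any cleaning gives a quick sense of where values are missing and whether any numbers look unusually large or small.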

Understand the Data With Univariate And Multivariate Charts and Plots in Python.

  • Read the above article and work along using the notebook and dataset found in this GitHub repository.
    • This walkthrough will have you write code and answer questions.

15.1.2. Check Your Understanding

Question

The following plot shows the length (in milliseconds) of a user’s Spotify-recommended songs against each song’s energy score (a perceptual measure of intensity and activity between 0 and 1). What, if any, outliers are present?

Plot graph with dots representing length of song and energy score.

Question

The National Park Service records the number of visitors to the Gateway Arch in St. Louis every day. Occasionally, concerts are held on the park grounds and the number of visitors soars. On concert days, what type of figures is the NPS seeing?

Question

Missing values in a dataset provide no analytical use.

  1. True
  2. False

Question

How can data analysts leverage the presence of null values in a data set?

  1. We can create an additional binary column indicating whether information is missing in the column in question.
  2. We can clean the data by removing any missing entries.
  3. We can clean the data by removing any columns with missing entries.
  4. We can’t. Null values serve no purpose in analysis.

Question

Data sets with missing values have been improperly assembled and are rare to encounter in the professional field of data analysis.

  1. True
  2. False

Question

There is no best method for addressing missing data.

  1. True
  2. False