16.1. Cleaning Data

16.1.1. Intro to Data Cleaning

Read the following articles:

Data Cleaning in Python: the Ultimate Guide.
Data Cleaning for Beginners- Why and How ?.
Guide To Data Cleaning: Definition, Benefits, Components, And How To Clean Your Data.

16.1.2. Check Your Understanding

Question

Name the four categories of “dirty” data.

Question

Name the three possible solutions to any “dirty” data problem.

Question

Your data set local_plants_df has the following column names: ['flora_sci_name', 'tall', 'growing_zone', 'avg_rainfall']. You want to rename the tall column to avg_height. What syntax would you use?
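As a hint, here is a minimal sketch of renaming a single column in pandas on a different, made-up DataFrame. The name birds_df, its column names, and its values are all hypothetical and exist only for illustration; the one real API used is DataFrame.rename with a columns mapping.

```python
import pandas as pd

# Hypothetical data for illustration only; not part of the course data sets.
birds_df = pd.DataFrame({
    "species": ["Cardinalis cardinalis", "Cyanocitta cristata"],
    "wt": [45.0, 85.0],
})

# rename() takes a mapping of {old_name: new_name} and returns a new
# DataFrame; the original stays unchanged unless the result is reassigned.
renamed_df = birds_df.rename(columns={"wt": "avg_weight_g"})

print(list(renamed_df.columns))
```

Columns left out of the mapping keep their names, so only the column you list is affected.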

Question

You have been tasked with helping the local parks department assess visitor usage of a local park over 8 weeks. As you look at your data, you notice a duplicated row. Why would it be beneficial to this project to delete the duplicated row?

[Figure: DataFrame showing the name of the park, location, week of, and number of guests. Some rows are duplicated.]
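To explore a situation like this one, the following sketch shows one way to find and drop exact duplicate rows with pandas. The park name, dates, and guest counts are invented for illustration and are not the values from the figure; visits_df and its column names are assumptions as well.

```python
import pandas as pd

# Hypothetical visitor log; the second and third rows are exact duplicates.
visits_df = pd.DataFrame({
    "park": ["Example Park"] * 4,
    "week_of": ["2023-01-02", "2023-01-09", "2023-01-09", "2023-01-16"],
    "guests": [120, 95, 95, 130],
})

# duplicated() flags every repeat of an earlier, identical row.
duplicate_mask = visits_df.duplicated()

# drop_duplicates() keeps the first occurrence of each row by default.
deduped_df = visits_df.drop_duplicates()

# Compare the guest totals before and after removing the repeat.
total_with_dupes = visits_df["guests"].sum()
total_deduped = deduped_df["guests"].sum()

print(duplicate_mask.tolist())
print(total_with_dupes, total_deduped)
```

Comparing the two totals is a quick way to see how much a repeated row can shift a summary statistic.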

Question

Define “data cleaning”.

Question

The 5 characteristics of quality data include: