16.1. Cleaning Data
16.1.1. Intro to Data Cleaning
Read the following articles:
16.1.2. Check Your Understanding
Question
Name the four categories of “dirty” data.
Question
Name the three possible solutions to any “dirty” data problem.
Question
Your data set local_plants_df has the following column names: ['flora_sci_name', 'tall', 'growing_zone', 'avg_rainfall']. We want to rename our tall column to avg_height. What syntax would we use?
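As a point of reference, here is a minimal sketch of one way to rename a single column, assuming local_plants_df is a pandas DataFrame. The row values are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names are taken from the question.
local_plants_df = pd.DataFrame({
    "flora_sci_name": ["Quercus alba", "Acer rubrum"],
    "tall": [25.0, 18.0],
    "growing_zone": [5, 4],
    "avg_rainfall": [40.0, 38.5],
})

# rename() returns a new DataFrame by default, so assign the result back
# to keep the change. Columns not listed in the mapping are left untouched.
local_plants_df = local_plants_df.rename(columns={"tall": "avg_height"})

print(list(local_plants_df.columns))
```

Passing a dictionary to the columns parameter changes only the named column, which is usually safer than reassigning the full list of column names.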
Question
You have been tasked with helping the local parks department assess visitor usage of a local park over 8 weeks. As you look at your data, you notice a duplicated row. Why would it benefit this project to delete the duplicated row?
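To see the effect of a duplicated row in practice, the following sketch compares a total before and after removing duplicates. It assumes the data is in a pandas DataFrame; the name visits_df, its columns, and all of the counts are hypothetical.

```python
import pandas as pd

# Hypothetical weekly visitor counts; the week 3 row was entered twice.
visits_df = pd.DataFrame({
    "week": [1, 2, 3, 3, 4],
    "visitors": [120, 135, 150, 150, 110],
})

# Total computed on the raw data, which counts week 3 twice.
total_with_duplicate = visits_df["visitors"].sum()

# drop_duplicates() keeps the first copy of each fully identical row.
deduped_df = visits_df.drop_duplicates()
total_deduped = deduped_df["visitors"].sum()

print(total_with_duplicate, total_deduped)
```

Comparing the two totals shows how much a single repeated row can shift a summary statistic.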
Question
Define “data cleaning”.
Question
The 5 characteristics of quality data include: