16.1. Cleaning Data
Read the following articles and follow along where instructed:
- Key takeaway: More in-depth techniques for deciding what to clean and how.
- Key takeaway: A brief introduction to why and how to clean your data.
- Key takeaway: The theory behind data cleaning.
16.1.2. Check Your Understanding
Name the four categories of “dirty” data.
Name the three possible solutions to any “dirty” data problem.
Your data set, local_plants_df, has the following column names: ['flora_sci_name', 'tall', 'growing_zone', 'avg_rainfall']. We want to rename the 'tall' column to 'avg_height'. What syntax would we use?
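Once you have an answer, you can check it with a minimal sketch like the one below. It assumes local_plants_df is a pandas DataFrame (the question does not name a library), and the row values are hypothetical placeholders; only the column names come from the question.

```python
import pandas as pd

# Hypothetical placeholder rows; only the column names come from the question.
local_plants_df = pd.DataFrame({
    "flora_sci_name": ["species_a", "species_b"],
    "tall": [1.0, 2.0],
    "growing_zone": ["zone_x", "zone_y"],
    "avg_rainfall": [10.0, 20.0],
})

# rename() returns a new DataFrame; the mapping is {old_name: new_name}.
local_plants_df = local_plants_df.rename(columns={"tall": "avg_height"})

print(list(local_plants_df.columns))
# ['flora_sci_name', 'avg_height', 'growing_zone', 'avg_rainfall']
```

Reassigning the result keeps the original DataFrame unchanged until you explicitly overwrite it, which makes it easy to compare before and after while cleaning.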
You have been tasked with helping the local parks department assess visitor usage of a local park over 8 weeks. While looking at your data, you notice a duplicated row. Why would deleting this duplicated row benefit the project?
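To explore this question, here is a small sketch of how a single duplicated row distorts a summary. It assumes pandas and a hypothetical visitor log with one row per week; the column names week and visitors and all of the counts are made-up placeholders, not data from the parks scenario.

```python
import pandas as pd

# Hypothetical visitor log: one row per week (all values are placeholders).
visits_df = pd.DataFrame({
    "week": [1, 2, 2, 3],
    "visitors": [120, 95, 95, 110],
})

# duplicated() flags rows that repeat an earlier row in every column.
print(visits_df.duplicated().sum())  # 1

# With the duplicate, week 2 is counted twice and the total is inflated.
print(visits_df["visitors"].sum())  # 420

# drop_duplicates() keeps the first occurrence of each repeated row by default.
deduped_df = visits_df.drop_duplicates()
print(deduped_df["visitors"].sum())  # 325
```

Because the log is assumed to hold exactly one row per week, two identical rows for the same week can only be an error, so removing one of them is safe here; in data where identical rows can be legitimate observations, check before dropping.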
Define “data cleaning”.
Name the five characteristics of quality data.