14.1. Exploratory Data Analysis
Read the following articles and follow along where instructed.
For Medium articles: if you run out of free articles, open the page in an incognito window.
- Key Takeaways: An outline of the common steps used in EDA. Note that there is no single process for performing EDA; it depends on the dataset and your questions.
- Just read and follow the steps for comprehension; you do not need to complete the tutorial.
- Stop at #8: “Detecting Outliers”.
- Key Takeaways: Questions to better understand why the data was collected.
- Suggested Reading: Data Types in Statistics.
- Key Takeaways: Definitions of discrete, continuous, and categorical data.
- You do not need to install pandas; it comes with the Anaconda distribution.
- Try coding along with the article.
- Stop at the “Handling Duplicates” header.
- Key Takeaway: Using the pandas DataFrame, with examples.
14.1.2. What is a Dataframe?
A pandas dataframe is similar to a Python dictionary: the column names act like keys, and the value for each key is the data stored in that column. This diagram illustrates the different components of a dataframe.
For the source of the above diagram and more information about pandas dataframes, visit here.
For the source of the above diagram and more information about pandas Series, visit here.
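The dictionary comparison above can be made concrete with a short sketch. The column names and values below are made up for illustration and do not come from the articles; the example only shows how dictionary keys map to column labels and how selecting one column yields a Series.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column name,
# and each list holds the values for that column.
data = {
    "name": ["Ada", "Grace", "Linus"],
    "age": [36, 45, 28],
}

df = pd.DataFrame(data)

# Column labels play the role of the dictionary keys.
column_names = list(df.columns)

# Selecting a single column by its label returns a pandas Series.
ages = df["age"]

# The index labels each row; by default it counts up from 0.
row_labels = list(df.index)

print(df)
print(column_names)
print(type(ages).__name__)
print(row_labels)
```

Building the dataframe from a dictionary is only one of several ways to create one, but it keeps the key-to-column correspondence easy to see.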
14.1.3. Check Your Understanding
What is the pandas function used to return the number of rows and columns in a dataframe?
True or false: column names cannot be changed in a dataframe.
What can knowing the data types present in a data set tell us about the data being presented?
What is the pandas method for reading a CSV file?
Visualized below is the “purchases” dataframe. What is the pandas syntax to select Robert’s data?
How do we view only the first 13 rows of a dataframe?
True or false: a dataframe column is a Series.
Which pandas function will print the number of records, three quartiles, mean, standard deviation, minimum and maximum values of a dataframe?