9 to 5
Learning Objectives
By completing this assignment, students will be able to:
Database Exploration: Navigate and understand complex database structures, including table relationships, column data types, and data connections within the Bureau of Labor Statistics employment dataset.
SQL Aggregation Functions: Apply aggregate functions (COUNT, SUM, AVERAGE) to calculate employment statistics and answer business questions using large datasets with thousands of rows.
SQL Joins: Implement different types of SQL joins (LEFT, INNER, OUTER) to combine data from multiple tables and extract meaningful insights from relational databases.
Advanced SQL Techniques: Utilize subqueries, unions, and derived tables to perform complex data analysis and answer sophisticated business questions about employment trends.
Data Analysis Problem-Solving: Translate business requirements from stakeholders (Dolly Parton’s nonprofit research) into actionable SQL queries that provide insights about production/nonsupervisory employees versus all employees.
Business Issue and Understanding
You are working for the Bureau of Labor Statistics with the United States government and have been approached by your boss with an important meeting request. You have been asked by your supervisor to meet with Dolly Parton, whose nonprofit is looking to shed light on the state of employment in the United States. As part of the 9 to 5 project, their research is focused on production and nonsupervisory employees and how those employees fare compared to all employees in the United States. While the data the BLS collects from the CES is publicly available, Dolly Parton and her colleagues need your assistance navigating the thousands of rows in each table in LaborStatisticsDB
.
More on the Dataset
For this assignment, you will be using the Bureau of Labor Statistics (BLS) Current Employment Survey (CES) results which can be found on Kaggle
and are already added to LaborStatisticsDB
for you.
Here are some things you need to know about this dataset:
- The industry table contains an NAICS code. This is different from the industry code. NAICS stands for North American Industry Classification System.
- Series ID is composed of multiple different codes. CES stands for Current Employment Survey, the name of the survey which collected the data. The industry code as specified by the BLS and the data type code as specified in the datatype table.
Getting Started
Download the 9-to-5.ipynb
file included within the canvas assignment, open it within Visual Studio Code and connect to the LaborStatisticsDB
.
Submitting Your Work
When you are finished, make sure to save your file and upload it to the associated Canvas assignment!
Tip
Windows Users: when you are saving your edited file from Visual Studio Code, you may need to select the “Notebook” file type when saving the file to your machine.
Note
The following tasks and questions in the textbook for this assignment are also located within the 9-to-5.ipynb
file. You must save the answers to the questions within your notebook for grading.