Databases with Python

Working within a database command line interface can oftentimes make it cumbersome and difficult to execute multiple commands. Because of this, analysts usually prefer to use a user-friendly graphical user interface (GUI) or leverage programming languages like Python and supported libraries to interact with them. We will use Python and the sqlite3 library to complete the following:

Create a new SQLite database
Add a table
Perform CRUD operations on the table

While you can accomplish more than just the above using python and pandas, like performing joins, it is not always best practice. As it relates to joins, database engines are built and optimized to perform joins extremely well. It is always important to know what you will be doing with your data before acting.

Warning

We have created a new repo for Class 19 and 20 exercises and studios. Please fork this repo to your Github account, and then clone it to your local device

Class 19 and 20 Exercise Studio Repo

The examples below can be found at data-analysis-projects-class-19-and-20/class-20/python-db-walkthrough.ipynb

sqlite3 with Python

sqlite3 works in conjunction with python by allowing the user to establish a connection to a file located on your machine. You can then reference the connection variable to begin executing sql commands.

The basic syntax is as follows:

import sqlite3

# If the 'Movies.db` database does not already exist, sqlite3 will create one!
movies_db = sqlite3.connect('Movies.db') # connect to database

Note

If we were to print the connection_variable we would see the following output:

<sqlite3.Connection object at 0x7334db1d3940> # the 0x7334db1d3940 portion will vary

This shows that a sqlite3.Connection object was created and can now be referenced using the movies_db variable.

Cursor Objects

Now that we have established a connection to the database we need a way to execute commands. The cursor object is a database cursor which allows us to do so.

We can create a new cursor object by referencing the cursor function and storing it within a variable:

# variable named "cur" that references the connection object:
cur = movies_db.cursor()

The basic syntax for executing a command with the cursor object is as follows:

cur.execute("SQL statement")

Creating a table

cur.execute("CREATE TABLE table_name (column DATA TYPE, column DATA TYPE, etc..)")

Note

You can find a list of SQLite data types here: Data Types in SQLite .

Insert Table Values

cur.execute("INSERT INTO table_name ('value-one', 'value-two', etc..)")

Reading Data

There are a couple strategies that you can use to read data from your database. Since the cursor object is an iterator in and of itself, you can iterate over the cursor object to fetch data.

Example

# For loop to iterate over cursor object
for row in cur.execute("SELECT column FROM table_name")
    print(row)

The above for loop will return all rows within the specified column inside of the SELECT statement. You could also pass the * flag to return all values from all rows within the database.

You can also use the fetchall() function to read data from the database like so:

cur.execute("SELECT * FROM table_name").fetchall()

Updating Data

When running dynamic queries against a database there are some risks to be aware of, specifically SQL injection attacks or SQLi attacks. While we have multiple strategies to avoid SQLi attacks, the one we will focus on in this class is using parameterized queries.

Parameterized queries allow you to inject a placeholder (?) into your SQL statement and pass in the desired value as a parameter.

Example

# Desired value
update_release_year = 1997 # Value that needs to be updated
movie_to_update = 'Good Will Hunting'
# Execute an UPDATE statement using the ? placeholder, passing in the update variables as a list literal
cur.execute("UPDATE movies SET release = ? WHERE title = ?", [update_release_year, movie_to_update])

Deleting Data

We should also use parameterized queries to safely delete data. Using parameterized queries to update and delete data is a best practice!

Example

movie_to_delete = 'Inception' # Too many sci fi movies!
# Execute a DELETE statement using the ? placeholder, passing in the variable as a list literal
cur.execute("DELETE FROM movies WHERE title = ?", [movie_to_delete])

Check Your Understanding

Question

What type of database is SQLite?

Question

What is the primary reason for creating a cursor object?