Databases with Python
Working within a database command line interface can oftentimes make it cumbersome and difficult to execute multiple commands. Because of this, analysts usually prefer to use a user-friendly graphical user interface (GUI) or leverage programming languages like Python and supported libraries to interact with them. We will use Python and the sqlite3 library to complete the following:
- Create a new SQLite database
- Add a table
- Perform CRUD operations on the table
While you can accomplish more than just the above using python and pandas, like performing joins, it is not always best practice. As it relates to joins, database engines are built and optimized to perform joins extremely well. It is always important to know what you will be doing with your data before acting.
We have created a new repo for Class 19 and 20 exercises and studios. Please fork this repo to your Github account, and then clone it to your local device
Class 19 and 20 Exercise Studio Repo
The examples below can be found at data-analysis-projects-class-19-and-20/class-20/python-db-walkthrough.ipynb
sqlite3 with Python
sqlite3 works in conjunction with python by allowing the user to establish a connection to a file located on your machine. You can then reference the connection variable to begin executing sql commands.
The basic syntax is as follows:
import sqlite3
# If the 'Movies.db` database does not already exist, sqlite3 will create one!
movies_db = sqlite3.connect('Movies.db') # connect to database
If we were to print the connection_variable
we would see the following output:
<sqlite3.Connection object at 0x7334db1d3940> # the 0x7334db1d3940 portion will vary
This shows that a sqlite3.Connection
object was created and can now be referenced using the movies_db
variable.
Cursor Objects
Now that we have established a connection to the database we need a way to execute commands. The cursor object is a database cursor which allows us to do so.
We can create a new cursor object by referencing the cursor function and storing it within a variable:
# variable named "cur" that references the connection object:
cur = movies_db.cursor()
The basic syntax for executing a command with the cursor object is as follows:
cur.execute("SQL statement")
Creating a table
cur.execute("CREATE TABLE table_name (column DATA TYPE, column DATA TYPE, etc..)")
You can find a list of SQLite data types here: Data Types in SQLite .
Insert Table Values
cur.execute("INSERT INTO table_name ('value-one', 'value-two', etc..)")
Reading Data
There are a couple strategies that you can use to read data from your database. Since the cursor object is an iterator in and of itself, you can iterate over the cursor object to fetch data.
# For loop to iterate over cursor object
for row in cur.execute("SELECT column FROM table_name")
print(row)
The above for loop will return all rows within the specified column inside of the SELECT
statement. You could also pass the *
flag to return all values from all rows within the database.
You can also use the fetchall()
function to read data from the database like so:
cur.execute("SELECT * FROM table_name").fetchall()
Updating Data
When running dynamic queries against a database there are some risks to be aware of, specifically SQL injection attacks or SQLi attacks. While we have multiple strategies to avoid SQLi attacks, the one we will focus on in this class is using parameterized queries.
Parameterized queries allow you to inject a placeholder (?
) into your SQL statement and pass in the desired value as a parameter.
# Desired value
update_release_year = 1997 # Value that needs to be updated
movie_to_update = 'Good Will Hunting'
# Execute an UPDATE statement using the ? placeholder, passing in the update variables as a list literal
cur.execute("UPDATE movies SET release = ? WHERE title = ?", [update_release_year, movie_to_update])
Deleting Data
We should also use parameterized queries to safely delete data. Using parameterized queries to update and delete data is a best practice!
movie_to_delete = 'Inception' # Too many sci fi movies!
# Execute a DELETE statement using the ? placeholder, passing in the variable as a list literal
cur.execute("DELETE FROM movies WHERE title = ?", [movie_to_delete])
Check Your Understanding
What type of database is SQLite?
What is the primary reason for creating a cursor object?