OverviewTeaching: 20 min
Exercises: 25 minQuestions
Use the sqlite3 module to interact with a SQL database.
Access data stored in SQLite using Python.
Describe the difference in interacting with data stored as a CSV file versus in SQLite.
Describe the benefits of accessing data using a database compared to a CSV file.
When you open a CSV in python, and assign it to a variable name, you are using your computers memory to save that variable. Accessing data from a database like SQL is not only more efficient, but also it allows you to subset and import only the parts of the data that you need.
In the following lesson, we’ll see some approaches that can be taken to do so.
The sqlite3 module provides a straightforward interface for interacting with
SQLite databases. A connection object is created using
connection must be closed at the end of the session with the
While the connection is open, any interactions with the database require you to
make a cursor object with the
.cursor() command. The cursor is then ready to
perform all kinds of operations with
import sqlite3 # Create a SQL connection to our SQLite database con = sqlite3.connect("data/portal_mammals.sqlite") cur = con.cursor() # the result of a "cursor.execute" can be iterated over by row for row in cur.execute('SELECT * FROM species;'): print(row) #Be sure to close the connection. con.close()
One of the most common ways to interact with a database is by querying: retrieving data based on some search parameters. Use a SELECT statement string. The query is returned as a single tuple or a tuple of tuples. Add a WHERE statement to filter your results based on some parameter.
import sqlite3 # Create a SQL connection to our SQLite database con = sqlite3.connect("data/portal_mammals.sqlite") cur = con.cursor() # Return all results of query cur.execute('SELECT plot_id FROM plots WHERE plot_type="Control"') cur.fetchall() # Return first result of query cur.execute('SELECT species FROM species WHERE taxa="Bird"') cur.fetchone() #Be sure to close the connection. con.close()
Using pandas, we can import results of a SQLite query into a dataframe. Note that you can use the same SQL commands / syntax that we used in the SQLite lesson. An example of using pandas together with sqlite is below:
import pandas as pd import sqlite3 # Read sqlite query results into a pandas DataFrame con = sqlite3.connect("data/portal_mammals.sqlite") df = pd.read_sql_query("SELECT * from surveys", con) # verify that result of SQL query is stored in the dataframe print(df.head()) con.close()
Storing your data in an SQLite database can provide substantial performance improvements when reading/writing compared to CSV. The difference in performance becomes more noticable as the size of the dataset grows (see for example these benchmarks).
Challenge - SQL
Create a query that contains survey data collected between 1998 - 2001 for observations of sex “male” or “female” that includes observation’s genus and species and plot type for the sample. How many records are returned?
Create a dataframe that contains the total number of observations (count) made for all years, and sum of observation weights for each plot, ordered by plot ID.