!pip install fastlite -qU

This is an introduction to using fastlite, covering the basics of creating and manipulating SQLite databases. Fastlite builds on top of sqlite-utils, adding some quality-of-life improvements.
The latest version of fastlite doesn’t import directly from sqlite-utils anymore. It now imports from a forked version of sqlite-utils called apswutils which uses the APSW sqlite module rather than the core Python sqlite3 module. See here for the main differences between the APSW and sqlite3 modules.
Installation
To use fastlite you can pip install it and import it as follows.
import pandas as pd
from fastlite import *

Create the Database
Let’s start by creating the database using the database() function. Typically you’d specify a filename, which would store the SQLite database locally in a persistent file.
But you also have the option to create it purely in memory, which is what we’ll do here since it’s just a notebook.
# Create an in-memory database for testing
db = database(":memory:")
# Create a basic users table with essential fields
users = db.t.users # This creates a reference but doesn't create the table yet
users.create(
    id=int,
    username=str,
    email=str,
    created_at=str,
    pk='id'  # Set id as primary key
)
<Table users (id, username, email, created_at)>
This creates a new database in memory with a single users table. We can see the structure of the database and the table fields/types using db.schema.
print(db.schema)
# we can also do users.schema for a specific table
CREATE TABLE [users] (
[id] INTEGER PRIMARY KEY,
[username] TEXT,
[email] TEXT,
[created_at] TEXT
);
Insert Table Rows
It’s easy enough to start inserting new records using the insert_all() method, which is ideal when you want to bulk insert items.

You can alternatively use insert() when you want to insert a single record, e.g.

users.insert({"username": "dexy", "email": "dex@example.com", "created_at": "2024-01-05"})

# Insert multiple users at once
new_users = [
    {"username": "carol", "email": "carol@example.com", "created_at": "2024-01-03"},
    {"username": "dave", "email": "dave@example.com", "created_at": "2024-01-04"}
]
users.insert_all(new_users)
<Table users (id, username, email, created_at)>
Displaying Table Data
To access all records in a table we can just call the table name like a function: users(). Or we can target specific records by the primary key (id in this case), or by direct SQL query.
# Get all users (this works as shown before)
print("All users:")
print(users())
# Get a specific user by primary key (this works as shown before)
print("\nUser with id 1:")
print(users[1])
# Query with conditions (corrected version)
print("\nUsers created after Jan 2:")
print(db.q("SELECT * FROM users WHERE created_at > '2024-01-02'"))
All users:
[{'id': 1, 'username': 'carol', 'email': 'carol@example.com', 'created_at': '2024-01-03'}, {'id': 2, 'username': 'dave', 'email': 'dave@example.com', 'created_at': '2024-01-04'}]
User with id 1:
{'id': 1, 'username': 'carol', 'email': 'carol@example.com', 'created_at': '2024-01-03'}
Users created after Jan 2:
[{'id': 1, 'username': 'carol', 'email': 'carol@example.com', 'created_at': '2024-01-03'}, {'id': 2, 'username': 'dave', 'email': 'dave@example.com', 'created_at': '2024-01-04'}]
We can display table data in a notebook in a few different ways.
# Convert query results to DataFrame
df = pd.DataFrame(users())
display(df) # In Jupyter, this creates a nice table view

|   | id | username | email | created_at |
|---|---|---|---|---|
| 0 | 1 | carol | carol@example.com | 2024-01-03 |
| 1 | 2 | dave | dave@example.com | 2024-01-04 |
# This will show all columns nicely aligned
db.q("SELECT * FROM users ORDER BY created_at")
[{'id': 1,
'username': 'carol',
'email': 'carol@example.com',
'created_at': '2024-01-03'},
{'id': 2,
'username': 'dave',
'email': 'dave@example.com',
'created_at': '2024-01-04'}]
def show_table(query):
    df = pd.DataFrame(db.q(query))
    return df.style.set_properties(**{
        'background-color': '#f5f5f5',
        'border': '1px solid #888'
    }).hide(axis='index')
# Use it like this:
show_table("SELECT * FROM users")

| id | username | email | created_at |
|---|---|---|---|
| 1 | carol | carol@example.com | 2024-01-03 |
| 2 | dave | dave@example.com | 2024-01-04 |
This custom table display is minimal and clean, so we’ll use it from now on to display table records.
Modifying an Existing Table Schema
In production you may well come across situations where you need to modify the schema of a table that already holds data. To update a table we can call create() once again, this time passing the transform argument.

Under the hood this runs an SQL transaction that creates a new table with the updated schema, copies the data across, drops the original table, and renames the new table into place. You can read more about this in the sqlite-utils docs.
# Let's create our initial basic table
users = db.t.users
users.create(
    id=int,
    username=str,
    email=str,
    created_at=str,
    pk='id'
)
print(users.schema)
# Now let's add constraints by transforming the existing table
users.create(
    id=int,
    username=str,
    email=str,
    created_at=str,
    pk='id',
    not_null={'username', 'email'},
    transform=True  # This is key for modifying an existing table
)
print(users.schema)
CREATE TABLE [users] (
[id] INTEGER PRIMARY KEY,
[username] TEXT,
[email] TEXT,
[created_at] TEXT
)
CREATE TABLE "users" (
[id] INTEGER PRIMARY KEY,
[username] TEXT NOT NULL,
[email] TEXT NOT NULL,
[created_at] TEXT
)
Add a One-To-One Relational Table
Let’s add a new user_profiles table that has a one-to-one connection to the existing users table via a foreign key, which will point to the id field of the users table.
# Create the user_profiles table with a foreign key to users
profiles = db.t.user_profiles
profiles.create(
    user_id=int,
    full_name=str,
    bio=str,
    location=str,
    pk='user_id',
    foreign_keys=[('user_id', 'users', 'id')]  # Links to users.id
)
# Let's add some test data
users.insert({"username": "alice", "email": "alice@example.com", "created_at": "2024-01-01"})
users.insert({"username": "bob", "email": "bob@example.com", "created_at": "2024-01-02"})
# Then add profiles for both users
profiles.insert({
    "user_id": 1,  # Alice's ID
    "full_name": "Alice Smith",
    "bio": "Python developer",
    "location": "San Francisco"
})
profiles.insert({
    "user_id": 2,  # Bob's ID
    "full_name": "Bob Jones",
    "bio": "Database expert",
    "location": "New York"
})
{'user_id': 2,
'full_name': 'Bob Jones',
'bio': 'Database expert',
'location': 'New York'}
print(db.schema)
CREATE TABLE "users" (
[id] INTEGER PRIMARY KEY,
[username] TEXT NOT NULL,
[email] TEXT NOT NULL,
[created_at] TEXT
);
CREATE TABLE [user_profiles] (
[user_id] INTEGER PRIMARY KEY REFERENCES [users]([id]) ON UPDATE CASCADE ON DELETE CASCADE,
[full_name] TEXT,
[bio] TEXT,
[location] TEXT
);
show_table("SELECT * FROM users")

| id | username | email | created_at |
|---|---|---|---|
| 1 | alice | alice@example.com | 2024-01-01 |
| 2 | bob | bob@example.com | 2024-01-02 |
show_table("SELECT * FROM user_profiles")

| user_id | full_name | bio | location |
|---|---|---|---|
| 1 | Alice Smith | Python developer | San Francisco |
| 2 | Bob Jones | Database expert | New York |
We now have a users table and a user_profiles table connected in a one-to-one relationship via the user id.
Table Joins
We can create an SQL query that combines data from both users and user_profiles tables. The JOIN links the tables using the user_id relationship.
# Using fastlite's query method to join tables
join_query = """
SELECT u.username, u.email, p.full_name, p.location, p.bio
FROM users u
JOIN user_profiles p ON u.id = p.user_id
"""
print("All users with their profiles:")
show_table(join_query)
All users with their profiles:
| username | email | full_name | location | bio |
|---|---|---|---|---|
| alice | alice@example.com | Alice Smith | San Francisco | Python developer |
| bob | bob@example.com | Bob Jones | New York | Database expert |
# We can also create a view for frequently used joins
db.create_view("user_details", join_query)
print("\nSame data using view:")
show_table("SELECT * FROM user_details")
Same data using view:
| username | email | full_name | location | bio |
|---|---|---|---|---|
| alice | alice@example.com | Alice Smith | San Francisco | Python developer |
| bob | bob@example.com | Bob Jones | New York | Database expert |
We created a SQL query that combines data from both users and user_profiles tables. The JOIN links the tables using the user_id relationship.
We also created a view named user_details which saves this join query for easy reuse. Views are like virtual tables that make complex queries simpler to work with.
Foreign Key Constraint
We set up user_id in profiles to reference id in users. New profiles can only be added for an existing user id; otherwise an error will be thrown.
# Try to add a profile for a non-existent user (should fail)
try:
    profiles.insert({
        "user_id": 999,  # This ID doesn't exist in the users table
        "full_name": "Invalid User",
        "bio": "Should not work",
        "location": "Nowhere"
    })
except Exception as e:
    print(f"\nExpected foreign key error: {e}")
Expected foreign key error: FOREIGN KEY constraint failed
The failed insert attempt (user_id: 999) demonstrates how foreign keys protect data integrity. This prevents “orphaned” profiles (profiles without valid users), which is a crucial feature for maintaining data consistency.
Add Indexes
Indexes work like a book’s index - they speed up lookups.
# Add indexes for commonly searched fields
users.create_index(['username']) # Index for username lookups
users.create_index(['email']) # Index for email lookups
profiles.create_index(['location']) # Index for location-based searches
# Show all indexes in our database
print("\nDatabase indexes:")
for table in db.tables:
    print(f"\nIndexes for {table.name}:")
    print(table.indexes)
# Example of how indexes help with queries
location_query = """
SELECT u.username, p.location
FROM users u
JOIN user_profiles p ON u.id = p.user_id
WHERE p.location = 'San Francisco'
"""
print("\nQuerying by location (should use index):")
show_table(location_query)
Database indexes:
Indexes for users:
[Index(seq=0, name='idx_users_email', unique=0, origin='c', partial=0, columns=['email']), Index(seq=1, name='idx_users_username', unique=0, origin='c', partial=0, columns=['username'])]
Indexes for user_profiles:
[Index(seq=0, name='idx_user_profiles_location', unique=0, origin='c', partial=0, columns=['location'])]
Querying by location (should use index):
| username | location |
|---|---|
| alice | San Francisco |
We added indexes on frequently searched fields (username, email, location). The location index helps optimize queries that filter by location. Without indexes, SQLite would need to scan entire tables.
Analyzing Query Performance
Let’s take a look at query performance with and without using indexes.
# Enable query explanation
explain_query = """
EXPLAIN QUERY PLAN
SELECT u.username, p.location
FROM users u
JOIN user_profiles p ON u.id = p.user_id
WHERE p.location = 'San Francisco'
"""
# Drop index using SQL
db.execute("DROP INDEX IF EXISTS idx_user_profiles_location")
db.execute("VACUUM;") # This forces a write and cleanup
# Check query plan without index
print("Query plan WITHOUT index:")
show_table(explain_query)
Query plan WITHOUT index:
| id | parent | notused | detail |
|---|---|---|---|
| 3 | 0 | 216 | SCAN p |
| 7 | 0 | 45 | SEARCH u USING INTEGER PRIMARY KEY (rowid=?) |
# Recreate the index
profiles.create_index(['location'])
# Check query plan WITH index
print("\nQuery plan WITH index:")
show_table(explain_query)
Query plan WITH index:
| id | parent | notused | detail |
|---|---|---|---|
| 3 | 0 | 54 | SEARCH p USING COVERING INDEX idx_user_profiles_location (location=?) |
| 7 | 0 | 45 | SEARCH u USING INTEGER PRIMARY KEY (rowid=?) |
Without an index, SQLite must scan the entire profiles table to find matching locations. The notused value of 216 indicates a higher computational cost. This is inefficient, especially with larger datasets.

But with an index, SQLite uses the index we created on location. This time the notused value of 54 shows a significantly lower computational cost. COVERING INDEX means the index contains all the data the query needs, avoiding table lookups.

In both cases:
- SEARCH u USING INTEGER PRIMARY KEY: this is efficient because primary keys are automatically indexed
- The rowid=? shows it’s doing direct lookups in the users table
This highlights why indexes are crucial for performance. Without an index a full table scan is required, but with an index a direct lookup using a sorted index can be used.
Add a Many-To-Many Relational Table
This is a very common type of table relationship. Let’s add a user roles table to demonstrate this.
# Create roles table
roles = db.t.roles
roles.create(
    id=int,
    name=str,
    description=str,
    pk='id',
    not_null={'name'}
)
# Create user_roles junction table for many-to-many relationship
user_roles = db.t.user_roles
user_roles.create(
    user_id=int,
    role_id=int,
    assigned_at=str,
    pk=('user_id', 'role_id'),  # Composite primary key
    foreign_keys=[
        ('user_id', 'users', 'id'),
        ('role_id', 'roles', 'id')
    ]
)
# Let's add some sample data
roles.insert_all([
    {"name": "admin", "description": "Full system access"},
    {"name": "editor", "description": "Can edit content"},
    {"name": "user", "description": "Basic access"}
])
# Assign roles to users
user_roles.insert_all([
    {"user_id": 1, "role_id": 1, "assigned_at": "2024-01-01"},  # Alice is admin
    {"user_id": 1, "role_id": 2, "assigned_at": "2024-01-01"},  # Alice is also editor
    {"user_id": 2, "role_id": 2, "assigned_at": "2024-01-02"}   # Bob is editor only
])
<Table user_roles (user_id, role_id, assigned_at)>
Let’s take a closer look at the code we just implemented.
Roles Table
- A simple table with id, name, and description.
- name is required (not_null) and can’t be empty.

User_Roles Junction Table
- Creates the many-to-many relationship between the existing users table and the new roles table.
- Uses a composite primary key (user_id, role_id): each user-role combination must be unique.
- Two foreign keys ensure referential integrity:
  - user_id must exist in the users table
  - role_id must exist in the roles table
- Includes assigned_at to track when roles were granted.

Sample Data
- Created three example roles: ‘admin’, ‘editor’, ‘user’.
- Assigned roles to users:
  - Alice (user_id: 1) has both admin and editor roles
  - Bob (user_id: 2) has the editor role only
Now, let’s run some test queries on the new tables.
# 1. Get all users with their roles
user_roles_query = """
SELECT u.username, r.name as role_name, ur.assigned_at
FROM users u
JOIN user_roles ur ON u.id = ur.user_id
JOIN roles r ON r.id = ur.role_id
ORDER BY u.username, r.name
"""
print("Users and their roles:")
print(db.q(user_roles_query))
# 2. Find all users with a specific role
admin_query = """
SELECT u.username
FROM users u
JOIN user_roles ur ON u.id = ur.user_id
JOIN roles r ON r.id = ur.role_id
WHERE r.name = 'admin'
"""
print("\nAdmin users:")
print(db.q(admin_query))
# 3. Count roles per user
role_count_query = """
SELECT u.username, COUNT(r.id) as role_count
FROM users u
LEFT JOIN user_roles ur ON u.id = ur.user_id
LEFT JOIN roles r ON r.id = ur.role_id
GROUP BY u.username
"""
print("\nRole count per user:")
print(db.q(role_count_query))
Users and their roles:
[{'username': 'alice', 'role_name': 'admin', 'assigned_at': '2024-01-01'}, {'username': 'alice', 'role_name': 'editor', 'assigned_at': '2024-01-01'}, {'username': 'bob', 'role_name': 'editor', 'assigned_at': '2024-01-02'}]
Admin users:
[{'username': 'alice'}]
Role count per user:
[{'username': 'alice', 'role_count': 2}, {'username': 'bob', 'role_count': 1}]
Let me break down these query results:

- Users and their roles:
  - Shows all user-role assignments with timestamps
  - Alice has two rows because she has two roles:
    - admin (assigned Jan 1)
    - editor (assigned Jan 1)
  - Bob has one row:
    - editor only (assigned Jan 2)
  - The results are ordered by username and role_name (due to ORDER BY)
- Admin users:
  - Only shows users with the ‘admin’ role
  - Only Alice appears because she’s the only admin
  - This type of query is useful for permission checking
  - Note how WHERE r.name = 'admin' filters the results
- Role count per user:
  - Shows how many roles each user has
  - Uses COUNT() and GROUP BY to aggregate the data
  - Results show:
    - Alice: 2 roles (admin + editor)
    - Bob: 1 role (editor only)
  - Note the LEFT JOIN ensures we’d see users even if they had no roles
Visualise a Graph-Based Database Schema

A nice feature of fastlite is that it comes with a diagram() function that renders an SQLite database schema as a graph-based diagram. This requires graphviz to be installed on your system.
diagram(db.tables)