Happy 2024! In this series of posts I'm going to be running through the implementation of a lightweight SQLite wrapper, from scratch, in Swift.
Our Pool
is now fully-functional, however there's one piece of functionality missing: how do we "set up" the schema of our database, and how do we evolve it over time? Ultimately data never stands still, and we need a way to safely update the structure of our data as needs change.
Our approach will revolve around the concept of a migration. This is a self-contained mutation to the structure of our data, and could involve creating tables, adjusting existing tables, or inserting data. We could represent one with a Migration
struct, with the .action
containing the SQL to run.
public struct Migration: Sendable {
public init(version: Int, action: @escaping Action) {
self.version = version
self.action = action
}
public typealias Action = @Sendable (isolated Connection) throws -> Void
public var version: Int
public var action: Action
}
Each Migration
has a version. These are used to keep track of each migration, and allows our database to ensure that each migration is only run once (for example, we wouldn't want to try and create a table twice!). Below is a demonstration of how multiple migrations will interact with the database over time.
Since we keep track of migrations by their version, it's critical that we never change the .action
of a Migration
for a specific version
once it's shipped. Otherwise, it wouldn't necessarily run for existing users. The key to avoiding schema corruption issues is to only ever append new Migration
s when required, and to ensure that Migration
s can always be applied sequentially.
The above is a bit hypothetical - how do we apply these to our Pool
? We need to adjust our initializer to take a collection of Migration
s (so the client can supply their own), and we need to apply any pending migrations before we start using any Connection
s.
We can do this by modifying our initializeWriteConnectionTask
, which is always allowed to run to completion before any other executions on the database.
public final class Pool {
public init(url: URL, maxReaders: Int, migrations: [Migration]) {
initializeWriteConnectionTask = Task {
let writeConnection = try await Connection(url: url)
try await Self.initializeDatabase(connection: writeConnection, migrations: migrations)
return writeConnection
}
readConnections = AsyncPool(maxElements: maxReaders) {
try await Connection(url: url)
}
}
...
private static func initializeDatabase(
connection: Connection,
migrations: [Migration]
) async throws {
try await connection.execute("PRAGMA journal_mode = WAL")
try await connection.transaction { connection in
// Determine the current migration version to determine which are pending.
let currentVersion = try connection
.execute("PRAGMA user_version")
.first?["user_version"]?
.get(Int.self) ?? 0
let pendingMigrations = migrations.filter { $0.version > currentVersion }
// Apply any pending migrations.
if !pendingMigrations.isEmpty {
for migration in pendingMigrations {
try migration.action(connection)
}
try connection.execute("PRAGMA user_version = \(migrations.last?.version ?? 0)")
}
}
}
}
In initializeDatabase
, we set the journal_mode
PRAGMA (as before), and we apply any pending migrations. We use the user_version
PRAGMA (a field that SQLite lets us use as we like) to keep track of the current migration version, and we execute any Migration
s with a version higher than this value.
We execute all of our migrations within a single transaction, which ensures that all of our migrations apply, or none of them do. This ensures that we don't end up in a "half-applied migrations" state if one of our migrations were to fail, for any reason.
Let's write a quick test to verify our migration application behavior.
func testMigrations() async throws {
// Given:
let databaseURL = temporaryDatabaseURL()
var migrations = [
Migration(version: 1) { try $0.execute("CREATE TABLE test_1 (id INTEGER NOT NULL)") },
Migration(version: 2) { try $0.execute("CREATE TABLE test_2 (id INTEGER NOT NULL)") },
]
// Then:
try await {
let pool = Pool(url: databaseURL, maxReaders: 1, migrations: migrations)
try await pool.read { try $0.execute("SELECT * from test_1") }
try await pool.read { try $0.execute("SELECT * from test_2") }
}()
// Verify that opening another pool succeeds, without attempting to recreate the tables.
try await {
let pool = Pool(url: databaseURL, maxReaders: 1, migrations: migrations)
try await pool.read { try $0.execute("SELECT * from test_1") }
try await pool.read { try $0.execute("SELECT * from test_2") }
}()
// When:
// Verify that adding a further migration will run it on next pool init.
migrations.append(Migration(version: 3) {
try $0.execute("CREATE TABLE test_3 (id INTEGER NOT NULL)")
})
// Then:
try await {
let pool = Pool(url: databaseURL, maxReaders: 1, migrations: migrations)
try await pool.read { try $0.execute("SELECT * from test_1") }
try await pool.read { try $0.execute("SELECT * from test_2") }
try await pool.read { try $0.execute("SELECT * from test_3") }
}()
}
Here, we first open a Pool
using a collection of Migration
s, before opening another for the same URL and verifying that the migrations aren't double-applied (if they were, they would fail). Appending a new Migration
(with version 3) and opening a later Pool
verifies that new migrations are applied additively as required.
Schema initialization and migrations can be tricky, but are crucial to get right. The nightmare scenario of releasing an update that causes an on-launch crash for existing users due to a data migration you didn't fully anticipate is something I try and avoid by keeping migration logic as simple as possible. This model of small, additive migrations fully defined in code has served me well in the past.
And with that, our wrapper is complete ๐ Across the four parts of the series we've moved from opening a single connection, inserting and retrieving values from the database, pooling connections to open up the realm of concurrency, and now finally supporting incremental migrations to keep our schema in check. The code up to this point can be found on Github.
And this is just a starting point. There's a bunch of additional functionality you could add from here, such as strongly-typed Row
s using macros, more ORM-esque features to move away from writing raw SQL directly, or performance monitoring using OSLog and signposts.
I hope you've enjoyed the series! If you have any feedback, feel free to reach out on Mastodon.