Happy 2024! In this series of posts I'm going to be running through the implementation of a lightweight SQLite wrapper, from scratch, in Swift.

Our Pool is now fully-functional, however there's one piece of functionality missing: how do we "set up" the schema of our database, and how do we evolve it over time? Ultimately data never stands still, and we need a way to safely update the structure of our data as needs change.

Migration

Our approach will revolve around the concept of a migration. This is a self-contained mutation to the structure of our data, and could involve creating tables, adjusting existing tables, or inserting data. We could represent one with a Migration struct, with the .action containing the SQL to run.

public struct Migration: Sendable {
  public init(version: Int, action: @escaping Action) {
    self.version = version
    self.action = action
  }

  public typealias Action = @Sendable (isolated Connection) throws -> Void

  public var version: Int
  public var action: Action
}

Each Migration has a version. These are used to keep track of each migration, and allows our database to ensure that each migration is only run once (for example, we wouldn't want to try and create a table twice!). Below is a demonstration of how multiple migrations will interact with the database over time.

User installs v1.0 of our app. Current max migration version is 4.
- Apply migration 1
- Apply migration 2
- Apply migration 3
- Apply migration 4

User continues to use v1.0. No new migrations, so nothing to do on each launch.

User installs v2.0 of our app. There are new migrations, and max migration version is now 6.
- Apply migration 5
- Apply migration 6

Since we keep track of migrations by their version, it's critical that we never change the .action of a Migration for a specific version once it's shipped. Otherwise, it wouldn't necessarily run for existing users. The key to avoiding schema corruption issues is to only ever append new Migrations when required, and to ensure that Migrations can always be applied sequentially.

Applying Migrations

The above is a bit hypothetical - how do we apply these to our Pool? We need to adjust our initializer to take a collection of Migrations (so the client can supply their own), and we need to apply any pending migrations before we start using any Connections.

We can do this by modifying our initializeWriteConnectionTask, which is always allowed to run to completion before any other executions on the database.

public final class Pool {
  public init(url: URL, maxReaders: Int, migrations: [Migration]) {
    initializeWriteConnectionTask = Task {
      let writeConnection = try await Connection(url: url)
      try await Self.initializeDatabase(connection: writeConnection, migrations: migrations)
      return writeConnection
    }
    readConnections = AsyncPool(maxElements: maxReaders) {
      try await Connection(url: url)
    }
  }

  ...

  private static func initializeDatabase(
    connection: Connection,
    migrations: [Migration]
  ) async throws {
    try await connection.execute("PRAGMA journal_mode = WAL")

    try await connection.transaction { connection in
      // Determine the current migration version to determine which are pending.
      let currentVersion = try connection
        .execute("PRAGMA user_version")
        .first?["user_version"]?
        .get(Int.self) ?? 0
      let pendingMigrations = migrations.filter { $0.version > currentVersion }

      // Apply any pending migrations.
      if !pendingMigrations.isEmpty {
        for migration in pendingMigrations {
          try migration.action(connection)
        }
        try connection.execute("PRAGMA user_version = \(migrations.last?.version ?? 0)")
      }
    }
  }
}

In initializeDatabase, we set the journal_mode PRAGMA (as before), and we apply any pending migrations. We use the user_version PRAGMA (a field that SQLite lets us use as we like) to keep track of the current migration version, and we execute any Migrations with a version higher than this value.

We execute all of our migrations within a single transaction, which ensures that all of our migrations apply, or none of them do. This ensures that we don't end up in a "half-applied migrations" state if one of our migrations were to fail, for any reason.

Tests

Let's write a quick test to verify our migration application behavior.

   func testMigrations() async throws {
    // Given:
    let databaseURL = temporaryDatabaseURL()
    var migrations = [
      Migration(version: 1) { try $0.execute("CREATE TABLE test_1 (id INTEGER NOT NULL)") },
      Migration(version: 2) { try $0.execute("CREATE TABLE test_2 (id INTEGER NOT NULL)") },
    ]

    // Then:
    try await {
      let pool = Pool(url: databaseURL, maxReaders: 1, migrations: migrations)
      try await pool.read { try $0.execute("SELECT * from test_1") }
      try await pool.read { try $0.execute("SELECT * from test_2") }
    }()

    // Verify that opening another pool succeeds, without attempting to recreate the tables.
    try await {
      let pool = Pool(url: databaseURL, maxReaders: 1, migrations: migrations)
      try await pool.read { try $0.execute("SELECT * from test_1") }
      try await pool.read { try $0.execute("SELECT * from test_2") }
    }()

    // When:
    // Verify that adding a further migration will run it on next pool init.
    migrations.append(Migration(version: 3) {
      try $0.execute("CREATE TABLE test_3 (id INTEGER NOT NULL)")
    })

    // Then:
    try await {
      let pool = Pool(url: databaseURL, maxReaders: 1, migrations: migrations)
      try await pool.read { try $0.execute("SELECT * from test_1") }
      try await pool.read { try $0.execute("SELECT * from test_2") }
      try await pool.read { try $0.execute("SELECT * from test_3") }
    }()
  }

Here, we first open a Pool using a collection of Migrations, before opening another for the same URL and verifying that the migrations aren't double-applied (if they were, they would fail). Appending a new Migration (with version 3) and opening a later Pool verifies that new migrations are applied additively as required.

Conclusion

Schema initialization and migrations can be tricky, but are crucial to get right. The nightmare scenario of releasing an update that causes an on-launch crash for existing users due to a data migration you didn't fully anticipate is something I try and avoid by keeping migration logic as simple as possible. This model of small, additive migrations fully defined in code has served me well in the past.

And with that, our wrapper is complete 🎉 Across the four parts of the series we've moved from opening a single connection, inserting and retrieving values from the database, pooling connections to open up the realm of concurrency, and now finally supporting incremental migrations to keep our schema in check. The code up to this point can be found on Github.

And this is just a starting point. There's a bunch of additional functionality you could add from here, such as strongly-typed Rows using macros, more ORM-esque features to move away from writing raw SQL directly, or performance monitoring using OSLog and signposts.

I hope you've enjoyed the series! If you have any feedback, feel free to reach out on Mastodon.