Learning What the Heck is Inside SQLite

2 Aug 2024

Starting my batch at the Recurse Center, I knew one of the things I wanted to learn was the basics of database internals. My goal was to answer a basic question: “what does a database look like physically, from the perspective of files on disk?”

B+Trees

I started with “Build Your Own Database From Scratch In Go”. By working through the first half of the book in Rust, I learned:

Following along, I built a memory-mapped B+Tree in Rust to make sure I understood the concepts. A B+Tree consists of internal nodes with variable numbers of children and leaf nodes with data. Each node has a key, and the key type can be ordered between them. Internal nodes store the key ranges of their children, allowing them to determine which children to recurse to when finding a node. There’s also a rebalancing algorithm to split up nodes that get too large and condense nodes that get too small, to ensure that the height of the tree remains small (and thus fast to traverse).

I think my implementation doesn’t fully rebalance itself (making it inefficient), but it does correctly handle get/insert/remove operations. Using mmap means that the file is automatically updated under the hood, which is how some real database systems work. Unfortunately mmap is super unergonomic in Rust, because I’m taking a byte array and doing a bunch of unsafe witchery to map the bytes to data structures. I think my use of MaybeUninit in this code is actually unsound? So I wouldn’t use that B+Tree code as a good example of anything.

SQLite Format

In week 2 I joined some of my batchmates in writing SQLite from scratch. I decided to go about this in Zig, partially to learn a new language and partially to avoid the memory-mapping problems I had in Rust. The first step is to parse the database file’s header, which is pretty simple. Things immediately kicked up a notch on the next step, as I began to dig into SQLite’s internal structures. Each SQLite database is represented by pages (4096 bytes in the test database, but it can be configured on a per-DB basis). Each page consists of a header, an array of internal pointers to “cells”, and then the cells themselves. A page is a node in a B+Tree, either a leaf or branch; these trees represent tables and indices in SQLite.

All this actually sounds pretty simple. Let me assure you that there was plenty of room to write frustrating, hard-to-understand bugs along the way:

After I had worked through these initial obstacles, my program was able to enumerate the tables in a SQLite database file and read the sqlite_schema table to determine their names. For now I’ve stopped there, because I feel like I’ve developed a good understanding of the physical basis of database file storage.

Takeaways

With what I learned from my projects, I can return to my initial question of “what does a database look like physically?” At its simplest, my answer is two core facts:

  1. A key-value store is represented by a tree (probably a B+Tree or an LSM tree) which allows for atomic writes.
  2. Each table in a SQL database can be thought of as a key-value store, with primary keys as keys and rows as values. Each column’s data is contained in the row, or in an overflow section the row points to.

My immediate curiosity is quenched, but the field of database implementation is vast and there are still plenty of things I’d like to learn!

Thanks to Nicholas Walton and Anna Hope for feedback on a draft of this post.