I wrote a database in 45 commits and turned them into a book
Inspired by build-your-own-X projects, in which one codes a condensed version of something from scratch, I took on the challenge of coding a small, relational, and ACID-compliant database in Go.
It’s a great way to test my learnings and the project also sounds really cool. So I want to share the results.
The hard parts
What is even a database?
Database is a buzzword. Nearly everything that stores data is marketed as a “database”, including caching servers, file formats, some middleware, etc. The 1st challenge is to identify what is traditionally considered a database.
So I looked into some big names, such as MySQL and SQLite to understand what they really offer. What they have in common are:
- some durability guarantee, resistance to crashes and power loss;
- a transactional interface, offering atomicity, transaction isolation and concurrency control;
- tabular data;
- a query language;
- indexes.
The first 2 points are loosely coined as ACID (atomicity, consistency, isolation, durability). The rest points are about relational databases and SQL.
There is also a divide between analytical (OLAP) and transactional (OLTP) databases. I must choose one because they are fundamentally different. I chose OLTP, the same type as these big-name databases.
Keep it small
The 2nd challenge is to keep it small. I need to do it in 3K lines of code instead of 300K like the real thing, or I will never finish.
There are many tutorials on components of databases, such as log storage, KV, parsers and interpreters. Each component could be a whole book’s worth of content.
Instead of gluing existing components, I coded a condensed version of everything so that they fits within 3K lines, which is the spirit of build-your-own-X.
Make it incremental
The 3rd challenge is knowing where to start? Do I just code each component individually and fit them together in the end? That would be boring as the database won’t be ready until the last step. Instead, I took a test-driven approach. I broke the project down into many small, testable steps so that my code would be runnable from the step 1.
Database in 45 small steps
You can see the 45 steps here. They are grouped into 9 chapters, each focuses on one major feature and is divided into about 5 small, test-driven steps.
It starts from an in-memory map, then:
- add a log for durability,
- encode tabular data as KV,
- add a small subset of SQL,
- support range queries with sorted data,
- add more SQL for evaluating expressions,
- add SSTables as the main storage,
- implement LSM-Tree,
- add secondary indexes,
- add a transaction interface for atomicity,
- add snapshot isolation,
- add concurrency control,
- support multi-threading.
Database as 45 coding puzzles
Inspired by the Advent of Code project, in which one solves small coding puzzles. I released the test cases and supporting code for each step, turning them into a series of small, test-driven coding puzzles. I call this project Trial of Code.
Instead of pure puzzles, these steps lead to a hardcore project resembling an advanced, complex software. This is valuable because so much software development involves dealing with complexities, a skill not learned from tutorial-ish code or simple CRUD jobs.
Also, there are many of advanced software projects, but few resources on how to approach them. This motivated me to start this project of actually advanced coding tutorials, hoping to bridge the gap.
The book
The project is designed as a series of puzzles, in which you write a little code, pass the tests, and move to the next step. But not everyone has the time or commitment for the full steps. So, I wrote an accompanying book with detailed explanations and background info. You can buy the book to:
- aid your journey.
- dive deeper into related subjects.
- learn database internals without committing to the full project.
- support the author.
