Since I've been at Oracle Labs, I've learned a lot about databases. As I started to read the literature, I felt odd. I didn't have a grasp of how a database is built, why certain design decisions were made, or how each module fit together. I had an empty map with small islands, but nothing connecting them. So, like a naive grasshopper I set out to build my own database. You can checkout the github here. (The database is just an implementation of the labs in this MIT course - Thanks MIT!).
A relational database is a specialized virtual machine built to execute SQL. The great thing is that due to the high amounts of memory we have, in memory databases are actually possible. Lots of the recent research is in that area. However, most traditional databases still store most of their data on disk, access most data in ram, and are workhorses for many production systems. From what I've seen, most production quality relational databases all stem from System R. System R seems to be the equivalent to Self/Smalltalk for dynamic language VMs.
The high level architecture is like a standard virtual machine. The biggest struggle coming from a compiler background and jumping into the database literature is all terminology. The compiler community and the database community share most of the same components because at the end of the day, the database has to execute the SQL programming language. Let's look at the life cycle of a SQL query:
- Parse SQL
- A Query Plan is created
- Optimize the Query Plan
- Execute the Query Plan
Now let's look at the life cycle of a statement in a programming language (PL):
- Parse the statement
- Generate an Intermediate Representation (IR)
- Optimize the IR
- Execute the IR
Look familiar? This breaks the overall architecture into two modules. First is the front end that parses SQL into a query plan and optimizes it. The second module is the runtime system that executes the query plan and contains all the necessary support to ensure the query plan executes correctly. The interesting thing is that a SQL Query Plan is really relational algebra. And relational algebra is a SQL engine's IR. If you look at relational algebra closely, it looks a lot like a tree... an Abstract Syntax Tree.
Once you use an Abstract Syntax Tree (AST) IR in a programming language, you pretty much have the architecture of a SQL engine. The front end contains a SQL parser which creates a query plan, which is really an AST. Each node is an "operator" such as select, insert, etc. Query plans are executed by interpreting the query plan (e.g. interpreting the AST!). In SQL land, each operator calls next() on it's child operator. next() returns the next tuple (basically a row) and you go all the way up the query plan to get the final result. In PL land, an AST node calls its child node to retrieve a value. The top value of the AST means your program has finished executing. In theory, many of the compiler techniques to optimize programming languages can be applied to SQL!
Once we're interpreting the query plan, it needs to interact with the runtime system. The runtime performs functions like inserting a tuple, or retrieve a tuple from the database. Each SQL query is executed in a transaction. Much like transactional memory, each query either commits to disk or is aborted if something goes wrong. Every transaction can occur while multiple other transactions are also occurring. Conceptually, it's the same thing as multiple threads accessing the same global variable. You need locks to protect data from being written to at the same time, you need some kind of transaction policy to ensure data is consistent, etc. The major components for the runtime system, once we're executing the query plan are:
- Transaction Manager
- Lock Manager
- Buffer Cache
- Log Manager
The transaction manager ensures that each query commits or aborts. It's in charge of getting locks to on data when it needs to read/write data, etc. The Lock manager ensures that transactions have the correct access to the requested data. In reality, at what granularity to lock data is a hard question. A single page at a time, a row at a time, multiple rows, etc. The Buffer Cache coordinates what data is in memory versus in disk. The buffer cache is the equivalent of a virtual page table operating systems have. These three pieces have conceptual equivalents in the virtual machine world.
The one unique piece to databases is the log manager. Since databases cannot go down, there is a huge amount of resources spent on reliability. Unlike VMs, you can't just restart the database once it's crashed and go "woops all data lost". Thus logging is a large portion of what a database does. Once a database crashes, it goes to the log and repairs itself to the point in time where the database crashed. This is the Durability aspect when database literature talks about ACID properties.
From the architecture perspective, we have two separate modules. The front end that parses SQL and generates optimized query plans. A second runtime module that interprets the query plan and performs transaction management. Looks like a nice VM to me :) The biggest difference is that the runtime system seems to be much more interconnected than standard virtual machines with garbage collection / JIT compilers. We'll see how long this feeling lasts.
So far, working on a database feels like the beginning of a second PhD. I'm getting a lot of Aha! moments, learning a ton, and have this n00b feeling. I'll go into more deep dives for each of the major runtime components in future blog posts.