Transactions in databases are a series of SQL statements that must occur together. When the transaction commits, all the SQL statements must succeed. When a transaction aborts, all of the SQL statements effects must be erased or rolled back. Implementing a transaction manager requires intricate knowledge of the system logger, the buffer cache, and a lock manager.
Conceptually, a transaction management system does three things. Let's assume we lock at the page level granularity rather than at the row / tuple granularity. First, when a transaction begins and requests a page of data from disk, it must grab a lock on the page. Multiple transactions can have a read lock on a single page while only one transaction can have an exclusive write lock on a page. If a transaction cannot grab a lock, it must block until the page is available. This can lead to deadlock, so many sophisticated transaction management systems also contain deadlock detection features. Second, the transaction management system must track all pages and locks for the current running transaction. Finally, when a transaction commits or aborts, the transaction manager must release all the locks associated with the finished transaction. A transaction management system that first grabs all locks as needed, then releases all locks when the transaction is complete/aborted is called Strong Strict two-phase locking.
We'll break down the transaction manager's interactions with three other subsystems as the interactions between the different subsystems are extremely entangled and reliant on each other. First, the buffer cache must be modified such that every page fetch must be attached with a transaction. The transaction manager checks if the current transaction has a lock on the current page. If so, the buffer cache can return the requested page. Otherwise the transaction manager must get the lock for the page before the buffer cache can return the page.
The second sub-system is how the transaction manager interacts with a lock manager. Architecturally, the decision on whether or not to have a separate lock manager is a design decision. Personally, I like having the lock logic extracted in a separate entity. When a transaction requests a page from the buffer cache, it needs to track if the current transaction already has a lock on the requested page. We can model which transaction owns which pages by using a bipartite graph. We have all transactions on one side and all pages on another side. A single edge between a transaction and a page indicates a write lock. Multiple edges to a page represent multiple read locks. Here's a sample implementation.
How do we implement the lock manager? A single page may have multiple read locks if multiple transactions request read access to a single page. However, a transaction that writes a page must have an exclusive write lock on the page. Thus, when a transaction requests a page, it also requests a specific lock for the page. The lock manager looks through the bipartite graph to check whether or not the lock request can be satisfied. If so, the lock manager adds an edge in the bipartite graph representing the lock. Since multiple different threads representing different transactions can occur at the same time, the lock manager itself must be synchronized to ensure that granting locks is an atomic operation.
What do we do if a transaction cannot grab a lock? Deadlock can occur and so most transaction management systems have deadlock detection and resolution features. The simplest is to just abort the transaction and try again.
Once a transaction is committed, it must release all of its locks and write the modified tuples to disk. The buffer cache must flush all written pages to disk. The locking manager removes all edges in the bipartite graph and the transaction is complete.
What happens if a transaction is aborted? It needs to be able to rollback or undo all page writes. We need to add the feature to every page such that when a page is modified in memory, it can be reverted to its previous state as was seen on disk. If you check out the implementation of HeapPage, it has a byte field called oldData. This represents the data as was read off disk. When a transaction modifies a page with a write, it only occurs in memory, specifically in the buffer cache. When a transaction commits, the oldData is updated to reflect the current data in memory and the changes are flushed to disk. When a transaction aborts, the data in memory is reverted to the oldData and viola, a transaction abort occurred for this page. When a transaction aborts, all the pages written to during the transaction must be reverted.
Finally, the last piece is the transaction manager's interaction with the logger. The logger keeps track of which transactions modified which pages and whether or not the transaction committed. When a page is flushed to disk, it's previous data and most current data are stored to disk along with the transaction that performed the update. In addition, the logger logs when a transaction begins, aborts, or commits. Thus the one change we need to make is that we need to add a log entry at every transaction begin, abort, or commit. When a page of data is flushed to disk, we have to log the changed page data. This is enough data such that the database can recover when the database crashes. The most important issue is that the logger use write ahead logging, which logs transaction actions prior to performing the actual action.
That's it for the transaction manager. Next up, logging!