The Buffer Cache

The buffer cache is a database's version of virtual memory. It loads pages from disk into main memory and keeps them there for fast access. When the cache is full, in-memory pages are evicted to make room for the new ones being loaded. In most respects it acts like any other cache, so the standard caching questions, like when and how to remove items, all apply here.

All requests for data go through the buffer cache, which exposes an API: getPage(PageId pid). Each tuple stored in the database lives on a page somewhere on disk and has metadata recording which page it's stored on. For example, when a query runs "select * from table", every page it reads is actually loaded through the buffer cache. When getPage is called, it first checks whether the page is in the cache. If it isn't, the buffer cache reads the page from disk and stores it in memory. Let's see the code.

import java.util.HashMap;

class BufferCache {
  private HashMap<PageId, Page> _cache = new HashMap<>();

  public Page getPage(PageId pid) {
    if (!inCache(pid)) {
      // Cache miss: read the page from disk and remember it
      Page page = readFromDisk(pid);
      _cache.put(pid, page);
    }

    return _cache.get(pid);
  }

  private boolean inCache(PageId pid) {
    return _cache.containsKey(pid);
  }
}

This is a really simple BufferCache. Strictly speaking, it's not even a cache, because it never evicts anything, but functionally it's doing all a buffer cache is supposed to do. All requests for data on disk come through BufferCache.getPage with a page identifier. If the page is in the cache, we return it. Otherwise, we fetch the page from disk, store it in the cache, and return it.
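
To make the flow concrete, here's a rough sketch of how a sequential scan might drive getPage. Everything named here besides BufferCache is hypothetical (HeapFile, numPages, readTuples, and the PageId constructor), just to illustrate the call pattern:

// Hypothetical sketch: every page a scan touches is
// fetched through the buffer cache, never read directly
class SeqScan {
  private BufferCache _cache;
  private HeapFile _file; // the on-disk pages of one table

  public void scan() {
    for (int i = 0; i < _file.numPages(); i++) {
      PageId pid = new PageId(_file.getId(), i);
      Page page = _cache.getPage(pid); // memory hit or disk read
      for (Tuple t : page.readTuples()) {
        // process each tuple, e.g. for "select * from table"
      }
    }
  }
}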

Evicting a Page

What about when the cache is full? How do we know when, and which page, to evict? Setting the size of the cache is probably a dark art, filled with hours of tuning to find heuristics in commercial databases. Which page to evict is an easier story. When a page in memory is modified, it is considered "dirty". Dirty means that the page in memory differs from the page on disk due to a write. Generally, we never evict a dirty page from the cache, because then we'd have to flush the page back to disk first. This is complicated because pages are modified in transactions, which we'll get into later.
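
As a rough sketch, dirty tracking can live on the page itself. This Page class is hypothetical, but the eviction code below assumes something like its isDirty and getId methods:

class Page {
  private PageId _id;
  private byte[] _data;
  private boolean _dirty = false;

  // Called whenever a write modifies this page in memory
  public void markDirty() {
    _dirty = true;
  }

  // Called after the page is flushed back to disk, so
  // memory and disk agree again
  public void markClean() {
    _dirty = false;
  }

  public boolean isDirty() {
    return _dirty;
  }

  public PageId getId() {
    return _id;
  }
}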

We know that we can only evict pages that aren't dirty, which still leaves a sizable list of candidates. This is where caching policies come into play. What real commercial databases do is probably another dark art, flooded with heuristics gathered over the years. A simple eviction policy is least recently used (LRU). To implement it, we just have to record when a page was last used. A really simple LRU implementation uses a list:

import java.util.ArrayList;

class BufferCache {
  // ... the _cache HashMap from before, plus:
  private ArrayList<PageId> _lru = new ArrayList<>();

  public Page getPage(PageId pid) {
    if (isCacheFull()) {
      evictPage();
    }

    if (!inCache(pid)) { ... }

    updateUse(pid);
    return _cache.get(pid);
  }

  private void updateUse(PageId pid) {
    // Yes, this is incredibly slow; a
    // HashMap from PageId -> list index
    // would make it faster, but this keeps
    // the implementation simple
    int index = _lru.indexOf(pid);
    if (index != -1) {
      _lru.remove(index);
    }
    _lru.add(0, pid); // most recently used goes in front
  }

  private void evictPage() {
    int last = _lru.size() - 1;
    PageId pid = _lru.get(last);
    Page page = _cache.get(pid);

    // Need a separate structure to detect
    // dirty pages, simplified for now
    assert (!page.isDirty());
    _cache.remove(pid);
    _lru.remove(last);
  }
}

Now we have a working BufferCache. It evicts pages when the cache is full, and when a page is in memory, we don't have to go to disk to find the data. Assuming some locality in data access, this dramatically speeds up reads. However, our implementation is still too simplistic. For one, it totally ignores transactions.

Working with Transactions

Everything in a database is wrapped in a transaction: a sequence of SQL statements that execute atomically. When a transaction starts, it can start requesting pages from the buffer cache. However, multiple transactions can run at the same time, and to get that parallelism safely, you need locks. The granularity at which you lock is another heuristic backed by years of dark-art performance tuning: some databases lock at the page level, some at the row level, some at the table level, and so on. Assuming our implementation uses page-level locks, we need to ensure that only a single transaction holds a write lock on a given page, while multiple transactions can hold read locks on it. So we need something like a LockManager to enforce that guarantee, and we have to modify getPage a little to also grab the lock for the requested page.

class BufferCache {
  public Page getPage(PageId pid,
      Transaction transaction, Permissions permission)
      throws TransactionAbortedException {
    if (isCacheFull()) {...}
    if (!inCache(pid)) {...}

    try {
      _lockManager.grabLocks(pid, permission);
    } catch (Exception e) {
      throw new TransactionAbortedException(
          "Could not grab locks");
    }

    return _cache.get(pid);
  }
}

Locking is pretty complicated, so I'll go into it in another blog post. Just note for now that the buffer cache has to call the lock manager to ensure that only one transaction holds an exclusive write lock on a page, while multiple transactions can hold a read lock.
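
To make those semantics concrete, here's a minimal sketch of what a LockManager's grabLocks could look like, built on Java's ReentrantReadWriteLock. This is an assumption about the interface, not a real implementation: it assumes Permissions is an enum with a READ_WRITE value, it ties locks to threads rather than transactions, and it punts on deadlock detection and lock upgrades by simply timing out:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: a real lock manager tracks locks per
// transaction (not per thread) and detects deadlocks
class LockManager {
  private final ConcurrentHashMap<PageId, ReentrantReadWriteLock> _locks =
      new ConcurrentHashMap<>();

  public void grabLocks(PageId pid, Permissions permission)
      throws Exception {
    ReentrantReadWriteLock lock =
        _locks.computeIfAbsent(pid, k -> new ReentrantReadWriteLock());

    // Writers need the exclusive lock; readers can share.
    // Time out rather than wait forever, so the caller can
    // abort the transaction instead of hanging.
    boolean acquired = (permission == Permissions.READ_WRITE)
        ? lock.writeLock().tryLock(1, TimeUnit.SECONDS)
        : lock.readLock().tryLock(1, TimeUnit.SECONDS);

    if (!acquired) {
      throw new Exception("Timed out waiting for lock on page " + pid);
    }
  }
}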

Those are the essentials of a buffer cache. We ignored logging for now, which changes the buffer cache a little bit; we'll get into that in another post. But the conceptual requirements of a buffer cache are all here. You can check out one implementation of a buffer cache on GitHub here.

* Thanks to Ivan Ostres for pointing out that you can combine the HashMap and the LRU list using Java's LinkedHashMap.
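
For reference, a sketch of that approach (PageId and Page as before): constructing a LinkedHashMap with accessOrder set to true keeps entries ordered from least to most recently used, and overriding removeEldestEntry gives you eviction for free.

import java.util.LinkedHashMap;
import java.util.Map;

class LruPageCache extends LinkedHashMap<PageId, Page> {
  private final int _capacity;

  public LruPageCache(int capacity) {
    // 0.75f is the default load factor; true switches the
    // map from insertion order to access order
    super(capacity, 0.75f, true);
    _capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<PageId, Page> eldest) {
    // Evict the least recently used entry once we're over
    // capacity (a real buffer cache would also have to
    // skip or flush dirty pages here)
    return size() > _capacity;
  }
}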