Paper: Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems

This article is my notes on the paper Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems, published in 2015.

Series: Papers adopted by DuckDB

DuckDB can handle analytical query workloads incredibly fast. This series is my notes from the publications adopted by DuckDB (listed here).

  • Vectorized Query Engine
  • Fast Serializable MVCC: this article
  • Join Ordering Optimization (coming soon)
  • Unnesting Subqueries (coming soon)

What is it about?

This paper proposes an implementation of multi-version concurrency control (MVCC) with low overhead and little locking that still provides serializability. Most MVCC implementations out there provide only snapshot isolation (SI).

Overall, the approach provides the serializable isolation level by validating possible conflicts at commit time, while each transaction reads data from a consistent snapshot.
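To make that idea concrete, here is a minimal C++ sketch of commit-time validation. The types and names are my own, not the paper's code: a transaction reads from the snapshot defined by its start timestamp, and at commit time its reads are checked against the writes of transactions that committed while it was running.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Timestamp = uint64_t;

struct WriteRecord {           // one change made by a committed transaction
    std::string table;
    uint64_t    rowId;
    Timestamp   commitTime;
};

struct Transaction {
    Timestamp startTime;       // snapshot: sees versions committed before this
    // Reads remembered during execution, simplified here to predicates that
    // tell whether a given committed write would have changed what was read.
    std::vector<std::function<bool(const WriteRecord&)>> readPredicates;
};

// Validation at commit time: if any transaction that committed after
// tx.startTime wrote something that tx read, we have a read-write conflict
// and must abort; otherwise the schedule stays serializable and tx commits.
bool validate(const Transaction& tx,
              const std::vector<WriteRecord>& recentlyCommittedWrites) {
    for (const auto& w : recentlyCommittedWrites) {
        if (w.commitTime <= tx.startTime) continue;   // already in the snapshot
        for (const auto& predicate : tx.readPredicates) {
            if (predicate(w)) return false;           // conflict: abort
        }
    }
    return true;  // no conflicts: safe to assign a commit timestamp
}
```

In the paper this validation is done with precision locking against the undo records of transactions that committed during the validating transaction's lifetime; the sketch above only captures the general shape of that check.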

Database Isolation Levels: Snapshot vs Serializable

I compared the snapshot and serializable isolation levels here.

Also, it seems DuckDB has implemented its MVCC based on the concepts of this paper, but without the serializability validation. I haven’t confirmed this 100%, but DuckDB appears to provide snapshot isolation currently, which makes sense to me since its focus is OLAP rather than OLTP.

Storage Locations of Versions

The paper does not state this directly, but in my opinion PostgreSQL is a good example of versions scattered throughout the main storage.

PostgreSQL is designed to store versions in its permanent storage, which brings benefits such as simplicity and durability. However, this decision also brings the need for VACUUM and makes it very expensive to retrieve an accurate count of records in a large table. Separating version storage from permanent storage seems like a good design for providing snapshots of records without those complexities.
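As a rough sketch of that separated design (my own field names, not the paper's or DuckDB's actual structures), the newest version can live in place in the table while older versions sit in a separate undo chain of before-images that readers walk back to their snapshot:

```cpp
#include <cstdint>
#include <memory>

using Timestamp = uint64_t;

struct VersionRecord {                       // before-image kept in an undo buffer
    Timestamp                      version;  // commit time of the overwriting tx
    int64_t                        oldValue; // value before that update
    std::shared_ptr<VersionRecord> next;     // older version, if any
};

struct Row {
    int64_t                        value;    // newest version, stored in place
    std::shared_ptr<VersionRecord> undo;     // chain of older versions
};

// A reader with snapshot timestamp `asOf` walks the undo chain, rolling the
// value back past every change that committed after its snapshot.
int64_t readAsOf(const Row& row, Timestamp asOf) {
    int64_t result = row.value;
    for (auto v = row.undo; v && v->version > asOf; v = v->next) {
        result = v->oldValue;                // change is too new: undo it
    }
    return result;
}
```

In the paper, updates happen in place and the before-images live in per-transaction undo buffers, so the main table stays compact and old versions can be discarded cheaply once no active transaction can still see them.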

Reduced Locking

Selective Checks

Efficient Validation and Conflict Resolution

Benchmarks