Code Map: DuckDB’s Full Scan Query

In this article, I’d like to share the diagram which I took as a note to get an idea of DuckDB’s query execution flow.

The diagram is the code map of a query execution in DuckDB version 1.0.0, focusing on the simplest scenario of full scan.

Please view in your favorite image viewer. It’s a light (< 300KB) SVG image, but big in size:

Notes:

  • If there’s no arrow coming out of a function, it’s often calling the function in the same file. You most likely can proceed to read downwards in the same file.

  • Code excerpts are partial most of the time. Sometimes it has ... to indicate there are skipped lines, but it’s not there most of the time.

  • Sometimes function signatures miss return types and/or function arguments. Apologies in advance.

  • Different flows are color-coded as noted on left-top of the diagram.

    • There are flows of query, write, initialization, compression type and sink.

    • The focus of this diagram is query execution, so query flow is roughly complete in the diagram, but all other flows are very partial.

Overview

It (hopefully) covers an entire flow after statement parsing from:

Scan Function

Scan function is loaded at table-binding time. It goes through the storage abstractions as below:

Notes:

  • There are multiple implementations of buffer manager, but StandardBufferManager is traced in this code map.

Compression Algorithm

Compression algorithm is decided upon write time and stored together with data. Upon data read, it’s deserialized and loaded into ColumnSegment.

Details:

Notes:

  • There are multiple types of compression algorithms, but UncompressedString and DictionaryCompression are traced in this code map.