Building RocksDB-like storage: Merging Iterator and Range Deletions

January 27, 2026
DBMS, LSM, RocksDB, cckv, C++
Implementation details of the Merging Iterator in cckv.

This is the second post in a series documenting the development of cckv, a RocksDB-like KV storage engine written in C++20 for my experimental database sboxdb.

In an LSM-based storage engine, a key can be found in multiple sources:

  1. The Active MemTable (where new writes go).
  2. Immutable MemTables (older data waiting to be flushed).
  3. Leveled On-disk SST files (TODO).

Each of these sources is sorted, but their key ranges overlap. When a user asks to scan the database (e.g., “Give me everything from key ‘A’ to ‘Z’”), we need to present all these separate sources as one single, sorted stream of data.

This is the job of the Merging Iterator.

The Basic Idea: Merging Sorted Lists #

Imagine we have several decks of cards, and each deck is already sorted. We want to merge them into one big sorted pile.

The classic, efficient way to do this is to look at only the top card of each deck, which is the merge step of Merge Sort generalized to many piles (a k-way merge).

  1. Compare the top cards.
  2. Pick the smallest one.
  3. Move that card to your output pile.
  4. Repeat.

This is exactly what the Merging Iterator does. It uses a Min-Heap (a data structure that’s good at finding the smallest item) to keep track of the current key from every MemTable (and later, SST file).
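To make the idea concrete, here is a minimal, self-contained sketch of a k-way merge over plain sorted vectors using `std::priority_queue`. It only illustrates the "top card" comparison; cckv's real merging iterator wraps MemTable iterators rather than vectors, and the names here (`Cursor`, `GreaterByKey`) are just for the sketch.

```cpp
#include <iostream>
#include <queue>
#include <string>
#include <vector>

// One sorted source plus the position of its current "top card".
struct Cursor {
    const std::vector<std::string>* source;
    size_t pos = 0;
    const std::string& key() const { return (*source)[pos]; }
};

// Order cursors so the smallest current key ends up on top of the heap.
struct GreaterByKey {
    bool operator()(const Cursor& a, const Cursor& b) const {
        return a.key() > b.key();
    }
};

int main() {
    std::vector<std::string> memtable0 = {"Apple", "Dog"};
    std::vector<std::string> memtable1 = {"Banana", "Cat", "Elephant"};

    std::priority_queue<Cursor, std::vector<Cursor>, GreaterByKey> heap;
    for (const auto* src : {&memtable0, &memtable1})
        if (!src->empty()) heap.push(Cursor{src, 0});

    // Repeatedly take the smallest top card, emit it, and advance that source.
    while (!heap.empty()) {
        Cursor top = heap.top();
        heap.pop();
        std::cout << top.key() << "\n";  // Apple, Banana, Cat, Dog, Elephant
        if (++top.pos < top.source->size()) heap.push(top);
    }
}
```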

The Tricky Part: “Delete Range” #

Things get complicated when we support Range Deletions (e.g., “Delete everything from ‘A’ to ‘C’”).

A simplistic approach would be:

  1. Pick the next key from the heap (e.g., “Apple”).
  2. Check a list of all range deletions to see if “Apple” is deleted.
  3. If yes, throw it away. If no, give it to the user.

The problem is: if we have a massive range deletion covering millions of keys, we would still pick up every single deleted key, check it, and throw it away. This is very slow.
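For concreteness, the per-key check in this simplistic approach might look like the sketch below. `RangeTombstone` and `is_deleted` are illustrative names, not cckv's API.

```cpp
#include <string>
#include <vector>

// A range deletion: everything in [begin, end) is considered deleted.
struct RangeTombstone {
    std::string begin;  // inclusive
    std::string end;    // exclusive
};

// The naive check: scan every tombstone for every key pulled from the heap.
bool is_deleted(const std::string& key,
                const std::vector<RangeTombstone>& tombstones) {
    for (const auto& t : tombstones) {
        if (key >= t.begin && key < t.end) return true;  // covered: drop it
    }
    return false;  // visible: return it to the user
}
```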

The RocksDB Solution (Used in cckv) #

cckv adopts the modern approach used by RocksDB (redesigned in 2022, inspired by CockroachDB’s Pebble). Instead of treating deletions as a separate check, we put the boundaries of the deletions right into the same heap as the data keys.

Our heap contains three things:

  1. Real Data Keys (e.g., “Apple”).
  2. Start of Deletion (e.g., “Start deleting at ‘B’”).
  3. End of Deletion (e.g., “Stop deleting at ‘F’”).
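One way to model such a heap entry, using illustrative names (`EntryKind`, `HeapEntry`) rather than cckv's actual types:

```cpp
#include <string>

// The three kinds of entries that can come off the merging heap.
enum class EntryKind {
    kPointKey,       // a real data key, e.g. "Apple"
    kRangeDelStart,  // "start deleting at <key>"
    kRangeDelEnd,    // "stop deleting at <key>"
};

struct HeapEntry {
    std::string key;      // the key the heap orders on
    std::string del_end;  // for kRangeDelStart: where that deletion stops
    EntryKind kind;
};
```

One detail glossed over here: when a boundary and a point key share the same key, the heap needs a consistent tie-breaking rule so that deletions apply to the keys they should.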

How It Works #

We maintain a simple set of “Active Deletions” (active_).

  1. Start Deleting: When the heap gives us a “Start deleting at ‘B’” marker, we add that deletion to our active_ set.
  2. Stop Deleting: When the heap gives us a “Stop deleting at ‘F’” marker, we remove it from the active_ set.
  3. Point Keys: When the heap gives us a real key (e.g., “Cat”), we just check: Is the active_ set empty?
    • If Empty: The key is visible. Return it.
    • If Not Empty: The key is currently covered by a deletion. Skip it!
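Putting the three rules together, the filtering step might look like the sketch below. `EntryKind` and `HeapEntry` are repeated from the previous sketch so this snippet stands on its own, and `VisibleKeyFilter` is an illustrative name, not cckv's actual class.

```cpp
#include <optional>
#include <set>
#include <string>

enum class EntryKind { kPointKey, kRangeDelStart, kRangeDelEnd };

struct HeapEntry {
    std::string key;      // data key or deletion boundary
    std::string del_end;  // for kRangeDelStart: where that deletion stops
    EntryKind kind;
};

class VisibleKeyFilter {
  public:
    // Feed entries in ascending heap order; returns a key only when a point
    // key is not covered by any currently open range deletion.
    std::optional<std::string> feed(const HeapEntry& e) {
        switch (e.kind) {
            case EntryKind::kRangeDelStart:
                active_.insert(e.del_end);          // a deletion opens
                return std::nullopt;
            case EntryKind::kRangeDelEnd: {
                auto it = active_.find(e.key);      // that deletion closes
                if (it != active_.end()) active_.erase(it);
                return std::nullopt;
            }
            case EntryKind::kPointKey:
                if (active_.empty()) return e.key;  // visible: hand to the user
                return std::nullopt;                // covered: skip it
        }
        return std::nullopt;
    }

  private:
    std::multiset<std::string> active_;  // end keys of the open deletions
};
```

In use, the merging iterator would keep popping the smallest entry from the heap and calling `feed()` until it gets a key back (or the heap runs dry).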

The “Fast Forward” Optimization #

Here is the real magic. If we see that a key is covered by an active deletion, we don’t just go to the next key. We ask the iterator to Skip straight to the End of that deletion.

If a deletion covers “B” through “F”, and we encounter “Cat” (which is inside), we immediately jump our search to “F”. We skip “Dog”, “Elephant”, and everything else in between instantly.

This turns an O(N) walk (visiting every covered key just to discard it) into a single O(log N) seek (one jump per source).
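A sketch of that jump, assuming a hypothetical child-iterator interface with a `Seek(target)` method that positions the iterator at the first key greater than or equal to the target; this shows the shape of the optimization, not cckv's exact code.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical child iterator: Seek() positions it at the first key >= target.
struct ChildIterator {
    virtual ~ChildIterator() = default;
    virtual void Seek(const std::string& target) = 0;
};

// active_ holds the (exclusive) end keys of the currently open deletions.
// Instead of popping every covered key, jump every child straight to the
// furthest-reaching end key in one seek.
void fast_forward(std::vector<ChildIterator*>& children,
                  std::multiset<std::string>& active_) {
    if (active_.empty()) return;
    const std::string skip_to = *active_.rbegin();  // largest active end key
    for (ChildIterator* child : children) {
        child->Seek(skip_to);  // one jump past "Dog", "Elephant", ...
    }
    // Every open deletion ends at or before skip_to, so none remains active
    // once we have landed there.
    active_.clear();
}
```

This is safe because every key between the current position and `skip_to` is covered by the deletion whose end key is `skip_to`, so nothing visible gets skipped.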

Current Status and Future Steps #

Right now, cckv only implements the MemTable (in-memory) components. I haven't built the Flush or Compaction processes yet, so there are no SST files on disk. However, this Merging Iterator is already designed to handle them. When SST files are added later, the same logic will apply to them as well (perhaps with some small tweaks).

Check the notes in the repository for more specific implementation invariants and technical history of this design in RocksDB.