The web version only has simple instructions since chapter 04, while the full book has detailed explanations and background info.

0604: Merge Sorted Data

Merge KV

If SSTables were added, there will be 2 data structures. Keys first go into the MemTable and are later merged into SSTables, so queries may need to merge results from both.

A key may exist in both the MemTable and an SSTable, the MemTable has higher priority. In the LSM-Tree to be implemented later, there can be multiple levels of SSTables. Lower-level SSTables are merged from upper levels. When querying a single key, search from top to bottom, level by level. This is why the first popular LSM-Tree DB was called LevelDB.

Merge Ranges

Single-key queries can check each level in order. Range queries cannot, because the next key may come from any level. To iterate in ascending order, you must pick the smallest key among all level iterators. This is exactly merge sort.

We implement merging of k ranges, which can be reused later for the LSM-Tree. It is not more complex than merging 2 ranges.

type MergedSortedKV []SortedKV

func (m MergedSortedKV) Iter() (iter SortedKVIter, err error) {
    levels := make([]SortedKVIter, len(m))
    for i, sub := range m {
        if levels[i], err = sub.Iter(); err != nil {
            return nil, err
        }
    }
    return &MergedSortedKVIter{levels, levelsLowest(levels)}, nil
}

type MergedSortedKVIter struct {
    levels []SortedKVIter
    which  int
}
func (iter *MergedSortedKVIter) Valid() bool {
    return iter.which >= 0
}
func (iter *MergedSortedKVIter) Key() []byte {
    return iter.levels[iter.which].Key()
}
func (iter *MergedSortedKVIter) Val() []byte {
    return iter.levels[iter.which].Val()
}
func (iter *MergedSortedKVIter) Next() error
func (iter *MergedSortedKVIter) Prev() error

which records the iterator with the current smallest or largest key. Requirements:

CodeCrafters.io has similar courses in many programming languages, including build your own Redis, SQLite, Docker, etc. It’s worth checking out.