⚠
The web version only has simple instructions since chapter 04, while the full book has detailed explanations and background info.
0604: Merge Sorted Data
Merge KV
If SSTables were added, there will be 2 data structures. Keys first go into the MemTable and are later merged into SSTables, so queries may need to merge results from both.
A key may exist in both the MemTable and an SSTable, the MemTable has higher priority. In the LSM-Tree to be implemented later, there can be multiple levels of SSTables. Lower-level SSTables are merged from upper levels. When querying a single key, search from top to bottom, level by level. This is why the first popular LSM-Tree DB was called LevelDB.
Merge Ranges
Single-key queries can check each level in order. Range queries cannot, because the next key may come from any level. To iterate in ascending order, you must pick the smallest key among all level iterators. This is exactly merge sort.
We implement merging of k ranges, which can be reused later for the LSM-Tree. It is not more complex than merging 2 ranges.
type MergedSortedKV []SortedKV
func (m MergedSortedKV) Iter() (iter SortedKVIter, err error) {
levels := make([]SortedKVIter, len(m))
for i, sub := range m {
if levels[i], err = sub.Iter(); err != nil {
return nil, err
}
}
return &MergedSortedKVIter{levels, levelsLowest(levels)}, nil
}
type MergedSortedKVIter struct {
levels []SortedKVIter
which int
}
func (iter *MergedSortedKVIter) Valid() bool {
return iter.which >= 0
}
func (iter *MergedSortedKVIter) Key() []byte {
return iter.levels[iter.which].Key()
}
func (iter *MergedSortedKVIter) Val() []byte {
return iter.levels[iter.which].Val()
}
func (iter *MergedSortedKVIter) Next() error
func (iter *MergedSortedKVIter) Prev() errorwhich records the iterator with the current smallest or largest key. Requirements:
- Upper levels have higher priority than lower levels.
- Do not output duplicate keys.
- Direction can change midway.
ⓘ
CodeCrafters.io has similar courses in many programming languages, including build your own Redis, SQLite, Docker, etc. It’s worth checking out.