The web version only has simple instructions since chapter 04, while the full book has detailed explanations and background info.

0403: Sort Orders

Primary Key Order

KV keys are compared as strings, using bytes.Compare(). Relational data has types, and serialized keys may compare in the wrong order.

One fix is to deserialize keys before comparing, then compare by data type. But this makes KV depend on schema, which adds tight coupling. So some databases use another way: design a serialization format that keeps sort order.

// New
func (cell *Cell) EncodeKey(toAppend []byte) []byte
func (cell *Cell) DecodeKey(data []byte) (rest []byte, err error)
// Existing
func (cell *Cell) EncodeVal(toAppend []byte) []byte
func (cell *Cell) DecodeVal(data []byte) (rest []byte, err error)

Rework Row methods to call the new Cell methods:

func (row Row) EncodeKey(schema *Schema) (key []byte)
func (row Row) DecodeKey(schema *Schema, key []byte) (err error)

Order-Preserving Serialization

The following serialization method preserves sort order of int64. See the full book for the explanation.

func (cell *Cell) EncodeKey(out []byte) []byte {
    switch cell.Type {
    case TypeI64:
        return binary.BigEndian.AppendUint64(out, uint64(cell.I64)^(1<<63))
    case TypeStr:
        // TODO
    }
}

Floats can be serialized this way too, but note that negative floats differ from integers (see step 0201).

Null-Terminated Strings

Encode strings as null-terminated strings, and use the following escaping scheme to eliminate null bytes. See the full book for the explanation.

func encodeStrKey(toAppend []byte, input []byte) []byte {
    for _, ch := range input {
        if ch == 0x00 || ch == 0x01 {
            toAppend = append(toAppend, 0x01, ch+1)
        } else {
            toAppend = append(toAppend, ch)
        }
    }
    return append(toAppend, 0x00)
}

Implement the decode function:

func decodeStrKey(data []byte) (out []byte, rest []byte, err error)

Multi-Column Compare

A primary key can have multiple columns. Comparison is like strings, column by column, like tuples in Python. For (a, b) > (c, d), it means a > c || (a == c && b > d). Question: if you serialize columns and join them, does this still work? The answer is yes.

CodeCrafters.io has similar courses in many programming languages, including build your own Redis, SQLite, Docker, etc. It’s worth checking out.