Starting from chapter 04, the web version contains only brief instructions; the full book has detailed explanations and background information.

0301: Tokenizer

string → tokens → struct

An SQL string must be parsed into program data before it can be processed. For example:

select a,b from t where c=1;

will be represented as:

StmtSelect{
    table: "t",
    cols:  []string{"a", "b"},
    keys:  []NamedCell{{column: "c", value: Cell{Type: TypeI64, I64: 1}}},
}
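The types used in this literal are defined later. A minimal sketch consistent with the example above (the field names come from the literal; the exact definitions are assumptions):

```go
package main

// CellType tags which variant of Cell is in use (assumed encoding).
type CellType uint8

const (
	TypeI64 CellType = iota + 1 // 64-bit integer value
	TypeStr                     // string value
)

// Cell is a tagged union holding one typed SQL value.
type Cell struct {
	Type CellType
	I64  int64
	Str  string
}

// NamedCell pairs a column name with its value, e.g. c = 1.
type NamedCell struct {
	column string
	value  Cell
}

// StmtSelect is the parsed form of a simple SELECT statement.
type StmtSelect struct {
	table string      // FROM clause
	cols  []string    // selected columns
	keys  []NamedCell // WHERE equality conditions
}
```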

SQL is similar to English, with its own words and grammar. In programming languages, words are called tokens. Before grammar parsing, the string is split into tokens. This step is called tokenizing or lexing.

SQL tokens can be grouped as keywords (select, from, …), names (table and column names), symbols (such as = and ,), and values (strings and numbers).

Each group follows different rules and is handled by a different function.

Syntax Parser

Most parsing works by consuming tokens from left to right and building data structures. So we need to track the current position in the string.

type Parser struct {
    buf string // the input string
    pos int    // current position in buf
}
func NewParser(s string) Parser {
    return Parser{buf: s, pos: 0}
}

Parse Names (table names, column names)

func (p *Parser) tryName() (string, bool)

Requirements:

- Skip leading whitespace first.
- A name starts with an isNameStart character and continues with isNameContinue characters.
- On a match, advance pos past the name and return it with true.
- On no match, leave pos unchanged and return false.

For example, given the input Parser{buf: " hi ", pos: 0}, tryName() returns ("hi", true) and leaves pos = 3.

Use these helpers:

func isSpace(ch byte) bool {
    switch ch {
    case '\t', '\n', '\v', '\f', '\r', ' ':
        return true
    }
    return false
}
func isAlpha(ch byte) bool {
    return 'a' <= (ch|32) && (ch|32) <= 'z'
}
func isDigit(ch byte) bool {
    return '0' <= ch && ch <= '9'
}
func isNameStart(ch byte) bool {
    return isAlpha(ch) || ch == '_'
}
func isNameContinue(ch byte) bool {
    return isAlpha(ch) || isDigit(ch) || ch == '_'
}
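Putting the helpers together, one possible tryName looks like this (a sketch, not necessarily the book's implementation): skip leading whitespace, require a name-start character, then consume name-continue characters, committing pos only on success.

```go
package main

type Parser struct {
	buf string // the input string
	pos int    // current position in buf
}

func isSpace(ch byte) bool {
	switch ch {
	case '\t', '\n', '\v', '\f', '\r', ' ':
		return true
	}
	return false
}
func isAlpha(ch byte) bool        { return 'a' <= (ch|32) && (ch|32) <= 'z' }
func isDigit(ch byte) bool        { return '0' <= ch && ch <= '9' }
func isNameStart(ch byte) bool    { return isAlpha(ch) || ch == '_' }
func isNameContinue(ch byte) bool { return isAlpha(ch) || isDigit(ch) || ch == '_' }

// tryName skips leading whitespace, then reads a name: a name-start
// character followed by name-continue characters. On success it
// advances pos past the name; on failure pos is left unchanged.
func (p *Parser) tryName() (string, bool) {
	pos := p.pos
	for pos < len(p.buf) && isSpace(p.buf[pos]) {
		pos++
	}
	if pos >= len(p.buf) || !isNameStart(p.buf[pos]) {
		return "", false // not a name; pos is not moved
	}
	start := pos
	for pos < len(p.buf) && isNameContinue(p.buf[pos]) {
		pos++
	}
	p.pos = pos // commit: consume the whitespace and the name
	return p.buf[start:pos], true
}
```

With Parser{buf: " hi ", pos: 0}, this returns ("hi", true) and leaves pos = 3, matching the example above.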

Parse Keywords

func (p *Parser) tryKeyword(kw string) bool

Requirements:

- Skip leading whitespace first.
- Match kw case-insensitively.
- The keyword must end at a separator, so that select does not match the start of selector.
- On a match, advance pos past the keyword; otherwise leave pos unchanged.

Use this to detect separators:

func isSeparator(ch byte) bool {
    return ch < 128 && !isNameContinue(ch)
}
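A possible tryKeyword, sketched under the assumptions above (the rule of applying the separator check only to word-like keywords is my own choice, so that symbol keywords such as = still work):

```go
package main

type Parser struct {
	buf string // the input string
	pos int    // current position in buf
}

func isSpace(ch byte) bool {
	switch ch {
	case '\t', '\n', '\v', '\f', '\r', ' ':
		return true
	}
	return false
}
func isAlpha(ch byte) bool        { return 'a' <= (ch|32) && (ch|32) <= 'z' }
func isDigit(ch byte) bool        { return '0' <= ch && ch <= '9' }
func isNameContinue(ch byte) bool { return isAlpha(ch) || isDigit(ch) || ch == '_' }
func isSeparator(ch byte) bool    { return ch < 128 && !isNameContinue(ch) }

// lower folds ASCII letters to lowercase and leaves other bytes alone.
func lower(ch byte) byte {
	if 'A' <= ch && ch <= 'Z' {
		return ch | 32
	}
	return ch
}

// tryKeyword matches kw case-insensitively after skipping whitespace.
// A word-like keyword must end at a separator, so that "select" does
// not match inside "selector". pos advances only on a match.
func (p *Parser) tryKeyword(kw string) bool {
	pos := p.pos
	for pos < len(p.buf) && isSpace(p.buf[pos]) {
		pos++
	}
	end := pos + len(kw)
	if end > len(p.buf) {
		return false
	}
	for i := 0; i < len(kw); i++ {
		if lower(p.buf[pos+i]) != lower(kw[i]) {
			return false
		}
	}
	// reject "selector" when matching the word-like keyword "select"
	if isNameContinue(kw[len(kw)-1]) && end < len(p.buf) && !isSeparator(p.buf[end]) {
		return false
	}
	p.pos = end // commit: consume the whitespace and the keyword
	return true
}
```

Note that pos is a local copy until the final commit, so failed matches leave the parser untouched without any explicit restore.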
