2024-06-10

Weenie lisp

Let's write a super weenie LISP interpreter in nice typed Python.

Our program to interpret looks like:

(add 2
    (multiply 4 5))

In mathspeak: 2 + (4 * 5) which equals 22

First, some imports and definitions:

from __future__ import annotations
from dataclasses import dataclass
import string
from typing import Iterator, Literal
EOF = "\x1a"
Atom = int

Now we define an Iterator with a small difference, we can look at the current value with .peek

@dataclass
class PeekableIterator:
    iter: Iterator[str]
    peek: str = EOF

    def __post_init__(self) -> None:
        next(self)

    def __next__(self) -> None:
        self.peek = next(self.iter, EOF)

Now a tokenizer, this splits our language up into brackets ( or ), names and numbers, ignoring whitespace:

def tokenize(chars: PeekableIterator) -> Iterator[str]:
    while chars.peek != EOF:
        if chars.peek in string.whitespace:
            next(chars)
        elif chars.peek in "()":
            yield chars.peek
            next(chars)
        else:
            current = ""
            while chars.peek in string.ascii_letters + string.digits:
                current += chars.peek
                next(chars)
            yield current

Doing:

chars = PeekableIterator(iter(code))
list(tokenize(chars))

Should give us:

['(', 'add', '2', '(', 'multiply', '4', '5', ')', ')']

Now to the parser, each time we use a token, we consume it with next(tokens), we construct a tree of Nodes:

@dataclass
class Node:
    operator: str
    args: list[Node | Atom]


def parse(tokens: PeekableIterator) -> Node | Atom:
    if tokens.peek == "(":
        next(tokens)
        out = Node(operator=tokens.peek, args=[])
        next(tokens)
        while tokens.peek != ")":
            out.args.append(parse(tokens))  # recursion!
        next(tokens)
        return out
    if tokens.peek.isdigit():
        out = int(tokens.peek)
        next(tokens)
        return out
    raise RuntimeError(f"Unexpected token: '{tokens.peek}'")

Doing:

chars = PeekableIterator(iter(code))
tokens = PeekableIterator(tokenize(chars))
ast = parse(tokens)

Should give us:

Node(
    operator='add',
    args=[
        2,
        Node(
            operator='multiply',
            args=[4, 5],
        ),
    ]
)

Now an interpreter:

def interpret(node: Node | Atom) -> int:
    if isinstance(node, int):
        return node
    if node.operator == "add":
        a, b = node.args
        return interpret(a) + interpret(b)
    if node.operator == "multiply":
        a, b = node.args
        return interpret(a) * interpret(b)

    raise RuntimeError(f"Unknown node {node}")

Finally, running everything:

code = """
(add 2
    (multiply 4 5))
"""
chars = PeekableIterator(iter(code))
tokens = PeekableIterator(tokenize(chars))
ast = parse(tokens)
print(interpret(ast))

Should give us 22.


Fancy some operator precedence?