2024-06-10
Let's write a super weenie LISP interpreter in nice typed Python.
Our program to interpret looks like:
(add 2
(multiply 4 5))
In mathspeak: 2 + (4 * 5)
which equals 22
First, some imports and definitions:
from __future__ import annotations
from dataclasses import dataclass
import string
from typing import Iterator, Literal
EOF = "\x1a"
Atom = int
Now we define an Iterator
with a small difference, we can look at the current value with .peek
@dataclass
class PeekableIterator:
iter: Iterator[str]
peek: str = EOF
def __post_init__(self) -> None:
next(self)
def __next__(self) -> None:
self.peek = next(self.iter, EOF)
Now a tokenizer, this splits our language up into brackets (
or )
, names and numbers, ignoring whitespace:
def tokenize(chars: PeekableIterator) -> Iterator[str]:
while chars.peek != EOF:
if chars.peek in string.whitespace:
next(chars)
elif chars.peek in "()":
yield chars.peek
next(chars)
else:
current = ""
while chars.peek in string.ascii_letters + string.digits:
current += chars.peek
next(chars)
yield current
Doing:
chars = PeekableIterator(iter(code))
list(tokenize(chars))
Should give us:
['(', 'add', '2', '(', 'multiply', '4', '5', ')', ')']
Now to the parser, each time we use a token, we consume it with next(tokens)
, we construct a tree of Node
s:
@dataclass
class Node:
operator: str
args: list[Node | Atom]
def parse(tokens: PeekableIterator) -> Node | Atom:
if tokens.peek == "(":
next(tokens)
out = Node(operator=tokens.peek, args=[])
next(tokens)
while tokens.peek != ")":
out.args.append(parse(tokens)) # recursion!
next(tokens)
return out
if tokens.peek.isdigit():
out = int(tokens.peek)
next(tokens)
return out
raise RuntimeError(f"Unexpected token: '{tokens.peek}'")
Doing:
chars = PeekableIterator(iter(code))
tokens = PeekableIterator(tokenize(chars))
ast = parse(tokens)
Should give us:
Node(
operator='add',
args=[
2,
Node(
operator='multiply',
args=[4, 5],
),
]
)
Now an interpreter:
def interpret(node: Node | Atom) -> int:
if isinstance(node, int):
return node
if node.operator == "add":
a, b = node.args
return interpret(a) + interpret(b)
if node.operator == "multiply":
a, b = node.args
return interpret(a) * interpret(b)
raise RuntimeError(f"Unknown node {node}")
Finally, running everything:
code = """
(add 2
(multiply 4 5))
"""
chars = PeekableIterator(iter(code))
tokens = PeekableIterator(tokenize(chars))
ast = parse(tokens)
print(interpret(ast))
Should give us 22
.
Fancy some operator precedence?