Let’s start with a very simple programming language, one that only supports addition of numbers. The following would be a valid expression in that language:
1 + 2
We can easily see that there are two distinct things our type for code must support: an expression in our language can be either a number, or the addition of two numbers.
This kind of “one thing or another” type is almost always best encoded as a sum type, which we could naively write as:
enum Expr:
case Num(value: Int)
case Add(lhs: Num, rhs: Num)
Num
wraps a number, and Add
has two numeric operands (called lhs
and rhs
for left-hand side and right-hand side). This feels reasonable.
There’s a problem with that representation which we’ll highlight and address soon, but before that, let’s think about how to interpret such expressions. The way we’ll do this is by writing formal specifications for how we’d like things to work, and then turn these into actual code. Such specifications are called operational semantics, and they’re hugely useful in thinking about our language.
Numbers simply evaluate to themselves. In order to express that, we’ll need a symbol for the notion of interprets to, and the traditional one is $\Downarrow$.
We can then formally express what a Num
is interpreted as:
\begin{prooftree} \AXC{$\texttt{Num}\ value \Downarrow value$} \end{prooftree}
This is a fairly straightforward rule that we can easily turn into code:
def interpret(expr: Expr): Int = expr match
case Num(value) => value // Num value ⇓ value
Instinctively, we’d probably want to write the operational semantics of addition as follows:
\begin{prooftree} \AXC{$\texttt{Add}\ lhs\ rhs \Downarrow lhs + rhs$} \end{prooftree}
This can’t really work, however: $lhs$ and $rhs$ are not actual numbers, but terms of our language. Before we can add them, we need to turn them into numbers, which we do by interpreting them.
We’ll write this as follows:
\begin{prooftree} \AXC{$lhs \Downarrow v_1$} \AXC{$rhs \Downarrow v_2$} \BIC{$\texttt{Add}\ lhs\ rhs \Downarrow v_1 + v_2$} \end{prooftree}
Where the horizontal line separates the preconditions (often called antecedents) from the conclusion (often called consequent): if the things described above the line hold, then so does what’s under the line. Our rule, then, expresses:
These semantics can be turned into code rather directly:
def add(lhs: Num, rhs: Num) =
val v1 = interpret(lhs) // lhs ⇓ v₁
val v2 = interpret(rhs) // rhs ⇓ v₂
v1 + v2 // Add lhs rhs ⇓ v₁ + v₂
Which gives us a full interpreter:
def interpret(expr: Expr): Int = expr match
case Num(value) => value
case Add(lhs, rhs) => add(lhs, rhs)
We can then confirm that it works as expected:
// 1 + 2
val expr = Add(Num(1), Num(2))
interpret(expr)
// val res: Int = 3
This all appears to work quite well, and is in fact such a common pattern that things like Expr
have a name: Abstract Syntax Tree, or AST for short.
That name is composed of two parts: abstract syntax, and syntax tree.
It’s called abstract syntax because it’s an abstraction over the human facing syntax. The following, syntactically different expressions, all parse to the same Expr
:
1 + 2
(1 + 2)
1 + 2
(((1) + 2))
Expr
unifies all that to Add(Num(1), Num(2))
, abstracting over the gory details of whitespace and disambiguation parentheses. Expr
is an abstract syntax.
Expr
can be seen as a tree, which is made quite clear when viewing it as a diagram:
The operator, Add
, is the root of the tree, and its operands are leaves.
Now, you might be thinking that it’s not much of a tree: it’s got a depth of 1, and cannot grow. It’s more of a bonsai than a tree, really. Which is, in fact, the symptom of an underlying issue in our encoding.
You might have noticed that our AST is very limited - I’ve certainly been quite heavy handed at hinting it. How, for example, would we represent the following expression:
1 + 2 + 3
And the answer is, we can’t, not with Expr
such as it is. The trick is to realise that the operands of Add
are not necessarily numbers: we want to be able to nest expressions, so that Add
can take another Add
as an operand, and allow us to add more than 2 numbers.
Here’s how we’ll fix Expr
:
enum Expr:
case Num(value: Int)
case Add(lhs: Expr, rhs: Expr)
Notice that both of Add
’s operands are now Expr
rather than Num
: we can now use either Num
or Add
as operands to Add
, which is exactly what we wanted.
The interesting bit is that this doesn’t cause us to change our semantics at all:
\begin{prooftree} \AXC{$lhs \Downarrow v_1$} \AXC{$rhs \Downarrow v_2$} \BIC{$\texttt{Add}\ lhs\ rhs \Downarrow v_1 + v_2$} \end{prooftree}
Whether lhs
and rhs
are Num
or Expr
doesn’t matter, they’re still things that can be interpreted, which is all we care about. We do have to change the types of operands in add
’s implementation, however:
def add(lhs: Expr, rhs: Expr) =
val v1 = interpret(lhs) // lhs ⇓ v₁
val v2 = interpret(rhs) // rhs ⇓ v₂
v1 + v2 // Add lhs rhs ⇓ v₁ + v₂
We can easily confirm that this all works:
// 1 + (2 + 3)
val expr = Add(Num(1), Add(Num(2), Num(3)))
interpret(expr)
// val res: Int = 6
We can now represent simple arithmetic operations in memory and interpret them. This is obviously not yet a full fledged programming language, but it’s a nice start!
In the live coding sessions of this series, I will usually also implement multiplication, subtraction and division, as well as a basic code formatter. They’re useful for practice, but maybe not so much for explanations, so I’ll skip this here.
You however should feel free to try your hand at it if you have the time. But do not attempt yet to write operations that work with anything but numbers - this requires a little subtlety, and we’ll tackle that in our next session.