Repeating actions

One thing our language still lacks is the ability to repeat an action multiple times. This is typically necessary to compute things over ranges of numbers (or lists or queues or…). Most languages achieve this using dedicated language constructs called loops.

Loops

One could, for example, wish to sum all numbers in a given range, which you’ll often see implemented using a for loop.

`for` loops

A for loop typically works by iterating over a range of numbers, and:

keeping track of the current iteration with an iterating variable (traditionally called i).
accumulating the result of each iteration in an accumulating variable (traditionally called acc).

There are variations (you could, for example, vary the size of the increment, range over an ordered collection rather than numbers, …), but they’re all syntactic sugar for that basic schema.

Adding all numbers in a given range could be written using the following loop:

let sumFor = lower -> upper ->
    var acc = 0
    for i = lower to upper do
      acc = acc + i
    acc
  in sumFor 1 10

There are two things I’m not a big fan of here, though.

First, for loops do not give us a way of doing anything with the result of each iteration. Unless, that is, our language supports mutability, which I’ve had to use in the above code to modify the accumulator. Mutability is not inherently bad, but it’s also a large source of complexity and I would rather avoid having to support it until there really is no other choice.

Second, for loops are quite limited: you need to know how many times you want to repeat the computation. It would be nicer to work with loops that don’t have this limitation.

`while` loops

That’s exactly what while loops are for. They’re a more generic form of for loops, in that rather than maintaining an incrementing variable for you, they’ll simply evaluate a predicate on each iteration and keep looping until that evaluates to false. for loops are a specific kind of while loop in which the predicate is have I ran the expected number of iterations.

The good news is that, being more generic for loops, you can turn any for loop into a while mechanically. The principle is simple:

maintain your own iterating variable by initialising it to the range’s lower bound and incrementing it by one on each iteration.
break out of the loop as soon as the iterating variable goes over the range’s upper bound.

The previous sumFor function can trivially be transformed to use a while loop by following this process:

let sumWhile = lower -> upper ->
    var acc = 0
    var i   = lower
    while i < upper do
      acc = acc + i
      i   = i + 1
    acc
  in sumWhile 1 10

We could conceivably write code to do that for us on any for loop (we won’t, though, that sounds like a fair amount of work). We could, then:

have support for for loops in the syntax of our language, something that a parser would need to understand.
transform them into while loops when parsing, so that our AST need only support while.

That is, we could treat for loops as syntactic sugar for while loops

But they still have the annoying limitation of requiring mutability. If that’s not clear, think about it this way: the only way of breaking out of a while is for its predicate to suddenly start evaluating to false when it used to evaluate to true. How could you achieve that without changing something?

Luckily, we can also work around that.

Recursion

Here’s another way you can think of sumWhile: for each iteration, i and acc act as input, and acc + i as output.

By following this thought to its logical conclusion, you can rewrite the body of our while loop as a function taking acc and i for input, and then calling itself until the predicate evaluates to false. We don’t have to mutate the values of i and acc any longer, just pass new values to each call.

This is what it would look like:

let sumRec = lower -> upper ->
    (* `acc` and `i` are parameters *)
    let go = acc -> i ->
      (* If the predicate is not falsified, "loop" after updating `acc` and `i`. *)
      if i < upper then go (acc + i) (i + 1)
      (* Otherwise, we're done *)
      else acc

    (* `acc` is initialised to `0`, `i` to the first element in the range. *)
    in go 0 lower
  in sumRec 1 10

This is equivalent to sumWhile, but does not require mutability. It does, however, need for our language to allow functions to call themselves - to support recursive functions.

This transformation can be made as automatic as the one from for to while loops. Which means that if we support recursive functions, we could add for and while to our syntax, but rewrite them to recursive function calls in the AST, essentially having our cake and eating it, too.

Before we move on, however, and now that we have decided to add support for recursive functions, I really must give a more reasonable implementation of sum:

let sum = lower -> upper ->
    if lower > upper then 0
    else lower + (sum (lower + 1) upper)
  in sum 1 10

I find this kind of code lovely. Rather than being very descriptive of the process to follow to find the solution, it looks much more like a description of the problem itself:

if the range is empty, then its sum is 0.
otherwise, it’s the addition of the first element in the range and the sum of the remaining ones.

Do we already support recursion?

You might be tempted to think we already support recursive functions - after all, we already support bindings and functions, which appear to be all that sum needs to work. Well, that, and some way of comparing lower and upper, but that bit really isn’t all that hard to add.

Supporting `Gt`

This is nothing we haven’t done before, so let’s go through it quickly. First, the operational semantics, which are rather self-explanatory:

\begin{prooftree} \AXC{$e \vdash lhs ⇓ \texttt{Value.Num}\ v1$} \AXC{$e \vdash rhs ⇓ \texttt{Value.Num}\ v2$} \BIC{$e \vdash \texttt{Gt}\ lhs\ rhs\ \Downarrow \texttt{Value.Bool} (v1 > v_2)$} \end{prooftree}

This tells us we need to add the following variant to our AST:

case Gt(lhs: Expr, rhs: Expr)

Interpreting of $\texttt{Gt}$ is very similar to how we do $\texttt{Add}$, and can be derived painlessly from the operational semantics:

def gt(lhs: Expr, rhs: Expr, e: Env) =
  (interpret(lhs, e), interpret(rhs, e)) match
    case (Value.Num(v1), Value.Num(v2)) => Value.Bool(v1 > v2)
    case _                              => typeError("gt")

And that’s really all there is to supporting $\texttt{Gt}$. I did say it was going to be simple.

Testing `sum`

Right. We have everything we need to write sum, which will not be fun but not particularly hard either.

val expr = Let(
  name  = "sum",
  value = Fun(
    param = "lower",
    body  = Fun(
      param = "upper",
      body  = Cond(
        pred = Gt(Ref("lower"), Ref("upper")),
        onT  = Num(0),
        onF  = Add(
          lhs = Ref("lower"),
          rhs = Apply(
            fun = Apply(Ref("sum"), Add(Ref("lower"), Num(1))),
            arg = Ref("upper"))
        )
      )
    )
  ),
  body = Apply(Apply(Ref("sum"), Num(1)), Num(10))
)

In theory, then, we should be able to interpret that directly. If you haven’t guessed by now that it was going to blow up in our faces, I suggest you take a break and come back to this article when you’re feeling a little sharper.

interpret(expr, Env.empty)
// java.lang.RuntimeException: Unbound variable: sum

So, no, apparently, we do not have everything we need. At some point, someone tries to access the value referenced by sum, in an environment where no value is bound to it.

Intuition

Let’s think things through a little. Recall the operational semantics of $\texttt{Let}$:

\begin{prooftree} \AXC{$e \vdash value \Downarrow v_1$} \AXC{$e[name \leftarrow v_1] \vdash body \Downarrow v_2$} \BIC{$e \vdash \texttt{Let}\ name\ value\ body \Downarrow v_2$} \end{prooftree}

This makes it clear that $value$ is interpreted in $e$, but that $name$ is only bound in $e[name \leftarrow v_1]$.

Now look at the code of sum:

let sum = lower -> upper ->
    if lower > upper then 0
    else lower + (sum (lower + 1) upper)
  in sum 1 10

$name$, which here is sum, appears in $value$ - which is interpreted in $e$ and not $e[name \leftarrow v_1]$. No wonder it’s not working!

This feels like a complicated problem to solve, doesn’t it? In order to evaluate sum’s body, we need to have evaluated sum’s body. Something will have to give.

`let rec`

What we’ll need, then, is a new term in our language, a new kind of let expression in which $value$ is interpreted in an environment in which it’s bound to $name$. This is traditionally called let rec.

The way around let’s problem is to realize that $value$ here is a function introduction, $Fun$, whose semantics are:

\begin{prooftree} \AXC{$e \vdash \texttt{Fun}\ param\ body\ \Downarrow \texttt{Value.Fun}\ param\ body\ e$} \end{prooftree}

We are storing the environment in which the function is introduced, but not actually using it! This will only happen later, at application time.

In our case, it means the environment will be captured when creating the function, before binding it to the name sum, but not used before application in the body of the let expression. Or, put another way, that we go through the following steps:

Capture the environment in which the function is defined (when producing a function value).
Bind that value to the name sum (to produce the environment in which the body of the let expression will be interpreted).
Use the environment in which sum was defined (when applying the function).

We can see that in step 2, we hold both the value of sum and the environment in which it must be bound, and that the latter won’t be queried until step 3. That’s our window of opportunity! The trick, then, is to add a new step between 2 and 3 in which we update the environment with the necessary binding.

Note that yes, I have in fact just said that we’ll be relying on mutable state. It’s important to realise that this is not quite the same thing as adding mutability to the language: it’s merely an implementation detail of the language’s interpreter.

Avoiding dynamic scoping

There’s a pretty serious flaw in our intuition, however. What do you think the following code should evaluate to?

let x = 1 in
  (let rec x = 2 in 3) + x

Hopefully it’s pretty clear this is a complicated way of saying 4. But if we follow our intuition for the way let rec works, we get:

the let expression creates $e = \{x = 1\}$.
the let rec expression mutates is to $\{x = 2\}$ - we’ll write this $e[x \coloneqq 2]$.
the x reference is interpreted in $e$, in which 2 is bound to it.

An existing binding was changed by a later let expression - dynamic scoping certainly has a way of sneaking up on us when we’re not paying attention, doesn’t it.

This is why I’m so reluctant to support mutability - it’s a very powerful tool, but one that is deceptively easy to get subtly wrong.

What we want to do, then, is to work with a different environment. We do not want to update $e$, but to create a new one we can mutate to our heart’s content without impacting $e$. With that, the previous code would run as follows:

the let expression creates $e_1 = \{x = 1\}$.
the let rec expression creates a new environment $e_2$ with the same bindings as $e_1$.
it then updates it in place: $e_2[x \coloneqq 2]$.
the x reference is interpreted in $e_1$, in which it correctly maps to 1.

And this would lead to the expected result of 4.

This is the critical realisation about recursive bindings: their value must be interpreted in a distinct environment, one which we’re free to mutate without impacting the rest of the program.

Operational semantics

As usual, let’s start with the shape of our problem. $\texttt{LetRec}$ contains exactly the same element as $\texttt{Let}$, it simply has different semantics, so we’ll want something like:

\begin{prooftree} \AXC{$e \vdash \texttt{LetRec}\ name\ value\ body \Downarrow\ ???$} \end{prooftree}

We’re doing eager evaluation, so the first thing we need to do is evaluate $value$ - but as we’ve seen, the entire thing hinges on what environment we do that in. We said we wanted a new environment which contained the same bindings as $e$. Instinctively, we could achieve that by creating a new primitive that clones an existing environment - and that would work! but the literature seems to agree on a different approach, presumably to reduce the number of necessary primitives (if you know of a better reason, I would be very keen to hear it!)

See, we already know how to create a new environment from an existing one: by adding a new binding to it. Of course, we don’t have a value available yet, but we’ve seen that this wasn’t actually a problem: that value will only be needed in $body$, so as long as we make sure to update the binding to the right value before interpreting $body$, we can initialise it to anything. We’ll use $\bullet$ to express that notion of I don’t actually care what value this is.

Since we’ll need to update the newly created environment, we must also give it a name to be able to reference it later. Let’s use $e’$, which gives us:

\begin{prooftree} \AXC{$e’ = e[name \leftarrow \bullet]; e’ \vdash value \Downarrow v_1$} \UIC{$e \vdash \texttt{LetRec}\ name\ value\ body \Downarrow v_2$} \end{prooftree}

You might think that $e$ and $e’$ containing the exact same bindings, we could use one or the other here, but it’s crucially important to use $e’$. It will be captured by the function declared in $value$, which is why mutating $e’$ later will be seen by it.

We’re now ready to interpret $body$, but we must not forget that it must be done in $e’$ after it’s been mutated to bind $v_1$ to $name$. This gives us the final operational semantic:

\begin{prooftree} \AXC{$e’ = e[name \leftarrow \bullet]; e’ \vdash value \Downarrow v_1$} \AXC{$e’[name \coloneqq v_1]; e’ \vdash body \Downarrow v_2$} \BIC{$e \vdash \texttt{LetRec}\ name\ value\ body \Downarrow v_2$} \end{prooftree}

Implementing recursion

Updating the AST

We’ve only really declared a new primitive here: let rec, used to introduce recursive functions. These are, once introduced, regular functions, applied exactly like all others, so we don’t need a special elimination form for them.

As we’ve said, a let rec expression is very similar to a let one, and what few differences exist are only observed during interpretation. We can then basically clone and rename the Let variant:

case LetRec(name: String, value: Expr, body: Expr)

We now have everything we need to represent sum. Exactly what we had before, except we need to replace Let with LetRec:

val expr = LetRec(
  name  = "sum",
  value = Fun(
    param = "lower",
    body  = Fun(
      param = "upper",
      body  = Cond(
        pred = Gt(Ref("lower"), Ref("upper")),
        onT  = Num(0),
        onF  = Add(
          lhs = Ref("lower"),
          rhs = Apply(
            fun = Apply(Ref("sum"), Add(Ref("lower"), Num(1))),
            arg = Ref("upper"))
        )
      )
    )
  ),
  body = Apply(Apply(Ref("sum"), Num(1)), Num(10))
)

“All” we have to do now is interpret it. Famous last words.

Making the environment mutable

In order for $\texttt{LetRec}$ to work, we’ve seen that we needed to be able to update the value of a binding in place - the $e[name \coloneqq value]$ operation.

This is relatively easy to support. First, we must make bindings mutable, which is as simple as adding a well placed var keyword:

case class Binding(name: String, var value: Value)

We then need to provide a way to find and modify an existing binding, which we’ll do with a new update method in Env:

// e[name := value]
def update(name: String, value: Value) =
  env.find(_.name == name)
     .foreach(_.value = value)

Updating the interpreter

Our environment is now fully ready to support $\texttt{LetRec}$.

Recall its semantics:

This is all relatively straightforward to turn into code, except for the $\bullet$ part: we need to decide what value we’ll use to mean any value, I don’t care. And what better value for that than no value at all? Yes, I am in fact talking about using null, and am forced to admit there is some perverse pleasure in using both null and mutability in the same bit of code, and thumbing my nose gleefully at common Scala wisdom.

def letRec(name: String, value: Expr, body: Expr, e: Env) =
  val eʹ = e.bind(name, null)   // eʹ = e[name <- ●]
  val v1 = interpret(value, eʹ) // eʹ |- value ⇓ v₁

  eʹ.update(name, v1)           // eʹ[name := v₁]
  val v2 = interpret(body, eʹ)  // eʹ |- body ⇓ v₂

  v2                            // e |- LetRec name value body ⇓ v₂

All that remains is for us to confirm our implementation is correct by interpreting expr, the program we declared earlier that computes the sum of all numbers between 1 and 10, and observing that it returns the right result:

interpret(expr, Env.empty)
// val res: Value = Num(55)

Should we drop `Let` (again)?

Since let rec seems to offer the same tools as let, but also supports recursive definitions, do we need let at all? Could we just update it to use let rec’s semantics, and use that all the time instead?

On the one hand, yes, we could. On the other, let does have one property that let rec does not. Consider the following code:x

let x = 1 in
  let x = x + 2 in
    x

With normal let semantics, this interprets to 3, which is what your intuition should tell you is right. If let had the same semantics as let rec, however, we could not create a binding by reusing the value referenced to by a previous binding of the same name - the very purpose of let rec is to make these two the same.

Whether or not this kind of shadowing is desirable seems to be more of a matter of opinion than fact. From what I understand, it’s considered a terrible sin by the Scala community, but excellent practice in the Rust one. Since it’s not a clearly undesirable thing, we should probably keep it. Just in case, you know.

Where to go from here?

Our language now has all the basic building blocks one needs to write interesting programs - the theory goes that every interesting program can now be written using Expr. It might require a lot of work, and we might want to enrich our language to provide more facilities out of the box, but it’s possible.

One thing we might want to look into, before moving on to more high-level features, is something I find slightly distasteful: it’s perfectly possible to write programs that make no sense, and not find out about it until we interpret them. We can, for example, write code that adds a number to a boolean.

We will start looking into techniques to confirm that a program makes sense before interpreting it in the next part of this series.