One flaw with our current type checking is that while it allows us to confirm an expression is well-typed, it does not prevent us from creating an ill-typed one. It’s still perfectly possible to create an Expr
that adds a number to a boolean, we “merely” have a validation function that tells us we should not.
We can go one step further and have a new AST that is its own proof of type-correctness: an AST that makes it impossible to create ill-typed expressions. The general idea is very similar to type checking: an evaluator which, given an Expr
, yield not a type, but a typed expression. A typed expression should look very much like an Expr
, with a key difference: it would include typing rules in its very definition.
That is a bit of a pithy statement, so let’s take a concrete example. We know $\texttt{Add}$ can only work with numeric operands, but this is not at all enforced. Add(Num(1), Bool(false))
is a valid value representing an invalid program.
We need to account for this nonsensical case in type checking, interpretation, and pretty much any evaluator we may want to write. What we’d like instead, is for it to be impossible to create an $\texttt{Add}$ whose operands aren’t numeric so that no one ever has to check for that again - it’s guaranteed by the type system.
A quick note: something that might get a little confusing is type system. Am I referring to the one we’re implementing, or to the host language’s, Scala? In order to try and disambiguate at least a little, I’ll be referring to our type system (the one we’re implementing) and the type system (that of the language we’re writing all our code in).
So: we want the type system to make it impossible for us to create an $\texttt{Add}$ whose operands aren’t numeric. This means the type system must somehow be aware of the kind of Expr
it’s working with. How could we encode that information in types?
The typical solution is to write a version of Expr
that has a type parameter, and use it to express the type of an expression. For example, $\texttt{Add}$ would take something like Expr[Num]
and be something like an Expr[Num]
: it takes numbers and yields a number.
Our typed expression, then, would look something like:
enum TypedExpr[A]
Of course, we must then ask ourselves - what will go in the type parameter?
A popular encoding (one that I’ve actually always used until it occurred to me that there was an alternative) is to use the host language’s primitive types. Int
for a numeric expression, say. This is quite comfortable, because then an expression is indexed by the type it will be interpreted as: a numeric expression is interpreted as an Int
, a boolean one as a Boolean
…
I now prefer indexing TypedExpr
with Type
, as I find it more elegant and a more direct match to our typing rules. It also has less of an “interpreter bias” - you could be writing a compiler to a language that isn’t Scala, say, in which case Scala’s Int
would feel very artificial.
We’ll be indexing our typed expressions with Type
then, which forces us to update TypedExpr
:
enum TypedExpr[A <: Type]
I usually try to avoid adding subtyping constraints when I can avoid it - they’re a little syntax heavy and unpleasant - but here, I feel we probably should. We want TypedExpr
to be as tight as possible, and unpleasant though it may be, the fact that A
must be a subtype of Type
is an important constraint.
We’ll spend the rest of this article adapting every Expr
variant into its type checked TypedExpr
one.
Here’s how we wrote literal numbers in Expr
:
case Num(value: Int)
We also defined their typing rule:
\begin{prooftree} \AXC{$\Gamma \vdash \texttt{Num}\ value : \texttt{Type.Num}$} \end{prooftree}
This is telling us $\texttt{Num}$ is always of type $\texttt{Type.Num}$. And, really, this is all we need: the typed version of Num
must simply declare itself to be of type Type.Num
:
case Num(value: Int)
extends TypedExpr[Type.Num] // Γ |- Num value : Type.Num
Literal booleans are declared as:
case Bool(value: Boolean)
We also defined their typing rule:
\begin{prooftree} \AXC{$\Gamma \vdash \texttt{Bool}\ value : \texttt{Type.Bool}$} \end{prooftree}
We can apply the exact same principle as for Num
to get the typed version of Bool
:
case Bool(value: Boolean)
extends TypedExpr[Type.Bool] // Γ |- Bool value : Type.Bool
Add
As before, we need to look at the untyped version of Add
:
case Add(
lhs: Expr,
rhs: Expr
)
As well as its typing rule:
\begin{prooftree} \AXC{$\Gamma \vdash lhs : \texttt{Type.Num}$} \AXC{$\Gamma \vdash rhs : \texttt{Type.Num}$} \BIC{$\Gamma \vdash \texttt{Add}\ lhs\ rhs : \texttt{Type.Num}$} \end{prooftree}
This rule tells us both operands and $\texttt{Add}$ itself must all be of type $\texttt{Type.Num}$. We can merely take this information and move it into the typed version of Add
:
case Add(
lhs: TypedExpr[Type.Num], // Γ |- lhs : Type.Num
rhs: TypedExpr[Type.Num] // Γ |- rhs : Type.Num
) extends TypedExpr[Type.Num] // Γ |- Add lhs rhs : Type.Num
It’s important to realise how our typed AST embodies typing rules. It expresses exactly the same thing, and it’s now impossible to create an Add
that doesn’t conform to its typing rule.
Gt
Here’s how the untyped version of Gt
was defined:
case Gt(
lhs: Expr,
rhs: Expr
)
The typing rule are almost the same as for $\texttt{Add}$: \begin{prooftree} \AXC{$\Gamma \vdash lhs : \texttt{Type.Num}$} \AXC{$\Gamma \vdash rhs : \texttt{Type.Num}$} \BIC{$\Gamma \vdash \texttt{Gt}\ lhs\ rhs : \texttt{Type.Bool}$} \end{prooftree}
We can follow the exact same process as for Add
and get the following typed Gt
:
case Gt(
lhs: TypedExpr[Type.Num], // Γ |- lhs : Type.Num
rhs: TypedExpr[Type.Num] // Γ |- rhs : Type.Num
) extends TypedExpr[Type.Bool] // Γ |- Gt lhs rhs : Type.Bool
Conditionals are a little more interesting. Here’s how we declared their untyped version:
case Cond(
pred: Expr,
onT : Expr,
onF : Expr
)
The typing rule for $\texttt{Cond}$ is:
\begin{prooftree} \AXC{$\Gamma \vdash pred : \texttt{Type.Bool}$} \AXC{$\Gamma \vdash onT : X$} \AXC{$\Gamma \vdash onF : X$} \TIC{$\Gamma \vdash \texttt{Cond}\ pred\ onT\ onF : X$} \end{prooftree}
That $X$ here is new. It indicates some type, we just don’t really care which. We’re using it both as a placeholder, and as a constraint: we want $onT$ and $onF$ to be of the same type.
We can simply make it a type parameter of our typed version of Cond
- one that, of course, is constrained to be a Type
.
case Cond[X <: Type](
pred: TypedExpr[Type.Bool], // Γ |- pred : Type.Bool
onT : TypedExpr[X], // Γ |- onT : X
onF : TypedExpr[X] // Γ |- onF : X
) extends TypedExpr[X] // Γ |- Cond pred onT onF : X
Note how here, again, the data and typing rules are clearly different expressions of the same thing. Our typed Cond
cannot violate its own typing rule, we know for a fact it’ll always be valid.
Bindings are where we’ll start running into real problems with our encoding. As usual, the environment makes things a lot harder.
Here’s how we defined binding introduction:
case Let(
name : String,
value: Expr,
body : Expr
)
The typing rule for that is:
\begin{prooftree} \AXC{$\Gamma \vdash value: X$} \AXC{$\Gamma[name \leftarrow X] \vdash body : Y$} \BIC{$\Gamma \vdash \texttt{Let}\ name\ value\ body : Y$} \end{prooftree}
If we follow the same process we did for $\texttt{Cond}$ (ignoring that $name \leftarrow X$ bit for the moment), we end up with something fairly convincing:
case Let[X <: Type, Y <: Type](
name : String,
value: TypedExpr[X], // Γ |- value : X
body : TypedExpr[Y] // Γ[name <- X] |- body : Y
) extends TypedExpr[Y] // Γ |- Let name value body : Y
This is where you should start to feel doubt creeping in, however. Our encoding doesn’t really have a notion of environment, so we cannot express the constraint that $name$ reference a value of type $X$ in $body$. We’ll explore this a little more once we’ve typed binding elimination.
The untyped Ref
is as simple as it gets:
case Ref(name: String)
And its typing rule is fairly straightforward as well:
\begin{prooftree} \AXC{$\Gamma \vdash \texttt{Ref}\ name : \Gamma(name)$} \end{prooftree}
You’ll recall that this means $\texttt{Ref}$ has whatever type $name$ references in our type environment, which means we’re running in exactly the same problem as for $\texttt{Let}$: we do not have a notion of a type environment! The best we can do without one is to say that Ref
has some type:
case Ref[X <: Type](
name: String
) extends TypedExpr[X] // Γ |- Ref name : Γ(name)
This allows us to describe all legal programs: X
can be any type, so it can certainly be the right one! But unfortunately, it can be all the other ones as well.
Let’s take a concrete example, with this obviously incorrect program:
let x = true in
x + 1
This clearly violates our typing rule for $\texttt{Add}$, and would never be accepted by a type checker. Whatever evaluator we’ll write to go from Expr
to TypedExpr
will deal with that scenario for us. But TypedExpr
is not yet powerful enough to prevent the manual crafting of values that represent invalid programs:
Let(
name = "x",
value = Bool(true),
body = Add(Ref("x"), Num(1))
)
TypedExpr
limitationsOur current encoding isn’t sufficient to guarantee a TypedExpr
value describes a valid program.. We will need to come up with a way of encoding our type environment at the type level to sort that particular issue, and that is a bit of a challenge, one we’ll tackle later.
We know TypedExpr
is broken whenever a type environment is required. Let me qualify that broken, however: TypedExpr
is not correct by definition: its definition allows for illegal programs. It can still be correct by construction: whatever TypedExpr
we get from the type checker we’ll eventually write will always be correct. That’s not quite as good, but TypedExpr
does improve quite a bit on Expr
: any expression that does not rely on the type environment is correct by definition, a guarantee Expr
doesn’t offer at all.
While TypedExpr
is not perfect, it still makes a large number of type errors impossible. Let’s work with that for the moment, and see if we can make it even better later.
This is how we defined function introduction:
case Fun(
param: String,
pType: Type,
body : Expr
)
And here’s the corresponding typing rule:
\begin{prooftree} \AXC{$\Gamma[param \leftarrow X] \vdash body : Y$} \UIC{$\Gamma \vdash \texttt{Fun}\ (param : X)\ body : X \to Y$} \end{prooftree}
If you pay attention to the type of $\texttt{Fun}$, you should notice we have a problem. A function’s type is defined by its input and output ($X$ and $Y$ here), but the type Type.Fun
holds no such information.
enum Type:
case Num
case Bool
case Fun(from: Type, to: Type)
Note that a value of that type does, with its from
and to
fields, but we’re currently firmly in the domain of types, not that of values. We know we’re going from one type to another, but at least at the type level, we don’t know which ones. This is a relatively easy fix however, as we can encode that as type parameters to Type.Fun
:
case Fun[X <: Type, Y <: Type](from: X, to: Y)
Note how they both must be subtypes of Type
. We absolutely want to stay within the bounds of Type
and not get something a little nonsensical, like Type.Fun[java.lang.String, scala.Int]
.
Right, so, we should now be able to adapt our untyped Fun
. There’s one last wrinkle, however: pType
. It’s a little unclear what we should do with that. If you remember, we introduced it for type checking purposes, because we had no other way of knowing the parameter’s type when checking body
. But TypedExpr
is the output of type checking, isn’t it? So intuitively, we probably don’t need pType
any longer.
While we probably don’t need the value any longer, we absolutely do need it at the type level: it’s the input type of our function. We’ll just make it a type parameter.
For the rest, we can use the same technique we’ve been using so far, carry any typing rule we can to the type level, and ignore anything to do with the type environment:
case Fun[X <: Type, Y <: Type](
param: String,
body : TypedExpr[Y] // Γ[param <- X] |- body : Y
) extends TypedExpr[X -> Y] // Γ |- Fun (param : X) body : X -> Y
And yes, since we’re ignoring the type environment, it does mean Fun
is a little broken. It can be used, for example, to represent the following program:
(x: Boolean) -> x + 1
Here’s how:
val expr: TypedExpr[Type.Bool -> Type.Num] = Fun(
param = "x",
body = Add(Ref("x"), Num(1))
)
Apply
is maybe a little bit of a let down, in that it’s absolutely obvious how to deal with it. Here’s the untyped declaration:
case Apply(
fun: Expr,
arg: Expr
)
Its typing rule does not include anything specific about the environment, which is where most of our difficulties tend to come from:
\begin{prooftree} \AXC{$\Gamma \vdash fun : X \to Y$} \AXC{$\Gamma \vdash arg : X$} \BIC{$\Gamma \vdash \texttt{Apply}\ fun\ arg : Y$} \end{prooftree}
We can simply move our rules into the typed version of Apply
:
case Apply[X <: Type, Y <: Type](
fun: TypedExpr[X -> Y], // Γ |- fun : X -> Y
arg: TypedExpr[X] // Γ |- arg : X
) extends TypedExpr[Y] // Γ |- Apply fun arg : Y
And while this was very easy to get to, I quite like how cleanly it describes the essence of function application: given an X -> Y
and an X
, you get a Y
.
We can finally tackle $\texttt{LetRec}$, which was defined in its untyped version as follows:
case LetRec(
name : String,
value: Expr,
vType: Type,
body : Expr
)
And here is its typing rule:
\begin{prooftree} \AXC{$\Gamma[name \leftarrow X] \vdash value : X$} \AXC{$\Gamma[name \leftarrow X] \vdash body : Y$} \BIC{$\Gamma \vdash \texttt{LetRec}\ name\ (value : X)\ body : Y$} \end{prooftree}
There really is nothing new for us to deal with here:
valueType
and move it to the type level.Which gives us the following typed declaration:
case LetRec[X <: Type, Y <: Type](
name : String,
value: TypedExpr[X], // Γ[name <- X] |- value : X
body : TypedExpr[Y] // Γ[name <- X] |- body : Y
) extends TypedExpr[Y] // Γ |- LetRec name (value : X) body : Y
As usual, the presence of environment constraints in the typing rule means that LetRec
is not fully safe.
What we’ve done here is come up with a typed representation of Expr
, which enforces quite a few of our typing rules by definition. Unfortunately, some of them can’t yet be enforced that way, and will need to be by construction instead.
Our next step is going to be to start writing an evaluator from Expr
to TypedExpr
.