Variants are monads

12 Nov 2021

Recently I’ve been working on a statically typed, pure functional programming language called Lune. Lune’s syntax is similar to ML-family languages such as OCaml and Haskell. But Lune does not have ML-style algebraic data types. Instead, it uses a form of row polymorphism based on these two papers:

A row is a list of label-type pairs. Rows are implemented in Lune with two type constructors: nil :: Row, and (:=) :: Label -> Type -> Row. The standard Prelude module also includes a type operator (;), defined as (;) f t = f t. The labels in a row can be any identifier starting with a capital letter. Here are some examples:

  • nil
  • X := float; nil
  • X := float; Y := float; nil
  • Name := string; Age := int; nil
  • Name := string; Name := string; Age := int; nil

As you can see in the last example, the labels in a row do not have to be unique. There are very few cases where duplicate labels are useful, but allowing duplicate labels greatly simplifies the type system. Rows also have a special unification rule that allows labels to be swapped. Thus N := int; S := string; nil is equivalent to S := string; N := int; nil.

Because rows have the kind Row rather than the kind Type, it is impossible to construct a value with a row as its type. There are two ways of transforming rows into normal types: by surrounding them in curly brackets, or by surrounding them in square brackets. Curly brackets are used to define record types, and square brackets are used to define variant types. In this post I will focus on the latter.

Variants represent a choice between multiple types, each with a different label attached to them. For example, suppose there is a function that returns an integer labeled with Success if it succeeds, or an error message labeled with Error if it fails. The return type of this function would be [Success := int; Error := string; nil].

In Lune, labels are first-class values. This means they can be passed as arguments and returned from functions. The standard Prelude defines a type constructor label :: Label -> Type. The type of a term-level label x is label x. For example, Name has the type label Name. This is what makes labels so powerful; unlike strings, they can be differentiated at the type level.

There are three basic operations on variants: injection, embedding, and deconstruction. These are represented in Lune with the functions (^), embed, and match respectively.

  • The (^) operator has the type any s a r. label s -> a -> [s := a; r]. For example, Success ^ 5 has the type [Success := int; r], where r can be any row. Basically, what you are doing when you write Success ^ 5 is labelling the value 5 with the label Success.

  • The existence of variants allows for subtypes. For example, [Message := string; nil] is a subtype of [Message := string; Number := int; nil]. Every value of the type [Message := string; nil], such as Message ^ "hello" and Message ^ "abc", is also a value of the type [Message := string; Number := int; nil]. However, the Lune type system, like every other ML type system, does not implicitly coerce between values of different monotypes. This means that if you have a function f that returns a value of type [Message := string; nil], and a function g that expects a value of type [Message := string; Number := int; nil], you cannot legally compose f with g. This is where label embedding comes in. The standard Prelude defines a function called embed with the type any s a r. label s -> [r] -> [s := a; r]. This allows you to explicitly coerce a value of any variant type to any of its supertypes. Instead of writing g (f x), which does not type-check, you can write g (embed Number (f x)).

  • The match function allows you to extract a value from a label. It has the type any s a b r. label s -> (a -> b) -> ([r] -> b) -> [s := a; r] -> b. For example, suppose you want to convert a value of the type [Success := int; Error := string; nil] into an integer. If the value is Success ^ x, you want to return x, but if the value is an error, you want to return 0. Such a function can be defined with match by writing match Success (do x. x) (do x. 0). (In Lune, anonymous functions are created with the do keyword.) The standard Prelude exports a function called else, which is defined as else x y = x. This allows you to make pattern-matching expressions more readable. For instance, the above function could be written match Success (do x. x) (else 0). This is roughly equivalent to the Haskell expression \case Success x -> x; _ -> 0.

The promotion of labels to first class allows you to define functions that work on any variant type. This is an enormous improvement on other ML languages. For example, you can define a function called case that does the same thing as match, but ignores the value inside the label.

val case :: any s a b r. label s -> b -> ([r] -> b) -> [s := a; r] -> b
let case s x = match s (do y. x)

But this is only the tip of the iceberg. It turns out that [s := a; r] is a monad for any label s and any row r. And with first-class labels, it is possible to define a monadic operation that works on all variant types.

val map :: any s a b r. label s -> (a -> b) -> [s := a; r] -> [s := b; r]
let map s f = match s (do x. s ^ f x) (embed s)

val apply :: any s a b r. label s -> [s := (a -> b); r] -> [s := a; r] -> [s := b; r]
let apply s vf vx = vf # match s (do f. map s f vx) (embed s)

val bind :: any s a b r. label s -> (a -> [s := b; r]) -> [s := a; r] -> [s := b; r]
let bind s f = match s f (embed s)

These higher-order functions are equivalent to Haskell’s fmap, (<*>), and (=<<), except that they each take a label as a parameter; the label to map the given function over. In Haskell terminology, I have defined a Monad instance for every single sum type at once. This is not remotely possible in any ML language. In my opinion, Lune’s combination of row polymorphism and first-class labels almost makes up for its lack of typeclasses.