The Lune type system
8 Sep 2021
At the moment, I’m working on a compiler for a purely functional language called Lune. The parser and code generator are finished, but the type checker is still incomplete. The type system I plan to implement in Lune is very different from that of other functional languages. Most notably, it allows for subtypes. As a result, every expression can be assigned multiple types, with some more general than others.
There are a few built-in types:
number
is the set of real numbers. In practice, these are stored as double-precision floating-point values.int
is the set of integers. It is a subset ofnumber
.string
is the set of strings. String literals are written with double quotes.char
is the set of single-character strings. It is a subset ofstring
.atom
is the set of atoms. An atom is any word beginning with a capital letter, e.g.Hello
. (All identifiers must start with a lowercase letter in Lune, so this doesn’t create any ambiguity.) Atoms are not strictly necessary—you could use strings instead—but the use case for atoms is completely different from that of strings, so I think they deserve a separate type.- There are an infinite number of “literal types” inhabited by a single value. For example,
3
is a set containing just the number three,"owo"
is a set containing just the string"owo"
, andNull
is a set containing just the atomNull
. any
is the supertype of all types. Every expression can be given the typeany
.void
is an uninhabited “bottom type”. Since it contains no values, it is a subtype of all types.
There are also a few operators that act on types:
a -> b
is the type of a function with domaina
and rangeb
. For example, theround
function would have the typenumber -> int
.a | b
denotes the union of setsa
andb
.a, b
denotes the set of ordered pairs of elements ofa
and elements ofb
.
Universal quantification is denoted with square brackets. For example, the type of the identity function is [a] a -> a
. (You could also annotate it as any -> any
, but in that case, you’d hardly be able to do anything with the returned value, because its type would be way too general.) I’m probably going to disallow higher-ranked types for the sake of simplicity.
Type synonyms are created with the type
keyword. For example, type bit = 0 | 1
assigns the synonym bit
to the set with two elements: 0 and 1. You can also use the type
keyword to create type-level functions which take sets as arguments. For example, a maybe
type could be defined as type maybe a = Null | Just, a
. This means that a value of type maybe int
can be either the atom Null
or the atom Just
combined with an integer. Type synonyms can also be recursive, allowing you to define things such as linked lists and Peano-style natural numbers.
As I said earlier, the existence of subtypes implies that many values can be assigned more than one type. In fact, every value has an infinite number of types associated with it. For example, the string "x"
is a member of the following sets:
"x"
char
string
any
"x" | a
char | a
string | a
where a
can be any type.
I haven’t figured out all the details of the type-checking algorithm yet. Type inference is going to be a major issue, because every function can have a wide variety of possible types. I’m going to have to create an algorithm which infers the “most reasonable” type for every function, but what exactly is the definition of “most reasonable”? I’ll post the solution here once I figure it all out.