Type-Level Programming In Haskell - GADTs

By Steven Leiva

N.B. This post is tagged with Note, which means that my understanding of the topic at hand is fluid, subject to change, and I am likely to update the post time and time again as my understanding matures. The post is not meant for external consumption - you are likely to find incomplete thoughts that make sense only to me (or maybe don't make sense at all). Maybe one day my understanding will advance to the point where I can delete the note tag.

GADTs

Generalized Algebraic Data Types have a fairly intimidating name. Luckily, like most terminology, it is only intimidating because it is new. As soon as we start unpacking the reason for that name, this start to be less intimidating. So let's get started.

What the "Generalized" word in GADTs means is that we generalize over the return type of the data constructors for a data type. Let's compare a data declaration that uses GADTs and one that doesn't, so that we may see this "generalization on the return type" in action:

-- Not GADTs
data AType a = ATypeNullaryDataConstructor | ATypeUnaryDataConstructor a

-- GADTs
data MyType a where
  MyTypeNullaryDataConstructor :: MyType Int
  MyTypeUnaryDataConstructor :: a -> MyType a

In the declaration for AType, once the type variable a has been initialized, then every data constructor for AType will have the same return type. For example, in the case when a ~ Int, ATypeNullaryDataConstructor and ATypeUnaryDataConstructor must have the return type AType Int. The key here is to focus on the return type. The data constructors must return the same type once a has been initialized.

But what about MyType? Again, the key is the return type of the data constructors. They do not have to be the same! If we initialize the type variable a to Bool, MyTypeNullaryDataConstructor will have the type MyType Int, but MyTypeUnaryDataConstructor will have the type MyType Bool. In other words, the return type of the data constructor depends on the data constructor itself, modulo other type information. This is simply not possible without GADTs In the case of AType, the return type of the data constructors depends not on the data constructors themselves, but on what the type variable has been initialized to.

Why Is This Useful?

In the declaration of MyType, we focused on how GADTs lift a restriction imposed (necessarily) by the type system. In practice, GADTs are most useful when we pair the lifting of one restriction by the imposition of another. Specifically, we can restrict the type of the parameters that the data constructor accepts. Let's re-write MyType a to show an example:

data MyType a where
  MyTypeNullaryDataConstructor :: MyType Int
  MyTypeNullaryDataConstructor :: Bool -> MyType Bool

Notice how, in this new declaration of MyType, we are still making use of the ability for our data constructors to have different return types. In addition, we are restricting the type of the parameters to which MyTypeUnaryDataConstructor can be applied to. Of course, restricting the type of the values to which a function can be applied to is no large feat. We do that all the time in functional programming, and it is the reason d'etre of types to begin with. Nevertheless, it is clear that one of the guarantees that we get by restricting the type of the paramters to the data constructors is that if we have a value of type MyType Bool, then we are guaranteed that the data constructor MyTypeUnaryDataConstructor must have been applied to a value of type Bool. But, with GADTs, the relationship also holds the other way around. If we have a value "tagged" with the data constructor MyTypeUnaryDataConstructor, then we are guaranteed to be dealing with a the type MyType Bool!. To quote the GHC User Guide:

The key point about GADTs is that pattern matching causes type refinement.

patternMatch :: MyType a -> a
patternMatch (MyTypeUnaryDataConstructor b) = b

In the above code, by inference, the compiler initializes a to Bool if the MyTypeUnaryDataConstructor pattern matches!

And just to round out the discussion, let's explore why it is impossible to determine the type of a value from its data constructor without GADTs:

data NoGADT a = NoGADT Int

a :: NoGADT Bool
b = NoGADT 5

b :: NoGADT Int
b = NoGADT 5

c :: NoGADT String
b = NoGADT 5

Notice that at the value level, we have the exact same data: NoGADT 5. However, in each case, we have a different type - NoGADT Bool, NoGADT Int, and NoGADT String. This is why, without GADTs, we cannot determine from the data constructor what the type is. If we can't do it, neither can Haskell's type system. Now, try to construct a value using the data constructor MyTypeUnaryDataConstructor and have it not result in a value of type MyType Bool. It is impossible!

Information Flow Between Layers

In my post on phantm types, I continually emphasized the importance of understanding how information flows from one layer of the type system to another. In that post, we learned three things:

Information can be encoded at the type layer; one way to do so is with phantom types
Typeclasses allow us to translate type-level information to the term layer
We can use type annotations to send information from the term layer to the type layer

GADTs give us a new tool by which to communicate from the term layer to the type layer. The data constructors at the term layer can be used to determine the type (at the type layer).

The ability of GADTs to convey information from the term layer - via the data constructors - to the type layer is how they change the game!

Misc

When writing this post, the ability to send information from the term layer to the type layer finally dawned on me. What happened was that I was going to extoll the fact that when we loosen the restriction on the return type and add restrictions on the type of the parameter to a GADT, we get a guarantee of the type of the value to which a data constructor was applied. It dawned on me that this was true even without GADTs, but that GADTs made possible to infer things the other way around - given a data constructor, we have a guarantee of the type.