I've been seeing some more posts about how to use and when to use GADTs. GADTs (Generalized Abstract Data Types, I pronounce them "gats" like in "cats" but with a "g") are an extension to regular ADTs that is usually reserved for very specific scenarios, but its not always clear which those are and why.
So I figured I'd give this a shot, and write a super small primer on them, by example. We're gonna write a library that didn't need to use GADTs, and along the way we're gonna feel the pains that come with GADTs, and what specific things they are amazing for.
Buckle on to your camels! 🐫
GADTs by Example
Alright usually you'd write your variant as:
type role = | Guest of guest | User of user
and use it as
Guest(guest). You can think of these constructors as little functions that go from some arguments to a value of type
role. So really
Guest "has type"
guest -> role. And
User has type
user -> role.
A GADT will let you change (a little) the return type of these constructors. So let's see what our
role example looks like with GADT syntax.
type role = | Guest: guest -> role | User: user -> role
Neat, right? I really like this syntax.
But in this case, our
role type can't be segmented or parametrized. I mean you can just have
role, it's not like an
option or a
list where you can have an
int option or
bool list and those are more specific types of
list. This means we can really only return
role in any of our constructors, so you probably don't need GADTs.
Let's look at an example that does not need GADTs but uses them anyway.
A Validation Library
We'll make a validation type that we can use to mark things as validated throughout our app:
type _ validation = | Valid: 'thing -> 'thing validation | Invalid: 'thing * string -> 'thing validation | Pending: 'thing -> 'thing validation
Note how using our constructors returns different types. If you use
Valid("hello") you get a
string validation, if you use
Invalid(2113, "not cool number") you get an
int validation, and so on.
If we try to make some helper functions they may look like this:
let make : 'thing -> 'thing validation = fun x -> Pending x let get : 'x validation -> 'x = function | Pending x -> x | Valid x -> x | Invalid (x, _) -> x let errors : 'x validation -> string option = function | Invalid (_, err) -> Some err | _ -> None let is_valid : 'x validation -> bool = function | Valid _ -> true | _ -> false
You can imagine how we'll have to check every time we want to open up a validation, to see if its pending, valid, or invalid. This means other devs will have to remember to check if the validation passed before using the value.
I don't know about you but I'm 100% certain I will forget to do that at least once.
GADTs can help us here to reduce the space of all the possible types that our variant constructors create. Right now, we can essentially create any
<type> validation by just passing the right value to the constructors. But we could change that, so that you can statically check a value has passed, is pending, or has failed validation. Our new GADT could look like this:
(* we'll make some abstract types to use for differentiating the validation states *) type pending type failed type valid (* our new validation GADT *) type _ validation = | Pending: 'thing -> pending validation | Valid: 'thing -> valid validation | Failed: 'thing * string -> failed validation
This means that our functions can have restricted type signatures and smaller implementations. The compiler now knows that there is only one possible type that matches the
Valid constructor, so we can't consider the others.
let make x = Pending x let get (Valid x) = x let errors (Fail (_, err)) = err
Great! The downside is that this code doesn't actually type-check. You can't pattern-match and get a value out of
Valid x because the compiler "forgot" what type that value had. Let me show you what I mean. This function:
let get (Valid x) = x
Fails to type with this error:
File "lib/gadt.ml", line 18, characters 20-21: 18 | let get (Valid x) = x ^ Error: This expression has type $Valid_'thing but an expression was expected of type 'a The type constructor $Valid_'thing would escape its scope
And the type of x in the error is
$Valid_'thing, which is a weird type. The compiler yells that it would escape its scope. That's how OCaml tells us "Hey I know there should be a type here but I...erhm...don't know anymore what that type *actually* was. So, yeah, can't let you use it. Sorry".
So how does one get this value out?
Turns out that while the compiler won't let this type escape, if you put many things together inside the same constructor, it will remember if the type was the same across all of them. For example, the compiler is completely happy here:
(* we introduce a function in our Valid arguments *) type 'validity validation = | Valid: 'thing * ('thing -> bool) -> valid validation | ... let get_that_bool (Valid (x, fn)) = fn x
because it understands that once you pack together a
'thing and a
'thing -> bool, then it's the same
'thing. So this will work for any type. And that's both quite powerful and also super non-obvious at first. Like, what is this useful for?
GADTs can help us hide type information, but still be able to use it later. In the last issue of Practical OCaml we created a
route type for our web router that uses this pattern to hide the types that different route-handler functions use as inputs, and it lets us put together into a single list, a bunch of routes that have type-safe parameters/body parsing. Pretty cool.
Anyways, back to our question, to get our Valid value out, we will need to either:
- let the value escape
- or turn it into a type that is known outside the GADT, like include a
'thing -> stringfunction so we can always just call that function to turn our
'thinginto a string and return a string
We want to preserve the type of our value, so we're gonna do the first. Letting our value escape means actually putting
'thing in the return type of our constructors. Like this:
type _ validation = | Valid: 'thing -> (valid * 'thing) validation | ...
That's actually all we need. Now the compiler can infer the signature of our
get function is
(valid * 'thing) validation -> 'thing, and it lets us take that
'thing out. BAM. Done.
And yet our validation solution still doesn't help us actually validate anything. We don't have a function that goes from pending to valid, or from pending to invalid. Since our
Pending variant doesn't know what a
'thing is, it also doesn't know how to validate it. We'll start there:
(* we added a `fn` to our Pending constructor *) type _ validation = | Pending: 'thing * ('thing -> bool) -> pending validation | ... let validate (Pending (x, fn)) = if fn x then Valid x else Invalid (x, "invalid value!")
We run the compiler, and see the same issue as before:
Valid x would have type
$Pending_'thing, because our Pending variant doesn't really expose it's internal
'thing type... yadda yadda...we can fix that too by letting
type _ validation = | Pending: 'thing * ('thing -> bool) -> (pending * 'thing) validation | ...
Aaaand now we run into another issue. Oof.
Valid x and
Invalid (x, "invalid value!") have different types 🙈 – this is a very common "dead end".
For now, we are going to wrap 'em up in a
result and it will push the problem to future you and me:
let validate (Pending (x, fn)) = if fn x then Ok (Valid x) else Error (Invalid (x, "invalid value!"))
So now we can use our Extremely Type-Safe Validation Lib:
let _ = let user_value = make 2113 (fun x -> x > 0) in match validate user_value with | Ok value -> let age = get value in print_int age; | Error err -> let err = errors err in print_string err;
Hopefully implementing this has shown some of the reasons why GADTs while powerful are rather painful to work with.
For completeness's sake here's the full code:
type valid type invalid type pending type _ validation = | Pending : 'thing * ('thing -> bool) -> (pending * 'thing) validation | Valid : 'thing -> (valid * 'thing) validation | Invalid : 'thing * string -> invalid validation let make x fn = Pending (x, fn) let get (Valid x) = x let errors (Invalid (_, err)) = err let validate (Pending (x, fn)) = if fn x then Ok (Valid x) else Error (Invalid (x, "invalid value!"))
Conclusion: You Don't Need GADTs
Truth is that unless you are Jane Street and need to optimize the hell out of your compact arrays, or are writing a toy λ-calculus interpreter, you're probably better off without them.
GADTs can be super useful if you need to:
- hide type information
- restrict the kind of types that can be instantiated
- have more control over the relation between the input and return type of a function
But GADTs are not only hard to pronounce, they also come with a host of problems. The ones we've seen, and some more that we haven't that need solutions with wizardly names like locally abstract types or polymorphic recursion. Learning about this is fun and great, but sometimes can get in the way of shipping without substantially improving the quality of your product or developer experience.
So stick to simpler types until you run into one of those 3 problems and I promise you you'll be a productive, happy camelid 🚀
Have you implemented typed state machines in some other ways? Have anything to add or challenge? I'd love to hear it! Join the x.com thread.
Happy Cameling! 🐫
Thanks to @patricoferris for pointing out that if you do implement the above pattern, but move outside the current module (for ex. have a submodule for your
pending types), you may run into some undecidability problems that make pattern-matching non-exhaustive. Oof, many words. The gist is, if you see a "This pattern is non-exhaustive" error, try adding a private constructor to your type-tags:
type valid = private | Valid_tag type invalid = private | Invalid_tag type pending = private | Pending_tag
For a more detailed answer, check out this OCaml forum post.