Skip to content

Commit c3b8bfa

Browse files
committed
more updates
1 parent a2bcf82 commit c3b8bfa

File tree

1 file changed

+127
-79
lines changed

1 file changed

+127
-79
lines changed

copy/entries/faking-adts-and-gadts.md

+127-79
Original file line numberDiff line numberDiff line change
@@ -29,8 +29,8 @@ code][profunctor], the sky's the limit!
2929
Normal ADTs
3030
-----------
3131

32-
Algebraic Data Types (ADT's) are products and sums (that's why they're
33-
algebraic, after all)
32+
Algebraic Data Types (ADT's) are products and sums; that's why they're
33+
algebraic, after all!
3434

3535
### Product Types
3636

@@ -99,22 +99,22 @@ public class Transaction {
9999
}
100100
```
101101

102-
[^java]: I didn't think I'd ever write this non-ironically on my blog
102+
[^java]: I didn't think I'd ever write "java bean" non-ironically on my blog.
103103

104104
And there you go. Nothing too surprising there!
105105

106-
Remember, not only are these ADT's (algebraic data types), they're also ADT's
107-
(abstract data types): you are meant to work with them based on a pre-defined
108-
abstract interface based on type algebra, instead of their internal
106+
In this case, not only are these ADT's (algebraic data types), they're also
107+
ADT's (**abstract** data types): you are meant to work with them based on a
108+
pre-defined abstract interface based on type algebra, instead of their internal
109109
representations.
110110

111111
### Sum Types
112112

113-
Alright, moving on to sum types. If your language doesn't support sum types,
114-
usually the way to go is with the _visitor pattern_: the underlying
115-
implementation is hidden, and the only way to process a sum type value is by
116-
providing handlers for every branch --- a pattern match as a function,
117-
essentially. Your sum values then basically determine which handler is called.
113+
If your language doesn't support sum types, usually the way to go is with the
114+
_visitor pattern_: the underlying implementation is hidden, and the only way to
115+
process a sum type value is by providing handlers for every branch --- a
116+
pattern match as a function, essentially. Your sum values then basically
117+
determine which handler is called.
118118

119119
For example, we can implement it for a network address type that can either be
120120
IPv4 or IPv6. Here we are using C++ just for generics and lambdas with
@@ -189,9 +189,9 @@ if we ever add a new branch, everything that ever consumes `IPAddress` with an
189189
In a language _without_ generics or powerful enough polymorphism, it's
190190
difficult to enforce the "pure" visitor pattern because you can't ensure that
191191
all branches return the same type. Instead, the best you can do is have an
192-
"effectul" visitor pattern, which triggers an action on each branch instead of
192+
"effectful" visitor pattern, which triggers an action on each branch instead of
193193
returning a value. This is a good plan of action for languages like javascript
194-
(without typescript), or python without types.
194+
(sans typescript), or python without types.
195195
196196
In languages without context-binding functions, you might also need to add a
197197
closure-simulating context into your visitor:
@@ -240,7 +240,6 @@ void showIPAddress(const struct IPAddress* ip, char* out) {
240240
}
241241
```
242242
243-
244243
On a simpler note, if your language as subtyping built in (maybe with classes
245244
and subclasses) or some other form of dynamic dispatch, you can implement it in
246245
terms of that, which is nice in python, java, C++, etc.
@@ -275,10 +274,9 @@ abstract class Expr {
275274
```
276275

277276
(Note that C++ doesn't allow template virtual methods --- not because it's not
278-
possible within the language, but rather because the maintainers are too lazy
279-
to add it --- so your options are a little bit more limited there. Basically,
280-
`accept` is not allowed because of this. But we'll discuss a method to get
281-
around this later.)
277+
possible within the language semantics and syntax, but rather because the
278+
maintainers are too lazy to add it --- so doing this faithfully requires a bit
279+
more creativity)
282280

283281
Now, if your language has dynamic dispatch or subclass polymorphism, you can
284282
actually do a different encoding, instead of the tagged union. This will work
@@ -413,14 +411,18 @@ public class Main {
413411
}
414412
```
415413

414+
Passing around function references like this is actually pretty close to the
415+
scott encoding of our data type --- and for non-recursive types, it's
416+
essentially the church encoding.
417+
416418
### Recursive Types
417419

418-
Okay well...what if your language doesn't allow recursive data types? Or, what
419-
if recursively generated values are just annoying to deal with? Just imagine
420-
writing that `Expr` type in a language with explicit memory management, for
421-
example. (Alternatively, even if you can express recursive types without
422-
problems in your language, the following is actually potentially a useful way
423-
to structure your types to gain some nice benefits.)
420+
Speaking of recursive types...what if your language doesn't allow recursive
421+
data types? What if it doesn't allow recursion at all, or what if recursively
422+
generated values are just annoying to deal with? Just imagine writing that
423+
`Expr` type in a language with explicit memory management, for example. Or,
424+
what if you wanted a way to express your recursive types in a more elegant and
425+
runtime-safe manner?
424426

425427
One thing you can instead do is have your visitor be in its "catamorphism", or
426428
church encoding. Instead of having the "visitor" take the recursive
@@ -515,6 +517,47 @@ this pattern is to have alongside your normal recursive types.
515517
[recursion-schemes]: https://hackage.haskell.org/package/recursion-schemes
516518
[prequel memes]: https://blog.jle.im/entry/tries-with-recursion-schemes.html
517519

520+
This pattern is pretty portable to other languages too, as long as you can
521+
scrounge together something like Rank-N types:
522+
523+
```java
524+
interface ExprFold<R> {
525+
R foldLit(int value);
526+
R foldNegate(R unary);
527+
R foldAdd(R left, R right);
528+
R foldMul(R left, R right);
529+
}
530+
531+
interface Expr {
532+
public abstract <R> R accept(ExprFold<R> fold);
533+
534+
public static Expr lit(int value) {
535+
return new Expr() {
536+
@Override
537+
public <R> R accept(ExprFold<R> fold) {
538+
return fold.foldLit(value);
539+
}
540+
};
541+
}
542+
543+
public static Expr negate(Expr unary) {
544+
return new Expr() {
545+
@Override
546+
public <R> R accept(ExprFold<R> fold) {
547+
return fold.foldNegate(unary.accept(fold));
548+
}
549+
};
550+
}
551+
552+
// etc.
553+
}
554+
```
555+
556+
By "Rank-N types" here, I mean that your objects can generate polymorphic
557+
functions: given an `Expr`, you could _generate_ an `<R> R accept(ExprFold <R>
558+
fold)` for any `R`, and not something pre-determined or pre-chosen by your
559+
choice of representation of `Expr`.
560+
518561
Generalized Algebraic Data Types
519562
--------------------------------
520563

@@ -527,12 +570,14 @@ in and embrace the warm yet harsh embrace of ultimate type safety. Now what?
527570
### Singletons and Witnesses
528571

529572
In Haskell, singletons are essentially enums used to associate a value with a
530-
reflected type. I ran into a real-world usage of this while writing
531-
<https://coronavirus.jle.im/>, a web-based data visualizer of COVID-19 data
532-
([source here][corona-charts]) in purescript. I needed a singleton to represent
533-
_scales_ for scatter plots and linking them to the data that can be plotted.
534-
And, not only did it need to be type-safe in purescript (which has ADTs but not
535-
GADTs), it had to be type-safe in the javascript ffi as well.
573+
reifiable type. "Reifiable" here means that you can take the runtime value of a
574+
singleton and use it to bring evidence to the type-level. I ran into a
575+
real-world usage of this while writing <https://coronavirus.jle.im/>, a
576+
web-based data visualizer of COVID-19 data ([source here][corona-charts]) in
577+
purescript. I needed a singleton to represent _scales_ for scatter plots and
578+
linking them to the data that can be plotted. And, not only did it need to be
579+
type-safe in purescript (which has ADTs but not GADTs), it had to be type-safe
580+
in the javascript ffi as well.
536581

537582
[corona-charts]: https://github.com/mstksg/corona-charts/tree/master
538583

@@ -575,7 +620,7 @@ The fundamental ability we want to gain is that if we pattern match on
575620
`ScaleDate`, then we _know_ `a` has to be `Date`. If we match on `NInt`, we
576621
know that `a` _has_ to be `Int`.
577622

578-
For the sake of this example, we're going to be implementing a simpler version
623+
For the sake of this example, we're going to be implementing a simpler function
579624
in purescript and in javascript: a function that takes a scale type and a list
580625
of points prints the bounds. In Haskell, this looks like:
581626

@@ -616,8 +661,8 @@ displayNumericAxis = \case
616661

617662
(Pretend the `Percent` type is just a newtype-wrapped `Float` or something)
618663

619-
There are two main approaches to do this: Runtime equality witnesses and
620-
Higher-Kinded Eliminators.
664+
There are at least two main approaches to do this. We'll be discussing runtime
665+
equality witnesses and Higher-Kinded Eliminators.
621666

622667
#### Runtime Equality Witnesses
623668

@@ -676,10 +721,11 @@ from (Leibniz f) = getOp (f (Op id))
676721
So, if your language supports higher-kinded Rank-2 types, you have a solution!
677722

678723
There are other solutions in other languages, but they will usually all be
679-
language-dependent (For example, you can get something cute with C++ templates
680-
if you restrict template generation to only `<a,a>`).
724+
language-dependent.
681725

682-
Let's write everything in purescript then:
726+
Let's write everything in purescript. The key difference is we use `map (to
727+
isNumber) :: Array a -> Array Number`, etc., to get our `Array` as something we
728+
know it has the type of.
683729

684730
```purescript
685731
import Text.Printf
@@ -732,11 +778,11 @@ displayNumericAxis = \case
732778
}
733779
```
734780

735-
The main difference here is that to work with our `[a]` as if it were `[Int]`,
736-
we have to map the coercion function over it that our `Leibniz a Int` gave us.
737-
Admittedly, this naive way adds a runtime cost of copying the array. But we
738-
could be more creative with finding the minimum and maximum in this way in
739-
constant space and no extra allocations.
781+
To work with our `[a]` as if it were `[Int]`, we have to map the coercion
782+
function over it that our `Leibniz a Int` gave us. Admittedly, this naive way
783+
adds a runtime cost of copying the array. But we could be more creative with
784+
finding the minimum and maximum in this way in constant space and no extra
785+
allocations.
740786

741787
And, if we wanted to outsource this to the javascript FFI, remember that
742788
javascript doesn't quite have sum types, so we can create a quick visitor:
@@ -800,7 +846,12 @@ data UserHandler r = UH
800846
{ uhTheAdmin :: r
801847
, uhMember :: Int -> r
802848
}
849+
```
803850

851+
But note that if you have the right set of continuations, you have something
852+
that is essentially equal to `User` without having to actually use `User`:
853+
854+
```haskell
804855
type User' = forall r. UserHandler r -> r
805856

806857
fromUser :: User -> User'
@@ -814,14 +865,13 @@ toUser f = f $ UH { fhTheAdmin = TheAdmin, fhMember = Member }
814865

815866
This means that `User` is actually equivalent to `forall r. UserHandler r ->
816867
r`: they're the same type, so if your language doesn't have sum types, you
817-
could encode it as `forall r. UserHandler r -> r` instead. That is essentially
818-
the visitor pattern discussed earlier.
868+
could encode it as `forall r. UserHandler r -> r` instead. Visitors, baby.
819869

820-
But, what actually does the `r` type variable represent here, semantically?
821-
Well, in a `UserHandler r`, `r` is the "target" that we interpret into. But
822-
there's a deeper relationship between `r` and `User`: A `UserHandler r`
823-
essentially "embeds" a `User` into an `r`. And, a `UserHandler r -> r` is the
824-
application of that embedding to an actual `User`.
870+
But, then, what actually does the `r` type variable represent here,
871+
semantically? Well, in a `UserHandler r`, `r` is the "target" that we interpret
872+
into. But there's a deeper relationship between `r` and `User`: A `UserHandler
873+
r` essentially "embeds" a `User` into an `r`. And, a `UserHandler r -> r` is
874+
the application of that embedding to an actual `User`.
825875

826876
If we pick `r ~ ()`, then `UserHandler ()` embeds `User` into `()`. If we pick
827877
`r ~ String`, then `UserHandler ()` embeds `User` into `String` (like,
@@ -846,10 +896,10 @@ data MakeUser user = MakeUser
846896
type User' = forall user. MakeUser user -> user
847897
```
848898

849-
Anyway, the `forall user.` is chosen to faithfully let us "create" a `User`
850-
within the system we have, without actually having a `User` data type.
851-
Essentially we can imagine the `r` in the `forall r` as "standing in" for
852-
`User`, even if that type doesn't actually exist.
899+
The `forall user.` lets us faithfully "create" a `User` within the system we
900+
have, without actually having a `User` data type. Essentially we can imagine
901+
the `r` in the `forall r` as "standing in" for `User`, even if that type
902+
doesn't actually exist.
853903

854904
Now, here's the breakthrough: If we can use `forall (r :: Type)` to substitute
855905
for `User :: Type`, how about we use a `forall (p :: Type -> Type)` to
@@ -879,12 +929,12 @@ toScale :: Scale' a -> Scale a
879929
toScale f = f $ SH { shDate = ScaleDate, shLinear = ScaleLinear, shLog = ScaleLog }
880930
```
881931

882-
So in our new system, `forall p. ScaleHandler p a -> p a` is the same thing as
932+
So in our new system, `forall p. ScaleHandler p a -> p a` is identical to
883933
`Scale`: we can use `p a` to substitute in `Scale` in our language even if our
884934
language itself cannot support GADTs.
885935

886-
So let's write this in purescript. We no longer have an actual `Scale` sum
887-
type, but its higher-kinded church encoding:
936+
So let's write `formatNType` in purescript. We no longer have an actual `Scale`
937+
sum type, but its higher-kinded church encoding:
888938

889939
```purescript
890940
type NType a = forall p.
@@ -929,11 +979,9 @@ a` is an `a -> String`, which is what we wanted! The `int` field is `Op String
929979
Int`, the `number` field is `Op String Number`, etc.
930980

931981
In many languages, using this technique effectively requires having a newtype
932-
wrapper on-hand, so it might be unwieldy in non-trivial situations. But it can
933-
be a pretty powerful method for reasons we will see soon.
934-
935-
For example, if we wanted to write our previous axis function which is `NType a
936-
-> [a] -> String`, we'd have to have a newtype wrapper for `[a] -> String` that
982+
wrapper on-hand, so it might be unwieldy in non-trivial situations. For
983+
example, if we wanted to write our previous axis function which is `NType a ->
984+
[a] -> String`, we'd have to have a newtype wrapper for `[a] -> String` that
937985
has `a` as its argument:
938986

939987
```purescript
@@ -947,7 +995,7 @@ newtype Compose f g a = Compose (f (g a))
947995
```
948996

949997
and your `p` projection type would be `Compose Op []`. So, you don't
950-
necessarily have to write a bespoke newtype wrapper, _but_ you do have to
998+
necessarily have to write a bespoke newtype wrapper, but you do have to
951999
devote some brain cycles to think it through (unless you're in a language
9521000
that doesn't need newtype wrappers to have this work, like we'll discuss
9531001
later).
@@ -956,10 +1004,10 @@ By the way, this method generalizes well to multiple arguments: if you have a
9561004
type like `MyGADT a b c`, you just need to project into a `forall (p :: k1 ->
9571005
k2 -> k3 -> Type)`.
9581006

959-
I believe I have read somewhere that the two methods here (runtime equality
960-
witness vs. higher-kinded eliminator) are not actually fully identical with
961-
each other, and there are GADTs where one would work and not the other ... but
962-
I can't remember where I read this and I'm also not big-brained enough to
1007+
I believe I have read somewhere that the two methods discussed here (runtime
1008+
equality witness vs. higher-kinded eliminator) are not actually fully identical
1009+
in their power, and there are GADTs where one would work and not the other ...
1010+
but I can't remember where I read this and I'm also not big-brained enough to
9631011
figure out what those situations are. But if you, reader, have any idea, please
9641012
let me know!
9651013

@@ -989,12 +1037,12 @@ myFavoriteNumbers :: [SomeNType]
9891037
myFavoriteNumbers = [SomeNType NInt 3, SomeNType NDouble pi]
9901038
```
9911039

992-
But, what if our language doesn't have existentials? Remember, this is
993-
basically a value `SomeNType` that _isn't_ a Generic, but _contains_ both a
994-
`NType a` and an `a` of the _same_ variable.
1040+
But what if our language doesn't have existentials? Remember, this is basically
1041+
a value `SomeNType` that _isn't_ a Generic, but _contains_ both a `NType a` and
1042+
an `a` of the _same_ variable.
9951043

9961044
One strategy we have available is to CPS-transform our existentials into their
997-
CPS form (continuation-passing style). Basically, we write exactly what we
1045+
CPS form (continuation-passing style form). Basically, we write exactly what we
9981046
want to do with our contents _if we pattern matched_ on them. It's essentially
9991047
a Rank-N visitor pattern with only a single constructor:
10001048

@@ -1035,7 +1083,8 @@ interface SomeNType {
10351083
}
10361084
}
10371085

1038-
// Second option: the subtype hiding a type variable
1086+
// Second option: the subtype hiding a type variable, which you have to always
1087+
// make sure to upcast into `SomeNType` after creating
10391088
class SomeNTypeImpl<A> extends SomeNType {
10401089
private NType<A> nt;
10411090
private A val;
@@ -1132,13 +1181,13 @@ let eval
11321181
}
11331182
```
11341183

1135-
Again, now instead of `Add` taking `Expr`, it takes `p Natural`: the "`Natural`
1184+
Again, now instead of `add` taking `Expr`, it takes `p Natural`: the "`Natural`
11361185
result of the fold". `p` not only stands in for what we embed `Expr` into, it
11371186
stands in for the result of the recursive fold. That's why in `eval`, the first
1138-
arguments of `add` are the results of the sub-evaluation.
1187+
arguments of `add` are the `Natural` results of the sub-evaluation.
11391188

11401189
These values can be created in the same way as before, merging the two
1141-
techniques, calling the handlers downstream:
1190+
techniques, sending the handlers downstream:
11421191

11431192
```dhall
11441193
let natLit
@@ -1204,8 +1253,8 @@ ternary :: Expr Bool -> Expr a -> Expr a -> Expr a
12041253
ternary b x y handlers = handlers.ternary (b handlers) (x handlers) (y handlers)
12051254
```
12061255

1207-
One nice thing about the dhall version that's incidental to dhall is that it
1208-
doesn't require any extra newtype wrappers like the Haskell one does. That's
1256+
But one nice thing about the dhall version that's incidental to dhall is that
1257+
it doesn't require any extra newtype wrappers like the Haskell one does. That's
12091258
because type inference tends to choke on things like this, but dhall doesn't
12101259
really have any type inference: all of the types are passed explicitly. It's
12111260
one of the facts about dhall that make it nice for things like this.
@@ -1215,10 +1264,9 @@ Congratulations
12151264

12161265
In any case, if you've made it this far, congratulations! You are a master of
12171266
ADTs and GADTs. Admittedly every language is different, and some of these
1218-
solutions have to be tweaked for the language in question. C++ not having
1219-
template virtuals, for instance, complicates a lot of our workarounds for lack
1220-
of proper rank-2 types. And, if your program gets very complicated, there is a
1221-
good chance that things will become ergonomically unfeasible.
1267+
solutions have to be tweaked for the language in question. And, if your program
1268+
gets very complicated, there is a good chance that things will become
1269+
ergonomically unfeasible.
12221270

12231271
But I hope, at least, that this inspires your imagination to try to bring your
12241272
haskell principles, techniques, standards, practices, and brainrot into the

0 commit comments

Comments
 (0)