@@ -29,8 +29,8 @@ code][profunctor], the sky's the limit!
29
29
Normal ADTs
30
30
-----------
31
31
32
- Algebraic Data Types (ADT's) are products and sums ( that's why they're
33
- algebraic, after all)
32
+ Algebraic Data Types (ADT's) are products and sums; that's why they're
33
+ algebraic, after all!
34
34
35
35
### Product Types
36
36
@@ -99,22 +99,22 @@ public class Transaction {
99
99
}
100
100
```
101
101
102
- [ ^ java ] : I didn't think I'd ever write this non-ironically on my blog
102
+ [ ^ java ] : I didn't think I'd ever write "java bean" non-ironically on my blog.
103
103
104
104
And there you go. Nothing too surprising there!
105
105
106
- Remember , not only are these ADT's (algebraic data types), they're also ADT's
107
- ( abstract data types): you are meant to work with them based on a pre-defined
108
- abstract interface based on type algebra, instead of their internal
106
+ In this case , not only are these ADT's (algebraic data types), they're also
107
+ ADT's ( ** abstract** data types): you are meant to work with them based on a
108
+ pre-defined abstract interface based on type algebra, instead of their internal
109
109
representations.
110
110
111
111
### Sum Types
112
112
113
- Alright, moving on to sum types. If your language doesn't support sum types,
114
- usually the way to go is with the _ visitor pattern _ : the underlying
115
- implementation is hidden, and the only way to process a sum type value is by
116
- providing handlers for every branch --- a pattern match as a function,
117
- essentially. Your sum values then basically determine which handler is called.
113
+ If your language doesn't support sum types, usually the way to go is with the
114
+ _ visitor pattern _ : the underlying implementation is hidden, and the only way to
115
+ process a sum type value is by providing handlers for every branch --- a
116
+ pattern match as a function, essentially. Your sum values then basically
117
+ determine which handler is called.
118
118
119
119
For example, we can implement it for a network address type that can either be
120
120
IPv4 or IPv6. Here we are using C++ just for generics and lambdas with
@@ -189,9 +189,9 @@ if we ever add a new branch, everything that ever consumes `IPAddress` with an
189
189
In a language _without_ generics or powerful enough polymorphism, it's
190
190
difficult to enforce the "pure" visitor pattern because you can't ensure that
191
191
all branches return the same type. Instead, the best you can do is have an
192
- "effectul " visitor pattern, which triggers an action on each branch instead of
192
+ "effectful " visitor pattern, which triggers an action on each branch instead of
193
193
returning a value. This is a good plan of action for languages like javascript
194
- (without typescript), or python without types.
194
+ (sans typescript), or python without types.
195
195
196
196
In languages without context-binding functions, you might also need to add a
197
197
closure-simulating context into your visitor:
@@ -240,7 +240,6 @@ void showIPAddress(const struct IPAddress* ip, char* out) {
240
240
}
241
241
```
242
242
243
-
244
243
On a simpler note, if your language as subtyping built in (maybe with classes
245
244
and subclasses) or some other form of dynamic dispatch, you can implement it in
246
245
terms of that, which is nice in python, java, C++, etc.
@@ -275,10 +274,9 @@ abstract class Expr {
275
274
```
276
275
277
276
(Note that C++ doesn't allow template virtual methods --- not because it's not
278
- possible within the language, but rather because the maintainers are too lazy
279
- to add it --- so your options are a little bit more limited there. Basically,
280
- ` accept ` is not allowed because of this. But we'll discuss a method to get
281
- around this later.)
277
+ possible within the language semantics and syntax, but rather because the
278
+ maintainers are too lazy to add it --- so doing this faithfully requires a bit
279
+ more creativity)
282
280
283
281
Now, if your language has dynamic dispatch or subclass polymorphism, you can
284
282
actually do a different encoding, instead of the tagged union. This will work
@@ -413,14 +411,18 @@ public class Main {
413
411
}
414
412
```
415
413
414
+ Passing around function references like this is actually pretty close to the
415
+ scott encoding of our data type --- and for non-recursive types, it's
416
+ essentially the church encoding.
417
+
416
418
### Recursive Types
417
419
418
- Okay well ...what if your language doesn't allow recursive data types? Or, what
419
- if recursively generated values are just annoying to deal with? Just imagine
420
- writing that ` Expr ` type in a language with explicit memory management, for
421
- example. (Alternatively, even if you can express recursive types without
422
- problems in your language, the following is actually potentially a useful way
423
- to structure your types to gain some nice benefits.)
420
+ Speaking of recursive types ...what if your language doesn't allow recursive
421
+ data types? What if it doesn't allow recursion at all, or what if recursively
422
+ generated values are just annoying to deal with? Just imagine writing that
423
+ ` Expr ` type in a language with explicit memory management, for example. Or,
424
+ what if you wanted a way to express your recursive types in a more elegant and
425
+ runtime-safe manner?
424
426
425
427
One thing you can instead do is have your visitor be in its "catamorphism", or
426
428
church encoding. Instead of having the "visitor" take the recursive
@@ -515,6 +517,47 @@ this pattern is to have alongside your normal recursive types.
515
517
[ recursion-schemes ] : https://hackage.haskell.org/package/recursion-schemes
516
518
[ prequel memes ] : https://blog.jle.im/entry/tries-with-recursion-schemes.html
517
519
520
+ This pattern is pretty portable to other languages too, as long as you can
521
+ scrounge together something like Rank-N types:
522
+
523
+ ``` java
524
+ interface ExprFold <R> {
525
+ R foldLit (int value );
526
+ R foldNegate (R unary );
527
+ R foldAdd (R left , R right );
528
+ R foldMul (R left , R right );
529
+ }
530
+
531
+ interface Expr {
532
+ public abstract <R > R accept (ExprFold<R > fold );
533
+
534
+ public static Expr lit (int value ) {
535
+ return new Expr () {
536
+ @Override
537
+ public <R > R accept (ExprFold<R > fold ) {
538
+ return fold. foldLit(value);
539
+ }
540
+ };
541
+ }
542
+
543
+ public static Expr negate (Expr unary ) {
544
+ return new Expr () {
545
+ @Override
546
+ public <R > R accept (ExprFold<R > fold ) {
547
+ return fold. foldNegate(unary. accept(fold));
548
+ }
549
+ };
550
+ }
551
+
552
+ // etc.
553
+ }
554
+ ```
555
+
556
+ By "Rank-N types" here, I mean that your objects can generate polymorphic
557
+ functions: given an ` Expr ` , you could _ generate_ an `<R > R accept(ExprFold <R >
558
+ fold)` for any ` R`, and not something pre-determined or pre-chosen by your
559
+ choice of representation of ` Expr ` .
560
+
518
561
Generalized Algebraic Data Types
519
562
--------------------------------
520
563
@@ -527,12 +570,14 @@ in and embrace the warm yet harsh embrace of ultimate type safety. Now what?
527
570
### Singletons and Witnesses
528
571
529
572
In Haskell, singletons are essentially enums used to associate a value with a
530
- reflected type. I ran into a real-world usage of this while writing
531
- < https://coronavirus.jle.im/ > , a web-based data visualizer of COVID-19 data
532
- ([ source here] [ corona-charts ] ) in purescript. I needed a singleton to represent
533
- _ scales_ for scatter plots and linking them to the data that can be plotted.
534
- And, not only did it need to be type-safe in purescript (which has ADTs but not
535
- GADTs), it had to be type-safe in the javascript ffi as well.
573
+ reifiable type. "Reifiable" here means that you can take the runtime value of a
574
+ singleton and use it to bring evidence to the type-level. I ran into a
575
+ real-world usage of this while writing < https://coronavirus.jle.im/ > , a
576
+ web-based data visualizer of COVID-19 data ([ source here] [ corona-charts ] ) in
577
+ purescript. I needed a singleton to represent _ scales_ for scatter plots and
578
+ linking them to the data that can be plotted. And, not only did it need to be
579
+ type-safe in purescript (which has ADTs but not GADTs), it had to be type-safe
580
+ in the javascript ffi as well.
536
581
537
582
[ corona-charts ] : https://github.com/mstksg/corona-charts/tree/master
538
583
@@ -575,7 +620,7 @@ The fundamental ability we want to gain is that if we pattern match on
575
620
` ScaleDate ` , then we _ know_ ` a ` has to be ` Date ` . If we match on ` NInt ` , we
576
621
know that ` a ` _ has_ to be ` Int ` .
577
622
578
- For the sake of this example, we're going to be implementing a simpler version
623
+ For the sake of this example, we're going to be implementing a simpler function
579
624
in purescript and in javascript: a function that takes a scale type and a list
580
625
of points prints the bounds. In Haskell, this looks like:
581
626
@@ -616,8 +661,8 @@ displayNumericAxis = \case
616
661
617
662
(Pretend the ` Percent ` type is just a newtype-wrapped ` Float ` or something)
618
663
619
- There are two main approaches to do this: Runtime equality witnesses and
620
- Higher-Kinded Eliminators.
664
+ There are at least two main approaches to do this. We'll be discussing runtime
665
+ equality witnesses and Higher-Kinded Eliminators.
621
666
622
667
#### Runtime Equality Witnesses
623
668
@@ -676,10 +721,11 @@ from (Leibniz f) = getOp (f (Op id))
676
721
So, if your language supports higher-kinded Rank-2 types, you have a solution!
677
722
678
723
There are other solutions in other languages, but they will usually all be
679
- language-dependent (For example, you can get something cute with C++ templates
680
- if you restrict template generation to only ` <a,a> ` ).
724
+ language-dependent.
681
725
682
- Let's write everything in purescript then:
726
+ Let's write everything in purescript. The key difference is we use `map (to
727
+ isNumber) :: Array a -> Array Number` , etc., to get our ` Array` as something we
728
+ know it has the type of.
683
729
684
730
``` purescript
685
731
import Text.Printf
@@ -732,11 +778,11 @@ displayNumericAxis = \case
732
778
}
733
779
```
734
780
735
- The main difference here is that to work with our ` [a] ` as if it were ` [Int] ` ,
736
- we have to map the coercion function over it that our ` Leibniz a Int ` gave us.
737
- Admittedly, this naive way adds a runtime cost of copying the array. But we
738
- could be more creative with finding the minimum and maximum in this way in
739
- constant space and no extra allocations.
781
+ To work with our ` [a] ` as if it were ` [Int] ` , we have to map the coercion
782
+ function over it that our ` Leibniz a Int ` gave us. Admittedly, this naive way
783
+ adds a runtime cost of copying the array. But we could be more creative with
784
+ finding the minimum and maximum in this way in constant space and no extra
785
+ allocations.
740
786
741
787
And, if we wanted to outsource this to the javascript FFI, remember that
742
788
javascript doesn't quite have sum types, so we can create a quick visitor:
@@ -800,7 +846,12 @@ data UserHandler r = UH
800
846
{ uhTheAdmin :: r
801
847
, uhMember :: Int -> r
802
848
}
849
+ ```
803
850
851
+ But note that if you have the right set of continuations, you have something
852
+ that is essentially equal to `User ` without having to actually use `User `:
853
+
854
+ ```haskell
804
855
type User' = forall r . UserHandler r -> r
805
856
806
857
fromUser :: User -> User'
@@ -814,14 +865,13 @@ toUser f = f $ UH { fhTheAdmin = TheAdmin, fhMember = Member }
814
865
815
866
This means that ` User ` is actually equivalent to `forall r. UserHandler r ->
816
867
r`: they're the same type, so if your language doesn't have sum types, you
817
- could encode it as ` forall r. UserHandler r -> r ` instead. That is essentially
818
- the visitor pattern discussed earlier.
868
+ could encode it as ` forall r. UserHandler r -> r ` instead. Visitors, baby.
819
869
820
- But, what actually does the ` r ` type variable represent here, semantically?
821
- Well, in a ` UserHandler r ` , ` r ` is the "target" that we interpret into. But
822
- there's a deeper relationship between ` r ` and ` User ` : A ` UserHandler r `
823
- essentially "embeds" a ` User ` into an ` r ` . And, a ` UserHandler r -> r ` is the
824
- application of that embedding to an actual ` User ` .
870
+ But, then, what actually does the ` r ` type variable represent here,
871
+ semantically? Well, in a ` UserHandler r ` , ` r ` is the "target" that we interpret
872
+ into. But there's a deeper relationship between ` r ` and ` User ` : A `UserHandler
873
+ r ` essentially "embeds" a ` User` into an ` r` . And, a ` UserHandler r -> r` is
874
+ the application of that embedding to an actual ` User ` .
825
875
826
876
If we pick ` r ~ () ` , then ` UserHandler () ` embeds ` User ` into ` () ` . If we pick
827
877
` r ~ String ` , then ` UserHandler () ` embeds ` User ` into ` String ` (like,
@@ -846,10 +896,10 @@ data MakeUser user = MakeUser
846
896
type User' = forall user . MakeUser user -> user
847
897
```
848
898
849
- Anyway , the `forall user. ` is chosen to faithfully let us " create" a `User `
850
- within the system we have, without actually having a `User ` data type .
851
- Essentially we can imagine the `r` in the `forall r` as " standing in" for
852
- ` User `, even if that type doesn't actually exist.
899
+ The `forall user. ` lets us faithfully " create" a `User ` within the system we
900
+ have, without actually having a `User ` data type . Essentially we can imagine
901
+ the `r` in the `forall r` as " standing in" for ` User `, even if that type
902
+ doesn't actually exist.
853
903
854
904
Now , here's the breakthrough: If we can use `forall (r :: Type )` to substitute
855
905
for `User :: Type `, how about we use a `forall (p :: Type -> Type )` to
@@ -879,12 +929,12 @@ toScale :: Scale' a -> Scale a
879
929
toScale f = f $ SH { shDate = ScaleDate , shLinear = ScaleLinear , shLog = ScaleLog }
880
930
```
881
931
882
- So in our new system, ` forall p. ScaleHandler p a -> p a ` is the same thing as
932
+ So in our new system, ` forall p. ScaleHandler p a -> p a ` is identical to
883
933
` Scale ` : we can use ` p a ` to substitute in ` Scale ` in our language even if our
884
934
language itself cannot support GADTs.
885
935
886
- So let's write this in purescript. We no longer have an actual ` Scale ` sum
887
- type, but its higher-kinded church encoding:
936
+ So let's write ` formatNType ` in purescript. We no longer have an actual ` Scale `
937
+ sum type, but its higher-kinded church encoding:
888
938
889
939
``` purescript
890
940
type NType a = forall p.
@@ -929,11 +979,9 @@ a` is an `a -> String`, which is what we wanted! The `int` field is `Op String
929
979
Int` , the ` number` field is ` Op String Number`, etc.
930
980
931
981
In many languages, using this technique effectively requires having a newtype
932
- wrapper on-hand, so it might be unwieldy in non-trivial situations. But it can
933
- be a pretty powerful method for reasons we will see soon.
934
-
935
- For example, if we wanted to write our previous axis function which is `NType a
936
- -> [ a] -> String` , we'd have to have a newtype wrapper for ` [ a] -> String` that
982
+ wrapper on-hand, so it might be unwieldy in non-trivial situations. For
983
+ example, if we wanted to write our previous axis function which is `NType a ->
984
+ [ a] -> String` , we'd have to have a newtype wrapper for ` [ a] -> String` that
937
985
has ` a ` as its argument:
938
986
939
987
``` purescript
@@ -947,7 +995,7 @@ newtype Compose f g a = Compose (f (g a))
947
995
```
948
996
949
997
and your ` p ` projection type would be ` Compose Op [] ` . So, you don't
950
- necessarily have to write a bespoke newtype wrapper, _ but _ you do have to
998
+ necessarily have to write a bespoke newtype wrapper, but you do have to
951
999
devote some brain cycles to think it through (unless you're in a language
952
1000
that doesn't need newtype wrappers to have this work, like we'll discuss
953
1001
later).
@@ -956,10 +1004,10 @@ By the way, this method generalizes well to multiple arguments: if you have a
956
1004
type like ` MyGADT a b c ` , you just need to project into a `forall (p :: k1 ->
957
1005
k2 -> k3 -> Type)`.
958
1006
959
- I believe I have read somewhere that the two methods here (runtime equality
960
- witness vs. higher-kinded eliminator) are not actually fully identical with
961
- each other , and there are GADTs where one would work and not the other ... but
962
- I can't remember where I read this and I'm also not big-brained enough to
1007
+ I believe I have read somewhere that the two methods discussed here (runtime
1008
+ equality witness vs. higher-kinded eliminator) are not actually fully identical
1009
+ in their power , and there are GADTs where one would work and not the other ...
1010
+ but I can't remember where I read this and I'm also not big-brained enough to
963
1011
figure out what those situations are. But if you, reader, have any idea, please
964
1012
let me know!
965
1013
@@ -989,12 +1037,12 @@ myFavoriteNumbers :: [SomeNType]
989
1037
myFavoriteNumbers = [SomeNType NInt 3 , SomeNType NDouble pi ]
990
1038
```
991
1039
992
- But, what if our language doesn't have existentials? Remember, this is
993
- basically a value ` SomeNType ` that _ isn't_ a Generic, but _ contains_ both a
994
- ` NType a ` and an ` a ` of the _ same_ variable.
1040
+ But what if our language doesn't have existentials? Remember, this is basically
1041
+ a value ` SomeNType ` that _ isn't_ a Generic, but _ contains_ both a ` NType a ` and
1042
+ an ` a ` of the _ same_ variable.
995
1043
996
1044
One strategy we have available is to CPS-transform our existentials into their
997
- CPS form (continuation-passing style). Basically, we write exactly what we
1045
+ CPS form (continuation-passing style form ). Basically, we write exactly what we
998
1046
want to do with our contents _ if we pattern matched_ on them. It's essentially
999
1047
a Rank-N visitor pattern with only a single constructor:
1000
1048
@@ -1035,7 +1083,8 @@ interface SomeNType {
1035
1083
}
1036
1084
}
1037
1085
1038
- // Second option: the subtype hiding a type variable
1086
+ // Second option: the subtype hiding a type variable, which you have to always
1087
+ // make sure to upcast into `SomeNType` after creating
1039
1088
class SomeNTypeImpl <A> extends SomeNType {
1040
1089
private NType<A > nt;
1041
1090
private A val;
@@ -1132,13 +1181,13 @@ let eval
1132
1181
}
1133
1182
```
1134
1183
1135
- Again, now instead of ` Add ` taking ` Expr ` , it takes ` p Natural ` : the "` Natural `
1184
+ Again, now instead of ` add ` taking ` Expr ` , it takes ` p Natural ` : the "` Natural `
1136
1185
result of the fold". ` p ` not only stands in for what we embed ` Expr ` into, it
1137
1186
stands in for the result of the recursive fold. That's why in ` eval ` , the first
1138
- arguments of ` add ` are the results of the sub-evaluation.
1187
+ arguments of ` add ` are the ` Natural ` results of the sub-evaluation.
1139
1188
1140
1189
These values can be created in the same way as before, merging the two
1141
- techniques, calling the handlers downstream:
1190
+ techniques, sending the handlers downstream:
1142
1191
1143
1192
``` dhall
1144
1193
let natLit
@@ -1204,8 +1253,8 @@ ternary :: Expr Bool -> Expr a -> Expr a -> Expr a
1204
1253
ternary b x y handlers = handlers. ternary (b handlers) (x handlers) (y handlers)
1205
1254
```
1206
1255
1207
- One nice thing about the dhall version that's incidental to dhall is that it
1208
- doesn't require any extra newtype wrappers like the Haskell one does. That's
1256
+ But one nice thing about the dhall version that's incidental to dhall is that
1257
+ it doesn't require any extra newtype wrappers like the Haskell one does. That's
1209
1258
because type inference tends to choke on things like this, but dhall doesn't
1210
1259
really have any type inference: all of the types are passed explicitly. It's
1211
1260
one of the facts about dhall that make it nice for things like this.
@@ -1215,10 +1264,9 @@ Congratulations
1215
1264
1216
1265
In any case, if you've made it this far, congratulations! You are a master of
1217
1266
ADTs and GADTs. Admittedly every language is different, and some of these
1218
- solutions have to be tweaked for the language in question. C++ not having
1219
- template virtuals, for instance, complicates a lot of our workarounds for lack
1220
- of proper rank-2 types. And, if your program gets very complicated, there is a
1221
- good chance that things will become ergonomically unfeasible.
1267
+ solutions have to be tweaked for the language in question. And, if your program
1268
+ gets very complicated, there is a good chance that things will become
1269
+ ergonomically unfeasible.
1222
1270
1223
1271
But I hope, at least, that this inspires your imagination to try to bring your
1224
1272
haskell principles, techniques, standards, practices, and brainrot into the
0 commit comments