Commit 1336e5f

B2t2 (#1855)
2 parents 09b27d1 + f03ba3d commit 1336e5f

50 files changed: +88201 -1 lines changed
dune-project

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@
   (>= 3.12.0))
   ppx_yojson_conv_lib
   ppx_yojson_conv
+  ppx_blob
   incr_dom
   bisect_ppx
   (omd

hazel.opam

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

hazel.opam.locked

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

src/b2t2/Datasheet.md

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
## Reference

> Q. Where can we learn about the programming medium covered by this datasheet?
> (Feel free to link to multiple kinds of artifacts: repositories, papers, videos, etc.
> Please also include version information where applicable.)

- **Website**: http://hazel.org
- **Source Code**: https://github.com/hazelgrove/hazel
- **App**: https://hazel.org/build/dev/

> Q. What is the URL of the version of the benchmark being used?
https://github.com/brownplt/B2T2/blob/fd227efadf532a20aefd25c7a8580978c2d684a2/Datasheet.md

> Q. On what date was this version of the datasheet last updated?
2025-11-05

> Q. If you are not using the latest benchmark available on that date, please explain why not.
Yes

## Example Tables

> Q. Do tables express heterogeneous data, or must data be homogenized?
Hazel tables are represented as *lists of labeled tuples*.
- Columns may be heterogeneously typed.
- Rows must be homogeneously typed.
- The unknown type allows some degree of heterogeneous rows.
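
As a rough analogy in OCaml (not Hazel syntax, which this datasheet does not show), a labeled tuple can be approximated by a record and a table by a list of records. The `student` type, its field names, and the sample rows are illustrative assumptions.

```ocaml
(* A row is a record: each field (column) may have its own type. *)
type student = { name : string; age : int; favorite_color : string }

(* A table is a list of rows; every row shares the same record type. *)
let students : student list = [
  { name = "Bob"; age = 12; favorite_color = "blue" };
  { name = "Alice"; age = 17; favorite_color = "green" };
  { name = "Eve"; age = 13; favorite_color = "red" };
]

(* Reading a column is an ordinary field projection mapped over the rows. *)
let ages : int list = List.map (fun row -> row.age) students
```

The columns differ in type (`string` versus `int`), while the list type forces every row to have the same shape.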

> Q. Do tables capture missing data and, if so, how? Do missing values affect the output constraints of any operations,
for example `groupBy`?
- Missing data is represented via `Option` types (`Some` / `None`).
- Incomplete programs can use expression holes (holes are not programmatically discernible).
- Operations apply no special handling: `Option` values are ordinary values.
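
A minimal OCaml sketch of the same idea, assuming a hypothetical `student` row whose `age` and `favorite_color` columns may be missing. Nothing here is table-specific; missing cells are handled with the standard `option` functions.

```ocaml
(* Illustrative row type: columns that may be missing get an option type. *)
type student = {
  name : string;
  age : int option;
  favorite_color : string option;
}

let students_missing : student list = [
  { name = "Bob"; age = None; favorite_color = Some "blue" };
  { name = "Alice"; age = Some 17; favorite_color = Some "green" };
  { name = "Eve"; age = Some 13; favorite_color = None };
]

(* Missing values are filtered like any other option value. *)
let known_ages : int list =
  List.filter_map (fun row -> row.age) students_missing

(* Rows with no missing cell at all. *)
let complete_rows : student list =
  List.filter
    (fun row -> Option.is_some row.age && Option.is_some row.favorite_color)
    students_missing
```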

> Q. Are mutable tables supported? Are there any limitations?
Mutable tables are not supported.

> You may reference, instead of duplicating, the responses to the above questions in answering those below:

> Q. Which tables are inexpressible? Why?

None. All tables can be expressed, using `Option` types for missing data.

> Q. Which tables are only partially expressible? Why, and what’s missing?

N/A

> Q. Which tables’ expressibility is unknown? Why?

N/A

> Q. Which tables can be expressed more precisely than in the benchmark? How?

None. Hazel represents the tables as precisely as the benchmark does. Once again, `Option` types make optional
columns explicit.

> Q. How direct is the mapping from the tables in the benchmark to representations in your system? How complex
is the encoding?

- Very direct.
- Benchmark tables map naturally to Hazel's lists of labeled tuples.
- Missing values use `Option`.
- Nested tables use nested labeled tuples or lists.

## TableAPI

> Q. Are there consistent changes made to the way the operations are represented?
The operations are mostly presented as depicted, with a few variations:
- Some operations use explicit polymorphism via Hazel's `typfun` keyword, which requires explicit type
  application, because implicit polymorphism had not been added to Hazel as of 2025-07-08.
- Hazel tables are represented as lists of labeled tuples, so no runtime schema is available to operations.
  Certain operations, such as `leftJoin`, must therefore inspect the head element to determine the schema and
  define some behavior for the case where no such element exists.
- Certain operations have been made to return an optional value rather than an error.
- Hazel does not have first-class labels, and therefore uses strings for column names in some of the operations.
  If such an operation were written inline, primitive operators could be used to recover type safety.
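
The head-element approach can be sketched in OCaml as follows. This is only an analogy: OCaml records expose no labels at run time, so the sketch assumes a hypothetical dynamic encoding (rows as association lists) that is not Hazel's representation. The names `cell`, `row`, `header`, and `ncols` are illustrative.

```ocaml
(* Hypothetical dynamic encoding, used only so that labels exist at run time:
   a row is an association list from column name to cell. *)
type cell = Int of int | Str of string
type row = (string * cell) list
type table = row list

(* The schema is read off the first row; an empty table has no row to
   inspect, so the result is optional rather than an error. *)
let header (t : table) : string list option =
  match t with
  | [] -> None
  | first :: _ -> Some (List.map fst first)

let ncols (t : table) : int option = Option.map List.length (header t)

let students : table = [
  [ ("name", Str "Bob"); ("age", Int 12) ];
  [ ("name", Str "Alice"); ("age", Int 17) ];
]
```

With this encoding, `header students` yields the labels of the first row, while `header []` yields `None`, which is the "no such element" case described above.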

> Q. Which operations are entirely inexpressible? Why?
All the operations are at least partially expressible.

> Q. Which operations are only partially expressible? Why, and what’s missing?
- `leftJoin` can only build the resulting columns if both tables have at least one row to determine the schema.
- Various operations only work if there is at least one row to determine the schema:
  - `ncols`, `header`
- `dropna` only works if every column in a table is optional, since there is no way to dispatch dynamically on
  column sort.

> Q. Which operations’ expressibility is unknown? Why?
N/A

> Q. Which operations can be expressed more precisely than in the benchmark? How?
- Several operations could be expressed in a more type-safe manner if a projection function were passed instead of a
  column name.
  - e.g. `selectColumn(table, fun e -> e.name)` as opposed to `selectColumn(table, "name")`
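
The tradeoff can be illustrated with a small OCaml sketch. Both functions are hypothetical stand-ins, not the Hazel implementations: the projection version works on typed records, and the string version assumes rows encoded as association lists.

```ocaml
type student = { name : string; age : int }

let students = [ { name = "Bob"; age = 12 }; { name = "Alice"; age = 17 } ]

(* Projection-based selection: the column is chosen by a function, so a
   misspelled field such as [fun e -> e.nmae] fails to type-check. *)
let select_column (table : 'row list) (project : 'row -> 'a) : 'a list =
  List.map project table

(* String-based selection over a hypothetical dynamic encoding (rows as
   association lists): a bad column name is only detected at run time. *)
let select_column_by_name (table : (string * string) list list) (col : string)
    : string list option =
  List.fold_right
    (fun row acc ->
      match (List.assoc_opt col row, acc) with
      | Some v, Some vs -> Some (v :: vs)
      | _ -> None)
    table (Some [])

let names = select_column students (fun e -> e.name)
let dynamic = [ [ ("name", "Bob") ]; [ ("name", "Alice") ] ]
```

A typo in the projection is rejected before the program runs, whereas `select_column_by_name dynamic "nmae"` type-checks and only reports the problem as `None` at run time.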

## Example Programs

> Q. Which examples are inexpressible? Why?
- `sampleRows` is inexpressible because Hazel is pure and the example relies on random sampling.

> Q. Which examples’ expressibility is unknown? Why?
N/A

> Q. Which examples, or aspects thereof, can be expressed especially precisely? How?
The examples are expressed as precisely as the benchmark.

> Q. How direct is the mapping from the pseudocode in the benchmark to representations in your system? How complex is
the encoding?
- The mapping is quite direct as implemented. A less direct mapping could accomplish a more type-safe translation of
  several of the programs.

## Errors

> There are (at least) two parts to errors: representing the source program that causes the error, and generating output
> that explains it. The term “error situation” refers to a representation of the cause of the error in the program
> source.
>
> For each error situation it may be that the language:
>
> - isn’t expressive enough to capture it
> - can at least partially express the situation
> - prevents the program from being constructed
>
> Expressiveness, in turn, can be for multiple artifacts:
>
> - the buggy versions of the programs
> - the correct variants of the programs
> - the type system’s representation of the constraints
> - the type system’s reporting of the violation

> Q. Which error situations are known to be inexpressible? Why?
Many of the programs require explicit parametric polymorphism and the higher-order function versions of the TableAPI
operations to get the best feedback.

* `getOnlyRow` provides no feedback on the error because we do not currently track table size information statically.
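
To illustrate why size errors escape static checking, here is a hypothetical OCaml analogue. The `option` return type is an assumption made for the sketch; the datasheet does not say how Hazel's `getOnlyRow` behaves on failure.

```ocaml
(* The list type carries no length, so the type checker cannot tell a
   one-row table from any other; the size check happens only at run time. *)
let get_only_row (table : 'row list) : 'row option =
  match table with
  | [ row ] -> Some row
  | _ -> None
```

A call on a two-row table type-checks exactly like a call on a one-row table, so no static feedback is possible without tracking sizes in the type.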

> Q. Which error situations are only partially expressible? Why, and what’s missing?
* Two versions of `brownJellybeans` are implemented, with tradeoffs in expressibility:
  * The first version takes a string column name and uses our more dynamic operations to select the column.
    This provides no feedback on the error but more closely matches the implementation in the benchmark.
  * The second version takes a function that selects the column and uses our more type-safe operations to select the
    column. This correctly localizes the error to the column selection.

> Q. Which error situations’ expressibility is unknown? Why?
None

> Q. Which error situations can be expressed more precisely than in the benchmark? How?
None

> Q. Which error situations are prevented from being constructed? How?
None

> Q. For each error situation that is at least partially expressible, what is the quality of feedback to the programmer?
* Malformed Tables
  * Missing schemas, rows, and cells are represented by syntactic holes in the program. These are easily
    visible in the editor and can be filled in by the programmer.
  * For tables whose schema has the wrong length, static errors are added to each row, showing the type
    inconsistency between the schema type and the row type.
  * If extraneous columns are present, the error is localized to the offending column label.
    * e.g. `favorite color is not part of expected labels: name, age`.
  * If a cell has the wrong type, an inconsistent-type error is localized to that cell.
    * e.g. `String inconsistent with expected type Int for label age`

Note that in the following programs the errors are partially localized based on the chosen explicit type
application. Using different type-hole inference or different choices for parametric type application would change
the error localization and message.

* `midFinal`
  * Localizes the error to the column selection `mid` in the editor.
  * Message: `Label mid not found in tuple's labels: name age quiz1 quiz2 midterm quiz3 quiz4 final`
* `blackAndWhite`
  * Localizes the error to the column selection `black and white` in the editor.
  * Message:
    ``Label `black and white` not found in tuple's labels: get_acne red black white green yellow brown orange pink purple``
* `pieCount`
  * Localizes the errors to the column selections `true` and `get_count` in the editor.
  * The error messages are similar to those above.
* `brownAndGetAcne`
  * Localizes the error to the column selection `brown and get acne` in the editor.
  * The error messages are similar to those above.
* `favoriteColor`
  * Localizes the error to the column selection `favorite color` in the editor.
  * The error message: `String is inconsistent with expected type Bool`
* `brownJellybeans`
  * The first version provides no feedback on the error, as it uses the string column name.
  * The second version localizes the error to the column selection `color`, with an error message similar to those above.
* `employee_to_department`
  * Localizes an error to the column selection `last_name` in the editor.
  * Localizes another error to the tuple extension, saying the resulting row's type is inconsistent since `last_name` is
    an `Int` but the expected type is `String`.
  * The error message: `Label department not found in tuple's labels: name age department salary`

> Q. For each error situation that is prevented from being constructed, what is the quality of feedback to the programmer?
N/A

src/b2t2/Datasheet.re

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
open Haz3lcore;
open Language;
let content = [%blob "Datasheet.md"];

let content: string = content |> Util.StringUtil.escape_linebreaks;
let string_exp = IdTagged.FreshGrammar.Exp.string(content);
let segment =
  ProjectorInit.init(
    TextArea,
    Segment.parenthesize(
      ExpToSegment.exp_to_segment(
        ~settings=ExpToSegment.Settings.editable(~inline=true),
        string_exp,
      ),
    ),
    Exp(string_exp),
  )
  |> Option.get;
let slide = ("B2T2 / Datasheet", PersistentSegment.persist([segment]));

src/b2t2/README.md

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
# B2T2 Implementation in Hazel

This directory contains Hazel's implementation of the B2T2 (Brown Benchmark for Table Types) benchmark.

## What is B2T2?

B2T2 is a language design benchmark for evaluating type systems for table programming. It provides a standardized framework to compare the expressive power and diagnostic quality of different programming languages and systems when handling tabular data operations.

The benchmark was created by researchers at Brown University and is documented in the paper:

**"Types for Tables: A Language Design Benchmark"**
Authors: Kuang-Chen Lu, Ben Greenman, Shriram Krishnamurthi
Published in: The Art, Science, and Engineering of Programming, 2022

- **Paper**: https://cs.brown.edu/~sk/Publications/Papers/Published/lgk-b2t2/
- **Repository**: https://github.com/brownplt/B2T2

## What is this Directory?

This directory contains Hazel's implementation and evaluation of the B2T2 benchmark. This implementation demonstrates how well Hazel's type system handles table programming constructs.

The implementation includes:
- **Datasheet** (`Datasheet.md`): A comprehensive evaluation of how Hazel addresses each component of the B2T2 benchmark
- **Implementation** (`Datasheet.re`): Code used to turn the markdown datasheet into a documentation slide in the editor
- **Documentation Slides** (`slides/`): Interactive examples demonstrating B2T2 concepts in Hazel
- **Slides Module** (`Slides.re`): Aggregates all B2T2 slides for integration into Hazel's documentation system

## B2T2 Benchmark Components

The B2T2 benchmark consists of several key components that implementations must address:

1. **Table Definition**: Specification of what constitutes a table in the language
2. **Example Tables**: Various table structures that must be expressible
3. **Table API**: A standard library of table operations (filtering, joining, grouping, etc.)
4. **Example Programs**: Real-world programs that manipulate tables
5. **Error Scenarios**: Common programming errors and how well the type system catches them
6. **Datasheet**: Structured evaluation of the implementation's capabilities

## Documentation Slides

The slides are organized in `Slides.re` and automatically loaded into Hazel's documentation system via `src/web/init/Init.re`.

src/b2t2/Slides.re

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
let all_slides = [
  Datasheet.slide,
  B2T2ExampleTables.out,
  B2T2TableAPIConstructorsemptyTable.out,
  B2T2TableAPIConstructorsaddRows.out,
  B2T2TableAPIConstructorsaddColumn.out,
  B2T2TableAPIConstructorsbuildColumn.out,
  B2T2TableAPIConstructorsvcat.out,
  B2T2TableAPIConstructorshcat.out,
  B2T2TableAPIConstructorsvalues.out,
  B2T2TableAPIConstructorscrossJoin.out,
  B2T2TableAPIConstructorsleftJoin.out,
  B2T2TableAPIProperties.out,
  B2T2TableAPIAccessSubcomponents.out,
  B2T2TableAPISubtable.out,
  B2T2TableAPIOrdering.out,
  B2T2TableAPIAggregate.out,
  B2T2TableAPIMissingValues.out,
  B2T2TableAPIDataCleaning.out,
  B2T2TableAPIUtilitiesFlatten.out,
  B2T2TableAPIUtilitiestransformColumn.out,
  B2T2TableAPIUtilitiesrenameColumns.out,
  B2T2TableAPIUtilitiesfind.out,
  B2T2TableAPIUtilitiesgroupByRetentive.out,
  B2T2TableAPIUtilitiesgroupBySubtractive.out,
  B2T2TableAPIUtilitiesupdate.out,
  B2T2TableAPIUtilitiesselect.out,
  B2T2TableAPIUtilitiesselectMany.out,
  B2T2TableAPIUtilitiesgroupJoin.out,
  B2T2TableAPIUtilitiesjoin.out,
  B2T2ExampleProgramsDotProduct.out,
  B2T2ExampleProgramspHackingHomogeneous.out,
  B2T2ExampleProgramspHackingHeterogeneous.out,
  B2T2ExampleProgramsquizScoreFilter.out,
  B2T2ExampleProgramsquizScoreSelect.out,
  B2T2ExampleProgramsgroupByRetentive.out,
  B2T2ExampleProgramsgroupBySubtractive.out,
  B2T2ErrorsMalformedTables.out,
  B2T2ErrorsUsingTablesPart1.out,
  B2T2ErrorsUsingTablesPart2.out,
  B2T2ErrorsUsingTablesPart3.out,
];

src/b2t2/dune

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
(include_subdirs unqualified)

(library
 (name b2t2)
 (libraries haz3lcore)
 (js_of_ocaml)
 (instrumentation
  (backend bisect_ppx))
 (preprocess
  (pps
   ppx_yojson_conv
   js_of_ocaml-ppx
   ppx_let
   ppx_blob
   ppx_sexp_conv
   ppx_enumerate
   ppx_deriving.show
   ppx_deriving.eq))
 (preprocessor_deps
  (file Datasheet.md)))

(env
 (dev
  (js_of_ocaml
   (flags :standard --debuginfo --noinline --dynlink --linkall --sourcemap)))
 (release
  (js_of_ocaml
   (flags :standard))))
