AGENTS.md
3 additions & 3 deletions
@@ -2,7 +2,7 @@

 ## Project Overview

-Binoc generates changelogs for datasets that don't ship with them. Given two snapshots of a dataset, it detects structural and content changes, records them as a migration tree (the IR), and renders changes as JSON or Markdown. The primary audience is archivists, data scientists, and stewards tracking undocumented changes to published datasets.
+Binoc generates changelogs for datasets that don't ship with them. Given two snapshots of a dataset, it detects structural and content changes, records them as a changeset tree (the IR), and renders changes as JSON or Markdown. The primary audience is archivists, data scientists, and stewards tracking undocumented changes to published datasets.

 Rust workspace with five crates:

@@ -22,9 +22,9 @@ Shared test fixtures live in `test-vectors/`. Authoritative architecture spec is

 2. **The standard library (`binoc-stdlib`) is a plugin pack**, architecturally identical to third-party packs. The core engine has zero domain knowledge—not even about directories or text files.

-3. **Comparators are the parser** (raw data → IR). **Transformers are optimization passes** (IR → IR, no raw data access). **Significance classification is an outputter concern**, mapped from semantic tags via config—not baked into the IR.
+3. **Comparators are the parser** (raw data → IR). **Transformers are optimization passes** (IR → IR, no raw data access). **Significance classification is a renderer concern**, mapped from semantic tags via config—not baked into the IR.

-4. **The IR is tree-structured, openly typed, and tag-annotated.** `kind`, `item_type`, and `tags` are open enums/strings. No built-in types or significance levels. Conventions, not enforcement.
+4. **The IR is tree-structured, openly typed, and tag-annotated.** `action`, `item_type`, and `tags` are open enums/strings. No built-in types or significance levels. Conventions, not enforcement.

 5. **Dispatch is declarative-first** (type/extension filters) **with an imperative escape hatch** (`can_handle`). First comparator to claim an item wins. Ordering is a config concern, not a plugin concern.
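The dispatch rule in item 5 (declarative filters first, `can_handle` as the escape hatch, first claim wins, ordering set by config) can be sketched in Rust, the workspace's language. This is a minimal illustration under stated assumptions, not Binoc's actual API: `Item`, `Comparator`, `claims`, `dispatch`, and the `Csv`/`Zip`/`Binary` comparators are hypothetical names, the signature given to `can_handle` is assumed, and the sketch assumes a comparator claims an item when either a declarative filter matches or its `can_handle` hook returns true.

```rust
use std::path::Path;

// Hypothetical snapshot item; Binoc's real item type may differ.
struct Item {
    path: String,
    item_type: String,
}

// Hypothetical comparator interface: declarative filters first, with
// `can_handle` as the imperative escape hatch (signature assumed).
trait Comparator {
    fn name(&self) -> &str;
    fn item_types(&self) -> &[&str] {
        &[]
    }
    fn extensions(&self) -> &[&str] {
        &[]
    }
    fn can_handle(&self, _item: &Item) -> bool {
        false
    }
}

// A comparator claims an item if a declarative filter (item type or file
// extension) matches, or otherwise if its `can_handle` hook says yes.
fn claims(c: &dyn Comparator, item: &Item) -> bool {
    let ext = Path::new(&item.path)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    let declarative =
        c.item_types().contains(&item.item_type.as_str()) || c.extensions().contains(&ext);
    declarative || c.can_handle(item)
}

// The first comparator, in the order given, to claim the item wins.
fn dispatch<'c>(comparators: &'c [Box<dyn Comparator>], item: &Item) -> Option<&'c dyn Comparator> {
    comparators.iter().map(|c| &**c).find(|c| claims(*c, item))
}

struct Csv;
impl Comparator for Csv {
    fn name(&self) -> &str {
        "csv"
    }
    fn extensions(&self) -> &[&str] {
        &["csv", "tsv"]
    }
}

struct Zip;
impl Comparator for Zip {
    fn name(&self) -> &str {
        "zip"
    }
    fn extensions(&self) -> &[&str] {
        &["zip"]
    }
}

struct Binary;
impl Comparator for Binary {
    fn name(&self) -> &str {
        "binary"
    }
    // Escape hatch: claim every regular file, whatever its extension.
    fn can_handle(&self, item: &Item) -> bool {
        item.item_type == "file"
    }
}

fn main() {
    // Ordering stands in for config here: it is simply the order of this Vec.
    let comparators: Vec<Box<dyn Comparator>> =
        vec![Box::new(Zip), Box::new(Csv), Box::new(Binary)];
    for path in ["data/people.csv", "archive.zip", "db.sqlite"] {
        let item = Item { path: path.to_string(), item_type: "file".to_string() };
        let winner = dispatch(&comparators, &item).map(|c| c.name()).unwrap_or("none");
        println!("{path} -> {winner}");
    }
}
```

Run as written, this prints `data/people.csv -> csv`, `archive.zip -> zip`, and `db.sqlite -> binary`. The catch-all `Binary` comparator only behaves as a fallback because it is listed last; placed first, it would shadow every other comparator, which illustrates why ordering is kept in config rather than inside each plugin.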
README.md
14 additions & 14 deletions
@@ -1,6 +1,6 @@
 # Binoc: The Missing Changelog for Datasets

-Binoc generates changelogs for datasets that don't have them. Given a series of snapshots of a dataset downloaded at different times, Binoc detects what changed, expresses those changes as a minimal structured diff, and produces human-readable summaries that distinguish substantive policy changes from ministerial housekeeping.
+Binoc generates changelogs for datasets that don't have them. Given a series of snapshots of a dataset downloaded at different times, Binoc detects what changed, expresses those changes as a minimal structured diff, and produces human-readable summaries that distinguish substantive policy changes from clerical housekeeping.

 The core workflow: an archivist, data scientist, or steward has five copies of a government dataset containing CSVs, downloaded over two years. Some are identical. Some have reordered columns. One has a new category relevant to their research. Binoc tells them exactly what changed, when, and whether (by their definition) it matters.
-Binoc looked inside the zip and compared the CSV column-by-column — the reorder is flagged as ministerial housekeeping, not a real data change. But `.sqlite` is opaque to the standard library, so you only learn that the bytes differ.
+Binoc looked inside the zip and compared the CSV column-by-column — the reorder is flagged as clerical housekeeping, not a real data change. But `.sqlite` is opaque to the standard library, so you only learn that the bytes differ.
@@ -50,7 +50,7 @@ Same command, richer output. The plugin parsed the database and found the actual
 Datasets published by governments, research institutions, and public bodies are living artifacts, and can change without warning or documentation (or without consistent documentation). The archival and data science communities need tooling to:

 - Detect whether a new snapshot of a dataset actually differs from the previous one.
-- Describe changes precisely — not just "the file changed," but "three columns were reordered (ministerial) and one column was split into two (substantive)."
+- Describe changes precisely — not just "the file changed," but "three columns were reordered (clerical) and one column was split into two (substantive)."
 - Produce changelogs that are machine-readable for automated pipelines and human-readable for policy analysis.
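The README's example (three reordered columns read as clerical, one split column read as substantive) fits the AGENTS.md rule that significance is a renderer concern mapped from tags via config rather than stored in the IR. A minimal Rust sketch of that separation follows; everything in it is hypothetical and invented for illustration, including `ChangeNode`, `SignificanceConfig`, `classify`, `render`, the tag names `column-reorder` and `column-split`, and the sample paths. None of it is taken from Binoc's code or config format.

```rust
use std::collections::BTreeSet;

// Hypothetical changeset node: `action`, `item_type`, and `tags` are plain
// strings, so the IR itself carries no built-in kinds or significance levels.
struct ChangeNode {
    action: String,
    item_type: String,
    path: String,
    tags: BTreeSet<String>,
    children: Vec<ChangeNode>,
}

impl ChangeNode {
    fn new(action: &str, item_type: &str, path: &str, tags: &[&str]) -> Self {
        ChangeNode {
            action: action.to_string(),
            item_type: item_type.to_string(),
            path: path.to_string(),
            tags: tags.iter().map(|t| t.to_string()).collect(),
            children: Vec::new(),
        }
    }
}

// Hypothetical renderer config: ordered (tag, significance) rules, first match wins.
struct SignificanceConfig {
    rules: Vec<(String, String)>,
    default: String,
}

impl SignificanceConfig {
    // Significance is looked up from the node's tags at render time.
    fn classify(&self, node: &ChangeNode) -> &str {
        self.rules
            .iter()
            .find(|(tag, _)| node.tags.contains(tag))
            .map(|(_, level)| level.as_str())
            .unwrap_or(self.default.as_str())
    }
}

// Depth-first render: one indented line per node, with its configured significance.
fn render(node: &ChangeNode, config: &SignificanceConfig, depth: usize, out: &mut Vec<String>) {
    out.push(format!(
        "{}{} {} {} ({})",
        "  ".repeat(depth),
        node.action,
        node.item_type,
        node.path,
        config.classify(node)
    ));
    for child in &node.children {
        render(child, config, depth + 1, out);
    }
}

fn main() {
    let mut table = ChangeNode::new("modify", "csv", "data/people.csv", &[]);
    table.children.push(ChangeNode::new("reorder", "column", "name,age,city", &["column-reorder"]));
    table.children.push(ChangeNode::new("split", "column", "name -> first,last", &["column-split"]));

    let config = SignificanceConfig {
        rules: vec![
            ("column-reorder".to_string(), "clerical".to_string()),
            ("column-split".to_string(), "substantive".to_string()),
        ],
        default: "unclassified".to_string(),
    };

    let mut lines = Vec::new();
    render(&table, &config, 0, &mut lines);
    for line in &lines {
        println!("{line}");
    }
}
```

Because the levels live only in `SignificanceConfig`, a steward who considers column reordering substantive could change one rule without touching the IR or the comparators that produced it.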