Commit d2992e9
feat(pipeline): harvest related legislation discovered during enrichment (#901)
* feat(pipeline): harvest related legislation discovered during enrichment
Regulations (regelingen) reached only through the machine-readable model
(source.regulation, legal_basis, and open_terms/implements delegations like
regeling_standaardpremie) are never auto-harvested: the recursive harvester
follows only <extref> BWB hyperlinks in the source text, and those model links
are a product of enrichment.
The enrichment agent now returns the related legislation it needs in a result
envelope (.enrichment-result.yaml) — deliberately kept OUT of the law YAML so the
law artifact stays schema-conformant. After a successful enrich the worker reads
the envelope, resolves each entry to a BWB id (agent bwb_id -> law_entries slug ->
single-hit SRU title search), and enqueues a follow-up harvest for each. Newly
harvested laws auto-enrich and return their own related legislation, so the
dependency graph fills itself by recursion.
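The three-step resolution described above (agent bwb_id, then slug, then single-hit title search) can be sketched roughly as follows. This is an illustration, not the worker's actual code: `RelatedEntry`, `Resolution`, `resolve_related`, and the closure parameters are hypothetical names, and the BWB id shape checked by `is_bwb_id` ("BWB", one uppercase letter, seven digits) is an assumption. The sketch follows the tightened behavior from the later fix commits, where a non-BWB slug hit returns Unresolved instead of falling through to the title search.

```rust
/// One entry from the enrichment result envelope (field names are hypothetical).
#[derive(Debug, Clone, Default)]
pub struct RelatedEntry {
    pub bwb_id: Option<String>,
    pub slug: Option<String>,
    pub title: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    Resolved(String),
    Unresolved,
}

/// Assumed id shape: "BWB" + one uppercase letter + 7 digits, e.g. BWBR0018451.
pub fn is_bwb_id(id: &str) -> bool {
    let b = id.as_bytes();
    b.len() == 11
        && id.starts_with("BWB")
        && b[3].is_ascii_uppercase()
        && b[4..].iter().all(|c| c.is_ascii_digit())
}

/// Resolve an entry to a BWB id. `slug_lookup` stands in for the law_entries
/// lookup and `title_search` for the SRU title search; both are injected so
/// the sketch needs no database or network.
pub fn resolve_related(
    entry: &RelatedEntry,
    slug_lookup: impl Fn(&str) -> Option<String>,
    title_search: impl Fn(&str) -> Vec<String>,
) -> Resolution {
    // (a) An id supplied by the agent wins if it looks like a BWB id.
    // Falling through on a malformed agent id is an assumption of this sketch.
    if let Some(id) = entry.bwb_id.as_deref() {
        if is_bwb_id(id) {
            return Resolution::Resolved(id.to_string());
        }
    }
    // (b) A slug hit is final: a non-BWB hit (e.g. CVDR) does NOT fall
    // through to the title search, which could match a different law.
    if let Some(slug) = entry.slug.as_deref() {
        if let Some(id) = slug_lookup(slug) {
            return if is_bwb_id(&id) {
                Resolution::Resolved(id)
            } else {
                Resolution::Unresolved
            };
        }
    }
    // (c) Title search resolves only on exactly one hit that is a valid BWB id.
    let hits = title_search(&entry.title);
    match hits.as_slice() {
        [only] if is_bwb_id(only) => Resolution::Resolved(only.clone()),
        _ => Resolution::Unresolved,
    }
}
```

Passing the lookups as closures keeps the ordering and the "single hit only" rule testable in isolation; the real worker presumably calls find_bwb_id_by_slug and search_bwb_by_name at those points.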
- EnrichPayload.depth carries recursion depth (harvest -> enrich -> harvest);
each related harvest is depth+1 at priority 40-(depth+1), so deeper nesting
yields to shallower/interactive harvests.
- Opt-in: HARVEST_RELATED_LEGISLATION (off by default) + RELATED_HARVEST_MAX_DEPTH
(default 2); reuses ENRICH_DAILY_LIMIT for spend and create_harvest_job_if_not_exists
for dedup. Best-effort: nothing here can fail the already-committed enrichment.
- search_bwb_by_name extracted from the axum handler for reuse; find_bwb_id_by_slug
made pub; slug hits re-validated as BWB (CVDR skipped). RFC-025 documents the
pattern and its known limitations.
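The depth and priority arithmetic from the first bullet can be illustrated with a small sketch. The function and constant names are hypothetical, and the exact cutoff comparison (child depth at most RELATED_HARVEST_MAX_DEPTH) is an assumption; the commit message states only the formula 40-(depth+1) and the default of 2.

```rust
/// Base value taken from the commit message's formula 40-(depth+1).
pub const BASE_PRIORITY: i32 = 40;
/// Default of RELATED_HARVEST_MAX_DEPTH per the commit message.
pub const DEFAULT_MAX_DEPTH: u32 = 2;

/// Priority of a related harvest enqueued by an enrich job at `parent_depth`.
/// Deeper jobs get a lower number, so they yield to shallower ones.
pub fn related_harvest_priority(parent_depth: u32) -> i32 {
    BASE_PRIORITY - (parent_depth as i32 + 1)
}

/// Depth for the follow-up harvest, or None when the bound is reached.
/// Assumption: a child is allowed while child depth <= max_depth.
pub fn next_depth(parent_depth: u32, max_depth: u32) -> Option<u32> {
    let child = parent_depth + 1;
    (child <= max_depth).then_some(child)
}
```

Under these assumptions a root law (depth 0) enqueues related harvests at priority 39, and their own related harvests run at 38.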
* refactor(pipeline): make related-legislation harvest always-on
Drop the HARVEST_RELATED_LEGISLATION opt-in flag. The follow-up harvest only
enqueues harvest jobs (no LLM cost); the expensive re-enrichment of those laws
stays gated by ENRICH_AUTO_ENQUEUE + ENRICH_DAILY_LIMIT, and the recursion is
bounded by RELATED_HARVEST_MAX_DEPTH. So there is nothing to protect behind a
separate flag.
* fix(dev): exclude enrichment sidecars from law YAML validation
The enrichment result envelope (.enrichment-result.yaml) and the existing
.enrichment.yaml metadata sidecar are written into a law directory but are not
law files. `find -name '*.yaml'` matches leading-dot names and the pre-commit
`files:` regex matches them too, so validating one fails (missing $id). Skip
dot-prefixed sidecars in both script/validate.sh and the validate-law-yaml hook.
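The actual fix lives in a shell script and a pre-commit regex; the same filter, expressed in Rust purely for illustration, might look like this. `is_law_yaml` is a hypothetical helper, not something the commit adds.

```rust
use std::path::Path;

/// True for YAML files that should be validated as law files.
/// Dot-prefixed names such as .enrichment-result.yaml and .enrichment.yaml
/// are sidecars and are skipped.
pub fn is_law_yaml(path: &Path) -> bool {
    // Paths without a UTF-8 file name are never law files in this sketch.
    let Some(name) = path.file_name().and_then(|n| n.to_str()) else {
        return false;
    };
    !name.starts_with('.') && name.ends_with(".yaml")
}
```

The check looks only at the final path component, so a law directory whose own name starts with a dot would not be affected.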
* fix(pipeline): tighten related-legislation resolution; drop RFC-025
Address CI review findings:
- A CVDR slug hit no longer falls through to the SRU name search (the slug
already identified the law; a title match could resolve a *different* national
law). It now returns Unresolved.
- The harvest summary log separates already_queued and exhausted skips instead of
conflating them in the resolved-but-not-enqueued gap.
Also drop RFC-025: the related-legislation harvest loop is an implementation
detail, not a cross-cutting design decision that warrants an RFC.
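The log change in the second bullet amounts to counting each skip reason separately instead of inferring skips from the difference between resolved and enqueued. A minimal sketch, with hypothetical type and field names, and without assuming what exactly "exhausted" denotes in the worker:

```rust
/// Outcome of trying to enqueue a harvest for one resolved BWB id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnqueueOutcome {
    Enqueued,
    AlreadyQueued,
    Exhausted,
}

/// Counters for the summary log line; each skip reason has its own field.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct HarvestSummary {
    pub resolved: usize,
    pub enqueued: usize,
    pub already_queued: usize,
    pub exhausted: usize,
}

/// Tally outcomes so that resolved == enqueued + already_queued + exhausted.
pub fn summarize(outcomes: &[EnqueueOutcome]) -> HarvestSummary {
    let mut summary = HarvestSummary {
        resolved: outcomes.len(),
        ..Default::default()
    };
    for outcome in outcomes {
        match outcome {
            EnqueueOutcome::Enqueued => summary.enqueued += 1,
            EnqueueOutcome::AlreadyQueued => summary.already_queued += 1,
            EnqueueOutcome::Exhausted => summary.exhausted += 1,
        }
    }
    summary
}
```

With explicit counters, a reader of the log can tell a dedup hit from an exhausted skip without reconstructing it from the gap.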
* fix(pipeline): address review nits on related-legislation resolution
- Validate the single-hit SRU result as a BWB id before resolving, so a
malformed SRU id can't slip into a harvest payload (paths a/b already validate).
- Read the .enrichment-result.yaml sidecar via tokio::fs (was blocking std::fs)
for consistency with the rest of execute_enrich_with_runner.
- Clarify the depth-inherit comment: the field is the shared extref-recursion
counter, so deep-via-extref laws skip related discovery (roots/shallow laws,
the intended case, are unaffected).
1 parent 454b25c commit d2992e9
10 files changed
Lines changed: 577 additions & 35 deletions
File tree
- .claude/skills/law-generate
- packages
- admin/src
- pipeline/src
- api
- script