Commit 2ff9b6f
fix(v2 output_assembly): final-pass schema:url self-loop sweep
Empirical batch 4 (post-b3101f8) showed Bug P self-loops STILL
materialising on non-EPFL Persons across 4 separate runs (pyffs 1/1,
detect-libc 3/8, gimie 3/10, pandoc 14/31). The pattern: zero
self-loops on EPFL-affiliated repos; every one of them was on a
github-only contributor.
Root cause: the LLM person agent emits `payload["id"] = "<urn-or-name>"`
and `payload["schema:url"] = "https://github.com/<login>"` separately;
my agent-level guard from 97285c1 ran AT THE AGENT, where those two
fields didn't match yet. Downstream canonicalisation then resolved
`id` to the github URL, so the self-loop materialised AFTER all the
agent-level guards had already run.
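The ordering problem can be reproduced with two toy functions. This is a minimal sketch: `agent_guard` and `canonicalise_id` are hypothetical stand-ins, and the "rewrite `id` to the github profile URL" rule is an assumption about what canonicalisation does for github-only contributors, not the pipeline's actual logic.

```python
# Hypothetical stand-ins for the two stages; neither is the project's code.


def agent_guard(payload: dict) -> None:
    # Agent-level check: only fires if id and schema:url already match.
    if payload.get("schema:url") == payload.get("id"):
        payload.pop("schema:url", None)


def canonicalise_id(payload: dict) -> None:
    # Assumed downstream rule: a github-only contributor's id resolves to
    # their github profile URL.
    url = payload.get("schema:url")
    if isinstance(url, str) and url.startswith("https://github.com/"):
        payload["id"] = url


payload = {"id": "octocat", "schema:url": "https://github.com/octocat"}
agent_guard(payload)      # "octocat" != the URL, so nothing is dropped
canonicalise_id(payload)  # id is rewritten to the github URL afterwards
print(payload["id"] == payload["schema:url"])  # True: the self-loop survives
```

Running the same guard after canonicalisation would catch the loop, which is why the check has to live at a stage that sees resolved ids.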
Fix at `output_assembly`, which is the first stage where every entity
is in its final canonical form (post-id-resolution, pre-jsonld-build).
`_drop_person_url_self_loops()` walks Person entities, unwraps both
the bare-string and `{"@id": ...}` shapes of `schema:url`, and pops
the field when the value equals the resolved `id`. Applied to both
the root entity and `related_entities`. Mutates in place; the
upstream payloads have already been deep-copied at this point.
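A minimal sketch of the sweep described above, assuming entities are plain dicts carrying an `@type` (string or list) and a resolved `id`. The names `drop_person_url_self_loops` and `PERSON_TYPES`, the `@type` key, and the returned drop count are illustrative assumptions, not the committed `_drop_person_url_self_loops()`.

```python
from typing import Any, Iterable

# Assumption: entities are plain dicts with an "@type" (str or list of str)
# and a resolved "id", as suggested by payload["id"] in the commit message.
PERSON_TYPES = {"Person", "schema:Person"}


def _types(entity: dict[str, Any]) -> set[str]:
    raw = entity.get("@type", [])
    return {raw} if isinstance(raw, str) else set(raw)


def _unwrap_url(value: Any) -> Any:
    # schema:url arrives either as a bare string or as a {"@id": ...} node.
    if isinstance(value, dict):
        return value.get("@id")
    return value


def drop_person_url_self_loops(
    root: dict[str, Any], related: Iterable[dict[str, Any]]
) -> int:
    """Remove schema:url from every Person whose URL equals its own resolved id.

    Mutates the entities in place and returns the number of fields dropped.
    """
    dropped = 0
    for entity in [root, *related]:
        if not (_types(entity) & PERSON_TYPES):
            continue  # only Person entities are swept
        url = _unwrap_url(entity.get("schema:url"))
        if url is not None and url == entity.get("id"):
            del entity["schema:url"]
            dropped += 1
    return dropped


# Usage: one self-loop in {"@id": ...} form, one legitimate external URL.
octocat = {
    "@type": "Person",
    "id": "https://github.com/octocat",
    "schema:url": {"@id": "https://github.com/octocat"},
}
hubot = {
    "@type": "Person",
    "id": "https://github.com/hubot",
    "schema:url": "https://hubot.example.org",
}
repo = {"@type": "SoftwareSourceCode", "id": "https://github.com/octocat/hello"}

print(drop_person_url_self_loops(repo, [octocat, hubot]))  # 1
print("schema:url" in octocat, "schema:url" in hubot)  # False True
```

Unwrapping first means both `schema:url` shapes reduce to a single equality check against `id`, and non-Person entities are never touched.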
Companion: regenerated `tests/v2/golden/extract/*.json` because the
octocat user/Person fixture had a self-loop the sweep now drops.

1 parent b3101f8 commit 2ff9b6f
3 files changed
Lines changed: 47 additions & 6 deletions
File tree
- src/v2/pipeline/stages
- tests/v2/golden/extract
Diff, first file (46 additions, 0 deletions):
@@ -230,6 +230,20 @@  +14 lines
@@ -256,6 +270,38 @@  +32 lines
Diff, second file (1 addition, 4 deletions):
@@ -114,9 +114,6 @@  -3 lines
@@ -178,7 +175,7 @@  -1 line, +1 line
Diff, third file (0 additions, 2 deletions):
@@ -14,7 +14,6 @@  -1 line
@@ -37,7 +36,6 @@  -1 line