Commit b3101f8

fix(v2 ownership): synthesized Person stubs no longer emit schema:url self-loop
Empirical batch 3 (`gabyx/pandoc`, 14 self-loops) showed Bug P still in the output despite the agent-side fixes shipped in 9aadfa7 + 97285c1.

Root cause: `_synthesize_owner_person_stub()` in `ownership_check.py` (used by `guarantee_repo_author` to recover repos whose `schema:author` array reconciliation emptied) builds the Person with `schema:url = profile_url = "https://github.com/{handle}"`, the same value used as the stub's `id`. By construction every salvaged stub has the tautological self-loop the agent-side guards were designed to prevent; the stubs are added downstream of the agents, so the agent fixes can't see them.

Set `schema:url` to `None` for these stubs. The Person's `@id` already carries the github profile URL; consumers that want it read `@id` or follow `pulse:githubUsername`. SHACL allows `schema:url` to be null.

Companion observation (not a separate fix here): batch 3 also showed the previously fixed Bug A membership filter is working. `vita-epfl/social-nce` correctly dropped the spurious Vita-Brazil / Vita-Germany / Vita-China / Vita-UK Memberships from a fuzzy-`query_orcid` match against the `vita-epfl` string. The Organization entities themselves still appear in the @graph as orphan stubs (the org_agent had already added them before the membership filter ran). That's the next class to address, and it is worth doing inside the strategy framework the user proposed (tool guards / provenance stamps / critic activation) rather than as another inline filter.
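For readers who want to check an output graph for the Bug P pattern, the following is a minimal sketch of a self-loop detector. The function name `find_url_self_loops`, the flat list-of-dicts graph shape, and the sample handles are assumptions made for illustration; they are not taken from the repository.

```python
from typing import Any


def find_url_self_loops(graph: list[dict[str, Any]]) -> list[str]:
    """Return the ids of entities whose `schema:url` equals their own id.

    Hypothetical helper: accepts either `@id` or `id` as the identifier key,
    since the commit message and the code comment use both spellings.
    """
    loops: list[str] = []
    for node in graph:
        node_id = node.get("@id") or node.get("id")
        url = node.get("schema:url")
        # A null url can never be a self-loop; only flag exact matches.
        if node_id is not None and url == node_id:
            loops.append(node_id)
    return loops


# Illustrative data only: one stub with the self-loop, one without.
sample_graph = [
    {"id": "https://github.com/octocat", "schema:url": "https://github.com/octocat"},
    {"id": "https://github.com/hubot", "schema:url": None},
]
self_loops = find_url_self_loops(sample_graph)
print(self_loops)
```

A check like this could run after `guarantee_repo_author`, since the stubs are added downstream of the agents and would be missed by any guard placed earlier.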
1 parent 97285c1 commit b3101f8

1 file changed

Lines changed: 7 additions & 1 deletion

src/v2/pipeline/stages/ownership_check.py
@@ -1067,7 +1067,13 @@ def _synthesize_owner_person_stub(handle: str) -> dict[str, Any]:
         },
         "idSource": "pulse:githubUsername",
         "schema:name": handle,
-        "schema:url": profile_url,
+        # `schema:url` intentionally left None — for github-only synthesized
+        # stubs the only candidate URL would be the github profile, which
+        # IS the Person's `id`. Emitting it produces a tautological
+        # self-loop (Bug P, 1505 cases in the production audit + 14 more
+        # observed in `gabyx/pandoc` even after the agent-side fixes,
+        # because the stubs are added downstream of those agents).
+        "schema:url": None,
         "pulse:githubUsername": handle,
         "pulse:orcidIdentifier": None,
         "pulse:infosciencePersonIdentifier": None,

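The behaviour after this change can be sketched as a small regression check. This is a simplified stand-in, not the real `_synthesize_owner_person_stub()`: the name `synthesize_stub_sketch` is hypothetical, only fields visible in the diff or named in the commit message are included, and the rest of the real dict is omitted.

```python
from typing import Any


def synthesize_stub_sketch(handle: str) -> dict[str, Any]:
    """Simplified stand-in for the stub builder after the fix (assumed shape)."""
    profile_url = f"https://github.com/{handle}"
    return {
        "id": profile_url,  # the profile URL lives only in the id
        "idSource": "pulse:githubUsername",
        "schema:name": handle,
        "schema:url": None,  # no longer repeats the id, so no self-loop
        "pulse:githubUsername": handle,
        "pulse:orcidIdentifier": None,
        "pulse:infosciencePersonIdentifier": None,
    }


stub = synthesize_stub_sketch("octocat")  # "octocat" is an illustrative handle
print(stub["id"], stub["schema:url"])
```

A test along these lines would pin the invariant that a synthesized stub never carries a `schema:url` equal to its own `id`, independent of the agent-side guards.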