fix(v2): demote GitHub-derived props to org units + RCP_TOKEN round-robin
Two independent fixes bundled per the live debugging session.
1. **`demote_github_props_to_units` stage** addressing
Imaging-Plaza/git-metadata-extractor issues #29 and #33:
When an Organization had both a ROR identifier (the legal entity)
and a GitHub presence linked via `org:hasUnit → github_url`, the
rule-based agent stamped the GitHub-derived properties
(`pulse:githubOrgFollowers`, `pulse:githubOrganizationHandle`) on
the ROR parent rather than the unit. Real-world example from the
audit: `ror.org/0070nx673` (Okino) was carrying followers=364 that
actually belong to its unit `github.com/InteractiveComputerGraphics`.
Add a post-pass stage that acts only when the ROR parent's handle
matches the GitHub-only unit's handle (so the data really is the
unit's). It moves the follower count to the unit if the unit lacks it,
then clears the GitHub-derived properties on the parent. The match
guard prevents stripping legitimate data when the parent has its own
independent GitHub presence with separate units.
Wired immediately after `infer_org_units` in the api.py orchestrator
so the unit relationships are stamped before this pass runs.
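A minimal sketch of the demotion rule described above, not the actual stage. It assumes organizations and units are plain dicts keyed by property name, that units sit in a list under `org:hasUnit`, and that each unit has a `github_url` key; the helper `handle_from_url` is hypothetical.

```python
from typing import Any, Dict

FOLLOWERS = "pulse:githubOrgFollowers"
HANDLE = "pulse:githubOrganizationHandle"


def handle_from_url(url: str) -> str:
    """Hypothetical helper: last path segment of a GitHub URL, lowercased."""
    return url.rstrip("/").rsplit("/", 1)[-1].lower()


def demote_github_props_to_units(org: Dict[str, Any]) -> Dict[str, Any]:
    """Move GitHub-derived props from a ROR parent to its matching unit.

    Assumed data model (not confirmed by the source): `org` is a dict,
    units live in a list under "org:hasUnit", each with a "github_url".
    """
    parent_handle = org.get(HANDLE)
    if not parent_handle:
        return org  # nothing GitHub-derived on the parent

    for unit in org.get("org:hasUnit", []):
        url = unit.get("github_url")
        # Match guard: only act when the parent's handle is really the unit's.
        if not url or handle_from_url(url) != parent_handle.lower():
            continue
        # Move the follower count only if the unit does not have one yet.
        if org.get(FOLLOWERS) is not None and unit.get(FOLLOWERS) is None:
            unit[FOLLOWERS] = org[FOLLOWERS]
        # Clear the GitHub-derived properties on the parent.
        org.pop(FOLLOWERS, None)
        org.pop(HANDLE, None)
        break
    return org
```

When no unit's handle matches, the parent is left untouched, which is the case the match guard exists to protect.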
2. **Multi-token `RCP_TOKEN` support** mirroring the existing
`GITHUB_TOKEN` pattern. Set `RCP_TOKEN=sk-A,sk-B` and every model
instantiation pulls the next key via `itertools.cycle` under a
threading lock. With two keys this doubles the per-process rate-limit
budget, which helps when the shared inference endpoint pushes back on
bursty extracts.
Independent cycles per env var name so distinct providers
(`RCP_TOKEN`, `OPENAI_API_KEY`, ...) don't share state.
Both changes covered by inline synthetic tests (issue #29 example
demote roundtrip + round-robin sequencing + single-token passthrough).