🤖 Part of #5005.
Description
Add PPL/gap eval sets over real GitHub API event payloads from GH Archive. This covers structured output as consumed by tools: nested JSON, IDs, URLs, timestamps, actor/repo metadata, issue/PR payloads, and event schemas.
Primary source:
Related structured-output sources for later comparison:
This should be separate from generic JSON evals because agent/tool workflows often consume GitHub-shaped payloads directly.
Definition of Done
- Build a small held-out GH Archive slice with date-based selection.
- Split metrics by event type, covering at least PushEvent, PullRequestEvent, IssuesEvent, and IssueCommentEvent, plus WorkflowRunEvent if it is present in the slice.
- Mask or bucket fields that should not dominate metrics, such as long IDs and hashes.
- Keep this out of default validation sets.
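
A minimal sketch of how the slice selection, per-event-type split, and masking could fit together. Everything named here is an assumption for illustration, not existing code in this repo: the helper names (`archive_urls`, `mask_event`, `split_by_type`), the `<ID>`/`<SHA>` placeholder tokens, the 6-digit ID threshold, and the `other` bucket. It also assumes GH Archive's hourly file naming (`YYYY-MM-DD-H.json.gz`, one JSON event per line with a top-level `type` field).

```python
import json
import re
from collections import defaultdict
from datetime import date

# Assumption: event types we report separately; everything else goes to "other".
TARGET_TYPES = {
    "PushEvent", "PullRequestEvent", "IssuesEvent",
    "IssueCommentEvent", "WorkflowRunEvent",
}
SHA_RE = re.compile(r"[0-9a-f]{40}")  # full-length Git SHA-1
LONG_ID_DIGITS = 6                    # hypothetical threshold for "long" IDs


def archive_urls(day: date) -> list[str]:
    """Hourly GH Archive file URLs for one held-out day (hour is not zero-padded)."""
    return [f"https://data.gharchive.org/{day.isoformat()}-{h}.json.gz"
            for h in range(24)]


def mask_value(value):
    """Replace long numeric IDs and commit hashes with placeholder tokens."""
    if isinstance(value, bool):  # bool is a subclass of int; leave it alone
        return value
    if isinstance(value, int) and len(str(abs(value))) >= LONG_ID_DIGITS:
        return "<ID>"
    if isinstance(value, str):
        if SHA_RE.fullmatch(value):
            return "<SHA>"
        if value.isdigit() and len(value) >= LONG_ID_DIGITS:
            return "<ID>"
    return value


def mask_event(obj):
    """Recursively mask leaf values in a decoded event payload."""
    if isinstance(obj, dict):
        return {k: mask_event(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [mask_event(v) for v in obj]
    return mask_value(obj)


def split_by_type(lines):
    """Group masked, re-serialized events by event type."""
    buckets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        etype = event.get("type")
        key = etype if etype in TARGET_TYPES else "other"
        buckets[key].append(json.dumps(mask_event(event), sort_keys=True))
    return dict(buckets)
```

Each bucket returned by `split_by_type` can then be scored on its own, so metrics are reported per event type. This sketch only masks whole-value IDs and hashes; IDs embedded in URLs are left untouched and would need a separate rule if they turn out to dominate the metrics.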