fix(etl): flatten org aggregate report before ingest #8
Open
ndriyo wants to merge 2 commits into
The org aggregate path in fetchOrgData() cast each NDJSON line directly to CopilotAggregateRecord, skipping the day_totals envelope GitHub returns. That left record.day undefined, so the insert into fact_org_aggregate_daily emitted DEFAULT for the day column and failed the NOT NULL constraint:

    null value in column "day" of relation "fact_org_aggregate_daily" violates not-null constraint

Route the NDJSON lines through flattenAggregateReport() exactly like fetchEnterpriseAggregate() already does. The helper expands day_totals[] into one record per day and tags each with _orgLogin / _scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor
Pull request overview
Fixes org-scoped aggregate report ingestion by correctly flattening the Copilot Usage Metrics API’s org aggregate NDJSON “day_totals[]” envelope into per-day CopilotAggregateRecords before ETL ingest, aligning behavior with the already-correct enterprise aggregate path.
Changes:
- Parse org aggregate NDJSON as AggregateReportLine (envelope form) instead of CopilotAggregateRecord.
- Flatten org aggregate lines via flattenAggregateReport(..., "organization", orgLogin) to produce per-day records with _scope / _orgLogin tagging.
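The flattening step described above can be illustrated with a minimal sketch. It is not the repository's flattenAggregateReport(): the names parseNdjsonLines, flattenDayTotals, ReportLineSketch, DayTotalSketch and FlatRecordSketch are hypothetical, and the only field assumed on each day_totals entry is day, since that is the only one this PR names.

```typescript
// Hypothetical shapes: only `day_totals`, `day`, `_scope` and `_orgLogin`
// come from the PR text; everything else here is an assumption.
interface DayTotalSketch {
  day: string;
  [metric: string]: unknown;
}

interface ReportLineSketch {
  day_totals?: DayTotalSketch[];
}

interface FlatRecordSketch extends DayTotalSketch {
  _scope: "enterprise" | "organization";
  _orgLogin?: string;
}

// Parse NDJSON: one JSON document per non-empty line.
function parseNdjsonLines<T>(content: string): T[] {
  return content
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as T);
}

// Expand each envelope's day_totals[] into one tagged record per day.
// Lines without day_totals contribute no records.
function flattenDayTotals(
  lines: ReportLineSketch[],
  scope: "enterprise" | "organization",
  orgLogin?: string,
): FlatRecordSketch[] {
  const records: FlatRecordSketch[] = [];
  for (const line of lines) {
    for (const total of line.day_totals ?? []) {
      records.push({ ...total, _scope: scope, _orgLogin: orgLogin });
    }
  }
  return records;
}

// Usage: two envelope lines holding three days in total.
const sampleNdjson =
  '{"day_totals":[{"day":"2025-01-01"},{"day":"2025-01-02"}]}\n' +
  '{"day_totals":[{"day":"2025-01-03"}]}\n';
const flatRecords = flattenDayTotals(
  parseNdjsonLines<ReportLineSketch>(sampleNdjson),
  "organization",
  "acme",
);
```

Casting a line straight to the per-day record type, as the old org path did, would leave day undefined because the field sits one level down inside day_totals[].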
Comment on lines 717 to 722
  const fileResponse = await fetch(link);
  apiRequestCount++;
  if (fileResponse.ok) {
    const content = await fileResponse.text();
-   const parsed = parseNdjson<CopilotAggregateRecord>(content);
-   for (const r of parsed) {
-     r._orgLogin = opts.orgLogin;
-     r._scope = "organization";
-   }
-   aggregateRecords.push(...parsed);
+   lines.push(...parseNdjson<AggregateReportLine>(content));
  }
Author
Good catch. Mirrored the enterprise pattern — now logs a console.warn with the failed download's status/statusText (including the org login for context) and continues to the next link instead of silently skipping.
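A rough sketch of the warn-and-continue behavior described in this reply follows, assuming a loop over download links. The names downloadReportBodies, DownloadResponse and Fetcher are hypothetical, the warning text is invented for illustration, and the fetcher and warn function are injected only so the sketch can run without network access; the actual change in fetchOrgData() may be structured differently.

```typescript
// Hypothetical minimal response shape, covering only the fields the loop reads.
interface DownloadResponse {
  ok: boolean;
  status: number;
  statusText: string;
  text(): Promise<string>;
}

type Fetcher = (url: string) => Promise<DownloadResponse>;

// Download every report link. A failed download is reported through `warn`
// (with the org login for context) and the loop moves on to the next link.
async function downloadReportBodies(
  links: string[],
  orgLogin: string,
  fetcher: Fetcher,
  warn: (message: string) => void = console.warn,
): Promise<string[]> {
  const bodies: string[] = [];
  for (const link of links) {
    const response = await fetcher(link);
    if (!response.ok) {
      warn(
        `org ${orgLogin}: aggregate report download failed: ` +
          `${response.status} ${response.statusText}`,
      );
      continue; // keep going so one bad link does not hide the others
    }
    bodies.push(await response.text());
  }
  return bodies;
}
```

With this shape a partial ingest still completes, but each skipped file leaves a diagnostic line naming the status and the organization.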
…skipping

Mirrors the enterprise aggregate path so partial ingests surface a diagnostic signal. Addresses PR amgdy#8 review feedback.
Summary
fetchOrgData() was parsing the org-aggregate NDJSON directly as CopilotAggregateRecord, skipping the day_totals[] envelope that GitHub's Copilot Usage Metrics API actually returns for organization-scoped aggregate reports.

The result was that record.day was undefined for every org-aggregate line. Drizzle then emitted DEFAULT for the day column in the fact_org_aggregate_daily insert, which violates the column's NOT NULL constraint and fails the entire ingest with:

    null value in column "day" of relation "fact_org_aggregate_daily" violates not-null constraint

The enterprise path (fetchEnterpriseAggregate()) already handles this correctly by routing NDJSON through flattenAggregateReport() before merging. This PR aligns the org path with the same helper. The _orgLogin / _scope tagging that the old loop did manually is performed inside flattenAggregateReport, so it is preserved.

Test plan
- fact_org_aggregate_daily populates with one row per (day, org, scope) and the PR-metrics dashboards (/pull-requests, /metrics) light up.
- app/src/lib/github/__tests__/aggregate-teams.test.ts should still pass. The org-aggregate flow now uses the same flatten helper the test suite already exercises for the enterprise path.

🤖 Generated with Claude Code