feat: publish versioned schemas for JSONL audit/telemetry artifacts by Copilot · Pull Request #2380 · github/gh-aw-firewall

Copilot · 2026-05-01T21:58:41Z

AWF emits token-usage.jsonl and audit.jsonl at runtime with no published contract, making consumers fragile to silent field changes and preventing external validation.

Schema files

Added schemas/ with JSON Schema (draft 2020-12) definitions:

File	JSONL artifact	Writer
`token-usage.v1.schema.json`	`token-usage.jsonl`	`containers/api-proxy/token-tracker.js`
`audit.v1.schema.json`	`audit.jsonl`	Squid (`src/squid-config.ts`)

schemas/README.md documents the versioning policy: additive changes stay in v1; breaking changes (removal, rename, type change, new required field) create a new major version file.

`_schema` field embedded in every record

Every emitted record now carries its schema identifier so consumers can branch on version:

{"_schema":"token-usage/v1","timestamp":"2025-01-01T00:00:00.000Z","provider":"anthropic",...}
{"_schema":"audit/v1","ts":1774290908.910,"client":"172.30.0.20","decision":"TCP_TUNNEL",...}

token-tracker.js — _schema: 'token-usage/v1' added to both HTTP and WebSocket record-building paths
squid-config.ts — "_schema":"audit/v1" prepended to the audit_jsonl Squid logformat string

Schema validation before write (token-tracker)

Added validateTokenUsageRecord() — a zero-external-dependency validator that checks all 13 required fields and their types before writing. Invalid records are dropped with a token_record_schema_violation warning log rather than crashing the proxy.

Tests

validateTokenUsageRecord: unit tests for all valid/invalid paths
squid-config.test.ts: asserts audit_jsonl logformat includes _schema and all required fields
log-parser.test.ts: parseAuditJsonlLine correctly handles records that include _schema
Integration: trackTokenUsage and trackWebSocketTokenUsage reach writeTokenUsage (implying records pass validation, which rejects records missing _schema)

- Add schemas/token-usage.v1.schema.json and schemas/audit.v1.schema.json - Add schemas/README.md documenting schema versioning policy - Embed _schema field in token-usage.jsonl records (token-tracker.js) - Add lightweight validateTokenUsageRecord() validation before writing - Embed _schema:"audit/v1" in Squid audit_jsonl logformat - Update sample audit.jsonl to include _schema field - Add validateTokenUsageRecord tests and schema-field integration tests - Add squid-config tests for audit_jsonl schema fields - Add log-parser test for parsing records with _schema field Closes #<issue>" Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/5a50cced-80d7-463b-9c96-c0662581a136

…eview

github-actions · 2026-05-01T22:22:15Z

✅ Coverage Check Passed

Overall Coverage

Metric	Base	PR	Delta
Lines	85.90%	85.98%	📈 +0.08%
Statements	85.88%	85.96%	📈 +0.08%
Functions	88.00%	88.00%	➡️ +0.00%
Branches	80.06%	80.10%	📈 +0.04%

📁 Per-file Coverage Changes (1 files)

File	Lines (Before → After)	Statements (Before → After)
`src/docker-manager.ts`	87.4% → 87.7% (+0.29%)	87.0% → 87.3% (+0.27%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

…ort, function or class' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Copilot

Pull request overview

This PR introduces versioned, published schema contracts for AWF’s runtime JSONL artifacts (token-usage.jsonl and audit.jsonl) and tags each emitted record with a _schema identifier to support robust downstream parsing/validation.

Changes:

Added draft 2020-12 JSON Schemas under schemas/ and documented versioning policy.
Embedded _schema into both Squid audit.jsonl records and api-proxy token-usage.jsonl records.
Added a lightweight runtime validator for token-usage records plus accompanying tests and parser updates.

Show a summary per file

File	Description
src/squid-config.ts	Prepends `_schema:"audit/v1"` into the Squid `audit_jsonl` logformat record template.
src/squid-config.test.ts	Adds assertions ensuring `audit_jsonl` exists and includes `_schema` + required fields.
src/logs/log-parser.test.ts	Ensures `parseAuditJsonlLine` tolerates records containing `_schema`.
schemas/token-usage.v1.schema.json	Defines the `token-usage/v1` record contract for `token-usage.jsonl`.
schemas/audit.v1.schema.json	Defines the `audit/v1` record contract for `audit.jsonl`.
schemas/README.md	Documents schema purpose, files, versioning policy, and validation example.
samples/audit/audit.jsonl	Updates sample audit lines to include `_schema:"audit/v1"`.
containers/api-proxy/token-tracker.js	Adds `_schema` to records, and validates token-usage records before writing.
containers/api-proxy/token-tracker.test.js	Adds validator unit tests and attempts to add integration coverage for `_schema` behavior.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

containers/api-proxy/token-tracker.test.js:1218

Same issue as the HTTP-path test above: metrics.increment is called before writeTokenUsage, so it doesn't prove _schema was present or that the record was written/validated. Prefer spying on writeTokenUsage and asserting the captured record includes the _schema field.

  test('trackWebSocketTokenUsage path: finalizeTracking includes _schema in the record it passes to writeTokenUsage', (done) => {
    const socket = new EventEmitter();

    function buildFrame(text) {
      const payload = Buffer.from(text, 'utf8');
      const header = Buffer.alloc(2);
      header[0] = 0x81;
      header[1] = payload.length;
      return Buffer.concat([header, payload]);
    }

    const httpHeader = Buffer.from('HTTP/1.1 101 Switching Protocols\r\nUpgrade: websocket\r\n\r\n');
    const frame1 = buildFrame(JSON.stringify({
      type: 'message_start',
      message: { model: 'claude-sonnet-4-20250514', usage: { input_tokens: 20, output_tokens: 0 } },
    }));
    const frame2 = buildFrame(JSON.stringify({
      type: 'message_delta',
      usage: { output_tokens: 8 },
    }));

    const metricsRef = { increment: jest.fn() };

    trackWebSocketTokenUsage(socket, {
      requestId: 'schema-field-ws',
      provider: 'anthropic',
      path: '/v1/messages',
      startTime: Date.now(),
      metrics: metricsRef,
    });

    socket.emit('data', Buffer.concat([httpHeader, frame1, frame2]));
    socket.emit('close');

    setTimeout(() => {
      // Same indirect proof as the HTTP test above.
      expect(metricsRef.increment).toHaveBeenCalled();
      done();
    }, 20);

Files reviewed: 9/9 changed files
Comments generated: 3

+    proxyRes.statusCode = 200;
+
+    const metricsRef = { increment: jest.fn() };
+
+    trackTokenUsage(proxyRes, {
+      requestId: 'schema-field-http',
+      provider: 'openai',
+      path: '/v1/chat/completions',
+      startTime: Date.now(),
+      metrics: metricsRef,
+    });
+
+    proxyRes.emit('data', Buffer.from(JSON.stringify({
+      model: 'gpt-4o',
+      usage: { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 },
+    })));
+    proxyRes.emit('end');
+
+    setTimeout(() => {
+      // metrics.increment was called, which means the record passed validation
+      // (validateTokenUsageRecord rejects records without _schema), so _schema was present.
+      expect(metricsRef.increment).toHaveBeenCalled();
+      done();
+    }, 20);
+  });
+
+  test('trackWebSocketTokenUsage path: finalizeTracking includes _schema in the record it passes to writeTokenUsage', (done) => {
+    const socket = new EventEmitter();
+
+    function buildFrame(text) {
+      const payload = Buffer.from(text, 'utf8');
+      const header = Buffer.alloc(2);


+function validateTokenUsageRecord(record) {
+  const required = [
+    ['_schema', 'string'],
+    ['timestamp', 'string'],
+    ['request_id', 'string'],
+    ['provider', 'string'],
+    ['model', 'string'],
+    ['path', 'string'],
+    ['status', 'number'],
+    ['streaming', 'boolean'],
+    ['input_tokens', 'number'],
+    ['output_tokens', 'number'],
+    ['cache_read_tokens', 'number'],
+    ['cache_write_tokens', 'number'],
+    ['duration_ms', 'number'],
+  ];
+
+  for (const [field, expectedType] of required) {
+    // eslint-disable-next-line valid-typeof
+    if (typeof record[field] !== expectedType) {
+      logRequest('warn', 'token_record_schema_violation', {


+  test('writeTokenUsage writes _schema:"token-usage/v1" to JSONL when stream is writable', (done) => {
+    // Since TOKEN_LOG_FILE is computed at module load time (may not be writable
+    // in test env), verify that a valid record (including _schema) is accepted.
+    const record = {
+      _schema: 'token-usage/v1',
+      timestamp: new Date().toISOString(),
+      request_id: 'sentinel-schema-http',
+      provider: 'openai',
+      model: 'gpt-4o',
+      path: '/v1/chat/completions',
+      status: 200,
+      streaming: false,
+      input_tokens: 1,
+      output_tokens: 1,
+      cache_read_tokens: 0,
+      cache_write_tokens: 0,
+      duration_ms: 10,
+      response_bytes: 42,
+    };
+
+    // We have no direct access to the internal logStream singleton.
+    // Instead, verify that writeTokenUsage validates and does NOT throw
+    // when given a valid record (which requires _schema to be correct).
+    // The absence of a thrown error + validateTokenUsageRecord returning true
+    // is the integration proof that _schema is accepted.
+    expect(() => writeTokenUsage(record)).not.toThrow();
+    done();
+  });
+
+  test('trackTokenUsage HTTP path: finalizeTracking includes _schema in the record it passes to writeTokenUsage', (done) => {
+    // We verify via validateTokenUsageRecord (exported) that the record produced
+    // by finalizeTracking would pass schema validation.  The combination of:
+    //   1. validateTokenUsageRecord rejects records without _schema (tested above)
+    //   2. trackTokenUsage calls writeTokenUsage which calls validateTokenUsageRecord
+    //   3. metrics.increment IS called (confirming writeTokenUsage was reached)
+    // proves that the record contains _schema.


lpcox · 2026-05-01T22:27:25Z

@copilot address the review feedback

- Add null/non-object guard to validateTokenUsageRecord to prevent TypeError on writeTokenUsage(null/undefined) - Rewrite 'token-usage JSONL record schema field' tests to actually capture the bytes written to the JSONL stream via jest.spyOn(fs, 'createWriteStream'), instead of the weaker not.toThrow() / metrics.increment assertions - Add null/undefined/primitive rejection tests for validateTokenUsageRecord Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/4a05b5f0-4b15-4797-96da-617484b64562

Copilot · 2026-05-01T22:32:54Z

@copilot address the review feedback

Addressed all three review items in commit 1bf73db:

Null guard (token-tracker.js): Added if (!record || typeof record !== 'object') check at the top of validateTokenUsageRecord so writeTokenUsage(null/undefined) returns false instead of throwing.
Stream capture tests: Replaced the weak not.toThrow() assertion and the incorrect metrics-based assertions with proper stream mocking. beforeEach now uses jest.spyOn(fs, 'createWriteStream') to inject an in-memory writable mock, then each test asserts the actual JSONL bytes contain _schema: 'token-usage/v1' and the expected request_id.
Additional null/primitive rejection tests: Added rejects null without throwing, rejects undefined without throwing, and rejects a non-object primitive without throwing tests for the new guard.

github-actions · 2026-05-01T22:39:30Z

Smoke Test: Copilot BYOK (Offline) Mode

Test	Result
1. GitHub MCP (list PRs)	✅ PR #2375 "feat: publish versioned JSON Schema for AWF config file"
2. GitHub.com HTTP connectivity	⚠️ Template vars not expanded — pre-step data unavailable
3. File write/read	❌ File not found at expected path
4. BYOK inference (this response)	✅

Running in BYOK offline mode (COPILOT_OFFLINE=true) via api-proxy → api.githubcopilot.com.

Author: @Copilot · Assignees: @lpcox, @Copilot

Overall: PARTIAL (tests 2 & 3 inconclusive due to unexpanded template variables in workflow step)

🔑 BYOK report filed by Smoke Copilot BYOK

github-actions · 2026-05-01T22:39:31Z

🧪 Smoke Test Results

Test	Status
GitHub MCP	✅ PR #2380: "feat: publish versioned schemas for JSONL audit/telemetry artifacts"
GitHub.com HTTP	✅ HTTP 200
File Write/Read	⚠️ Pre-step outputs not expanded (template vars unresolved)

Overall: PASS (core connectivity confirmed)

Author: @Copilot · Assignees: @lpcox, @Copilot

📰 BREAKING: Report filed by Smoke Copilot

github-actions · 2026-05-01T22:39:46Z

Smoke Test Results:

✅ GitHub MCP: Reviewed 2 merged PRs
✅ Playwright: github.com title verified
✅ File Writing: Test file created
✅ Bash Verification: File contents confirmed

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude

github-actions · 2026-05-01T22:40:12Z

Chroot Smoke Test Results

Runtime	Host Version	Chroot Version	Match?
Python	`Python 3.12.13`	`Python 3.12.3`	❌
Node.js	`v24.14.1`	`v20.20.2`	❌
Go	`go1.22.12`	`go1.22.12`	✅

Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot environment.

Tested by Smoke Chroot

github-actions · 2026-05-01T22:40:20Z

feat: publish versioned JSON Schema for AWF config file
feat(api-proxy): Anthropic prompt-cache injection, TTL upgrade, tool drop, ANSI strip, custom transform hook
GitHub PR review ✅
GH CLI query ✅
Playwright ✅
Tavily ❌
File write/read ✅
Build ✅
Overall: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

github-actions · 2026-05-01T22:41:33Z

🏗️ Build Test Suite Results

Ecosystem	Project	Build/Install	Tests	Status
Bun	elysia	✅	1/1 passed	✅ PASS
Bun	hono	✅	1/1 passed	✅ PASS
C++	fmt	✅	N/A	✅ PASS
C++	json	✅	N/A	✅ PASS
Deno	oak	N/A	1/1 passed	✅ PASS
Deno	std	N/A	1/1 passed	✅ PASS
.NET	hello-world	✅	N/A	✅ PASS
.NET	json-parse	✅	N/A	✅ PASS
Go	color	✅	passed	✅ PASS
Go	env	✅	passed	✅ PASS
Go	uuid	✅	passed	✅ PASS
Java	gson	✅	1/1 passed	✅ PASS
Java	caffeine	✅	1/1 passed	✅ PASS
Node.js	clsx	✅	passed	✅ PASS
Node.js	execa	✅	passed	✅ PASS
Node.js	p-limit	✅	passed	✅ PASS
Rust	fd	✅	1/1 passed	✅ PASS
Rust	zoxide	✅	1/1 passed	✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #2380 · ● 629.9K · ◷

github-actions · 2026-05-01T22:41:41Z

Smoke Test: Services Connectivity

Check	Result
Redis `PING`	❌ Timeout — no response
PostgreSQL `pg_isready`	❌ No response on port 5432
PostgreSQL `SELECT 1`	❌ Skipped (host unreachable)

Overall: FAIL

host.docker.internal is not reachable from this runner environment. Service containers may not be running or the hostname is not resolvable.

🔌 Service connectivity validated by Smoke Services

Initial plan

817cefd

Copilot AI assigned Copilot and lpcox May 1, 2026

Copilot started work on behalf of lpcox May 1, 2026 21:58 View session

Copilot AI linked an issue May 1, 2026 that may be closed by this pull request

feat: publish versioned schemas for JSONL audit/telemetry artifacts #2378

Closed

Copilot AI added 2 commits May 1, 2026 22:09

refactor: remove redundant requires in token-tracker tests per code r…

e2a4924

…eview

Copilot AI changed the title ~~[WIP] Add versioned schemas for JSONL audit and telemetry artifacts~~ feat: publish versioned schemas for JSONL audit/telemetry artifacts May 1, 2026

Copilot finished work on behalf of lpcox May 1, 2026 22:14

Copilot AI requested a review from lpcox May 1, 2026 22:14

lpcox marked this pull request as ready for review May 1, 2026 22:21

lpcox requested a review from Mossaka as a code owner May 1, 2026 22:21

Copilot AI review requested due to automatic review settings May 1, 2026 22:21

Copilot started reviewing on behalf of lpcox May 1, 2026 22:21 View session

github-advanced-security AI found potential problems May 1, 2026

View reviewed changes

Comment thread containers/api-proxy/token-tracker.test.js Fixed

Potential fix for pull request finding 'CodeQL / Unused variable, imp…

176f83e

…ort, function or class' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Copilot AI reviewed May 1, 2026

View reviewed changes

Copilot started work on behalf of lpcox May 1, 2026 22:27 View session

github-actions Bot added the smoke-copilot-byok label May 1, 2026

This comment has been minimized.

Sign in to view

github-actions Bot added the smoke-claude label May 1, 2026

This comment has been minimized.

Sign in to view

github-actions Bot mentioned this pull request May 1, 2026

[aw] No-Op Runs #2301

Open

This comment has been minimized.

Sign in to view

github-actions Bot added the build-test label May 1, 2026

This comment has been minimized.

Sign in to view

Copilot finished work on behalf of lpcox May 1, 2026 22:33

github-actions Bot added the smoke-copilot label May 1, 2026

lpcox merged commit 0413d68 into main May 1, 2026
65 of 69 checks passed

lpcox deleted the copilot/feat-publish-versioned-schemas branch May 1, 2026 22:58

github-actions Bot mentioned this pull request May 2, 2026

feat: validate config files against published JSON Schema at runtime #2384

Merged

Conversation

Copilot AI commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Schema files

_schema field embedded in every record

Schema validation before write (token-tracker)

Tests

Uh oh!

github-actions Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Coverage Check Passed

Overall Coverage

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

lpcox commented May 1, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copilot AI commented May 1, 2026

Uh oh!

github-actions Bot commented May 1, 2026

Smoke Test: Copilot BYOK (Offline) Mode

Uh oh!

github-actions Bot commented May 1, 2026

🧪 Smoke Test Results

Uh oh!

github-actions Bot commented May 1, 2026

Uh oh!

github-actions Bot commented May 1, 2026

Chroot Smoke Test Results

Uh oh!

github-actions Bot commented May 1, 2026

Uh oh!

github-actions Bot commented May 1, 2026

🏗️ Build Test Suite Results

Uh oh!

github-actions Bot commented May 1, 2026

Smoke Test: Services Connectivity

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Copilot AI commented May 1, 2026 •

edited

Loading

`_schema` field embedded in every record

github-actions Bot commented May 1, 2026 •

edited

Loading