feat: publish versioned schemas for JSONL audit/telemetry artifacts#2380
feat: publish versioned schemas for JSONL audit/telemetry artifacts#2380
Conversation
- Add schemas/token-usage.v1.schema.json and schemas/audit.v1.schema.json - Add schemas/README.md documenting schema versioning policy - Embed _schema field in token-usage.jsonl records (token-tracker.js) - Add lightweight validateTokenUsageRecord() validation before writing - Embed _schema:"audit/v1" in Squid audit_jsonl logformat - Update sample audit.jsonl to include _schema field - Add validateTokenUsageRecord tests and schema-field integration tests - Add squid-config tests for audit_jsonl schema fields - Add log-parser test for parsing records with _schema field Closes #<issue>" Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/5a50cced-80d7-463b-9c96-c0662581a136
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (1 files)
Coverage comparison generated by |
…ort, function or class' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces versioned, published schema contracts for AWF’s runtime JSONL artifacts (token-usage.jsonl and audit.jsonl) and tags each emitted record with a _schema identifier to support robust downstream parsing/validation.
Changes:
- Added draft 2020-12 JSON Schemas under
schemas/and documented versioning policy. - Embedded
_schemainto both Squidaudit.jsonlrecords and api-proxytoken-usage.jsonlrecords. - Added a lightweight runtime validator for token-usage records plus accompanying tests and parser updates.
Show a summary per file
| File | Description |
|---|---|
| src/squid-config.ts | Prepends _schema:"audit/v1" into the Squid audit_jsonl logformat record template. |
| src/squid-config.test.ts | Adds assertions ensuring audit_jsonl exists and includes _schema + required fields. |
| src/logs/log-parser.test.ts | Ensures parseAuditJsonlLine tolerates records containing _schema. |
| schemas/token-usage.v1.schema.json | Defines the token-usage/v1 record contract for token-usage.jsonl. |
| schemas/audit.v1.schema.json | Defines the audit/v1 record contract for audit.jsonl. |
| schemas/README.md | Documents schema purpose, files, versioning policy, and validation example. |
| samples/audit/audit.jsonl | Updates sample audit lines to include _schema:"audit/v1". |
| containers/api-proxy/token-tracker.js | Adds _schema to records, and validates token-usage records before writing. |
| containers/api-proxy/token-tracker.test.js | Adds validator unit tests and attempts to add integration coverage for _schema behavior. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
containers/api-proxy/token-tracker.test.js:1218
- Same issue as the HTTP-path test above:
metrics.incrementis called beforewriteTokenUsage, so it doesn't prove_schemawas present or that the record was written/validated. Prefer spying onwriteTokenUsageand asserting the captured record includes the_schemafield.
test('trackWebSocketTokenUsage path: finalizeTracking includes _schema in the record it passes to writeTokenUsage', (done) => {
const socket = new EventEmitter();
function buildFrame(text) {
const payload = Buffer.from(text, 'utf8');
const header = Buffer.alloc(2);
header[0] = 0x81;
header[1] = payload.length;
return Buffer.concat([header, payload]);
}
const httpHeader = Buffer.from('HTTP/1.1 101 Switching Protocols\r\nUpgrade: websocket\r\n\r\n');
const frame1 = buildFrame(JSON.stringify({
type: 'message_start',
message: { model: 'claude-sonnet-4-20250514', usage: { input_tokens: 20, output_tokens: 0 } },
}));
const frame2 = buildFrame(JSON.stringify({
type: 'message_delta',
usage: { output_tokens: 8 },
}));
const metricsRef = { increment: jest.fn() };
trackWebSocketTokenUsage(socket, {
requestId: 'schema-field-ws',
provider: 'anthropic',
path: '/v1/messages',
startTime: Date.now(),
metrics: metricsRef,
});
socket.emit('data', Buffer.concat([httpHeader, frame1, frame2]));
socket.emit('close');
setTimeout(() => {
// Same indirect proof as the HTTP test above.
expect(metricsRef.increment).toHaveBeenCalled();
done();
}, 20);
- Files reviewed: 9/9 changed files
- Comments generated: 3
| proxyRes.statusCode = 200; | ||
|
|
||
| const metricsRef = { increment: jest.fn() }; | ||
|
|
||
| trackTokenUsage(proxyRes, { | ||
| requestId: 'schema-field-http', | ||
| provider: 'openai', | ||
| path: '/v1/chat/completions', | ||
| startTime: Date.now(), | ||
| metrics: metricsRef, | ||
| }); | ||
|
|
||
| proxyRes.emit('data', Buffer.from(JSON.stringify({ | ||
| model: 'gpt-4o', | ||
| usage: { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 }, | ||
| }))); | ||
| proxyRes.emit('end'); | ||
|
|
||
| setTimeout(() => { | ||
| // metrics.increment was called, which means the record passed validation | ||
| // (validateTokenUsageRecord rejects records without _schema), so _schema was present. | ||
| expect(metricsRef.increment).toHaveBeenCalled(); | ||
| done(); | ||
| }, 20); | ||
| }); | ||
|
|
||
| test('trackWebSocketTokenUsage path: finalizeTracking includes _schema in the record it passes to writeTokenUsage', (done) => { | ||
| const socket = new EventEmitter(); | ||
|
|
||
| function buildFrame(text) { | ||
| const payload = Buffer.from(text, 'utf8'); | ||
| const header = Buffer.alloc(2); |
| function validateTokenUsageRecord(record) { | ||
| const required = [ | ||
| ['_schema', 'string'], | ||
| ['timestamp', 'string'], | ||
| ['request_id', 'string'], | ||
| ['provider', 'string'], | ||
| ['model', 'string'], | ||
| ['path', 'string'], | ||
| ['status', 'number'], | ||
| ['streaming', 'boolean'], | ||
| ['input_tokens', 'number'], | ||
| ['output_tokens', 'number'], | ||
| ['cache_read_tokens', 'number'], | ||
| ['cache_write_tokens', 'number'], | ||
| ['duration_ms', 'number'], | ||
| ]; | ||
|
|
||
| for (const [field, expectedType] of required) { | ||
| // eslint-disable-next-line valid-typeof | ||
| if (typeof record[field] !== expectedType) { | ||
| logRequest('warn', 'token_record_schema_violation', { |
| test('writeTokenUsage writes _schema:"token-usage/v1" to JSONL when stream is writable', (done) => { | ||
| // Since TOKEN_LOG_FILE is computed at module load time (may not be writable | ||
| // in test env), verify that a valid record (including _schema) is accepted. | ||
| const record = { | ||
| _schema: 'token-usage/v1', | ||
| timestamp: new Date().toISOString(), | ||
| request_id: 'sentinel-schema-http', | ||
| provider: 'openai', | ||
| model: 'gpt-4o', | ||
| path: '/v1/chat/completions', | ||
| status: 200, | ||
| streaming: false, | ||
| input_tokens: 1, | ||
| output_tokens: 1, | ||
| cache_read_tokens: 0, | ||
| cache_write_tokens: 0, | ||
| duration_ms: 10, | ||
| response_bytes: 42, | ||
| }; | ||
|
|
||
| // We have no direct access to the internal logStream singleton. | ||
| // Instead, verify that writeTokenUsage validates and does NOT throw | ||
| // when given a valid record (which requires _schema to be correct). | ||
| // The absence of a thrown error + validateTokenUsageRecord returning true | ||
| // is the integration proof that _schema is accepted. | ||
| expect(() => writeTokenUsage(record)).not.toThrow(); | ||
| done(); | ||
| }); | ||
|
|
||
| test('trackTokenUsage HTTP path: finalizeTracking includes _schema in the record it passes to writeTokenUsage', (done) => { | ||
| // We verify via validateTokenUsageRecord (exported) that the record produced | ||
| // by finalizeTracking would pass schema validation. The combination of: | ||
| // 1. validateTokenUsageRecord rejects records without _schema (tested above) | ||
| // 2. trackTokenUsage calls writeTokenUsage which calls validateTokenUsageRecord | ||
| // 3. metrics.increment IS called (confirming writeTokenUsage was reached) | ||
| // proves that the record contains _schema. |
|
@copilot address the review feedback |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
- Add null/non-object guard to validateTokenUsageRecord to prevent TypeError on writeTokenUsage(null/undefined) - Rewrite 'token-usage JSONL record schema field' tests to actually capture the bytes written to the JSONL stream via jest.spyOn(fs, 'createWriteStream'), instead of the weaker not.toThrow() / metrics.increment assertions - Add null/undefined/primitive rejection tests for validateTokenUsageRecord Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/4a05b5f0-4b15-4797-96da-617484b64562
Addressed all three review items in commit
|
Smoke Test: Copilot BYOK (Offline) Mode
Running in BYOK offline mode ( Author: Overall: PARTIAL (tests 2 & 3 inconclusive due to unexpanded template variables in workflow step)
|
🧪 Smoke Test Results
Overall: PASS (core connectivity confirmed) Author:
|
|
Smoke Test Results:
Overall: PASS
|
Chroot Smoke Test Results
Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot environment.
|
|
feat: publish versioned JSON Schema for AWF config file Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "registry.npmjs.org"See Network Configuration for more information.
|
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
|
Smoke Test: Services Connectivity
Overall: FAIL
|
AWF emits
token-usage.jsonlandaudit.jsonlat runtime with no published contract, making consumers fragile to silent field changes and preventing external validation.Schema files
Added
schemas/with JSON Schema (draft 2020-12) definitions:token-usage.v1.schema.jsontoken-usage.jsonlcontainers/api-proxy/token-tracker.jsaudit.v1.schema.jsonaudit.jsonlsrc/squid-config.ts)schemas/README.mddocuments the versioning policy: additive changes stay inv1; breaking changes (removal, rename, type change, new required field) create a new major version file._schemafield embedded in every recordEvery emitted record now carries its schema identifier so consumers can branch on version:
{"_schema":"token-usage/v1","timestamp":"2025-01-01T00:00:00.000Z","provider":"anthropic",...} {"_schema":"audit/v1","ts":1774290908.910,"client":"172.30.0.20","decision":"TCP_TUNNEL",...}token-tracker.js—_schema: 'token-usage/v1'added to both HTTP and WebSocket record-building pathssquid-config.ts—"_schema":"audit/v1"prepended to theaudit_jsonlSquid logformat stringSchema validation before write (token-tracker)
Added
validateTokenUsageRecord()— a zero-external-dependency validator that checks all 13 required fields and their types before writing. Invalid records are dropped with atoken_record_schema_violationwarning log rather than crashing the proxy.Tests
validateTokenUsageRecord: unit tests for all valid/invalid pathssquid-config.test.ts: assertsaudit_jsonllogformat includes_schemaand all required fieldslog-parser.test.ts:parseAuditJsonlLinecorrectly handles records that include_schematrackTokenUsageandtrackWebSocketTokenUsagereachwriteTokenUsage(implying records pass validation, which rejects records missing_schema)