Problem
AWF components emit several JSONL files as runtime artifacts:
- token-usage.jsonl (api-proxy token-tracker.js): per-API-call token usage records
- audit.jsonl (Squid proxy): L7 HTTP/HTTPS traffic decisions (allow/deny per request)
- events.jsonl (Copilot CLI session state): agent session events
These files are consumed by:
- The awf logs stats / awf logs summary commands (log-aggregator, log-parser)
- GitHub Actions workflows that upload firewall-audit-logs artifacts
- External tooling and compliance auditors analyzing workflow security posture
- Research scripts (e.g., scripts/paper/collect-token-data.ts)
The problem: there is no published schema for these files. The format is defined implicitly by the code that writes them, making it fragile for consumers:
- If a field is added or renamed, or its type changes, consumers break silently
- External auditors cannot validate that logs conform to an expected structure
- gh-aw and other upstream tools cannot programmatically discover what fields are available
- There is no internal compliance mechanism ensuring writers conform to a contract
Proposal
1. Define schemas (JSON Schema or TypeScript interfaces)
For each JSONL file, publish a versioned schema describing the record structure:
token-usage.jsonl (current implicit schema from token-tracker.js):
{
"timestamp": "string (ISO 8601)",
"request_id": "string (UUID)",
"provider": "string (anthropic|openai|copilot|gemini)",
"model": "string",
"path": "string (API endpoint path)",
"status": "number (HTTP status code)",
"streaming": "boolean",
"input_tokens": "number",
"output_tokens": "number",
"cache_read_tokens": "number",
"cache_write_tokens": "number",
"duration_ms": "number"
}
audit.jsonl (current Squid logformat):
{
"ts": "number (Unix timestamp with ms)",
"client": "string (IP)",
"host": "string (domain)",
"dest": "string (IP:port)",
"method": "string (CONNECT|GET|POST|...)",
"status": "number (HTTP status)",
"decision": "string (TCP_TUNNEL|TCP_DENIED|...)",
"url": "string"
}
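As a rough illustration of the TypeScript-interface option, the two record shapes above could be written down as follows. This is a sketch, not the project's actual types: the names TokenUsageRecord, AuditRecord, isTokenUsageRecord and isAuditRecord are hypothetical, every listed field is assumed to be required, and unknown extra fields are accepted on purpose.

```typescript
// Hypothetical TypeScript mirror of the implicit record shapes listed above.
// Assumption: every listed field is required; extra fields are tolerated.
type Provider = "anthropic" | "openai" | "copilot" | "gemini";

interface TokenUsageRecord {
  timestamp: string; // ISO 8601
  request_id: string; // UUID
  provider: Provider;
  model: string;
  path: string; // API endpoint path
  status: number; // HTTP status code
  streaming: boolean;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_write_tokens: number;
  duration_ms: number;
}

interface AuditRecord {
  ts: number; // Unix timestamp with ms
  client: string; // IP
  host: string; // domain
  dest: string; // IP:port
  method: string;
  status: number; // HTTP status
  decision: string; // e.g. TCP_TUNNEL, TCP_DENIED
  url: string;
}

type FieldKind = "string" | "number" | "boolean";

// Field tables keyed by the interfaces, so a field added to an interface
// without a matching entry here is a compile error.
const TOKEN_USAGE_FIELDS: Record<keyof TokenUsageRecord, FieldKind> = {
  timestamp: "string", request_id: "string", provider: "string",
  model: "string", path: "string", status: "number", streaming: "boolean",
  input_tokens: "number", output_tokens: "number",
  cache_read_tokens: "number", cache_write_tokens: "number",
  duration_ms: "number",
};

const AUDIT_FIELDS: Record<keyof AuditRecord, FieldKind> = {
  ts: "number", client: "string", host: "string", dest: "string",
  method: "string", status: "number", decision: "string", url: "string",
};

const PROVIDERS: readonly string[] = ["anthropic", "openai", "copilot", "gemini"];

// Returns the names of fields that are absent or have the wrong primitive type.
function missingOrMistyped(value: unknown, fields: Record<string, FieldKind>): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["<record is not a JSON object>"];
  }
  const rec = value as Record<string, unknown>;
  return Object.keys(fields).filter((name) => typeof rec[name] !== fields[name]);
}

function isTokenUsageRecord(value: unknown): value is TokenUsageRecord {
  if (missingOrMistyped(value, TOKEN_USAGE_FIELDS).length > 0) return false;
  return PROVIDERS.indexOf((value as TokenUsageRecord).provider) !== -1;
}

function isAuditRecord(value: unknown): value is AuditRecord {
  return missingOrMistyped(value, AUDIT_FIELDS).length === 0;
}
```

The guards only check primitive types and the provider enum. Format constraints such as ISO 8601 timestamps or UUIDs are left out here and would be a natural fit for a JSON Schema version of the same contract.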
2. Internal schema compliance
- Writers (token-tracker.js, squid-config.ts) should validate records against the schema before emitting
- Tests should assert that emitted records conform to the schema
- Schema changes require a version bump so consumers can handle migrations
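A test asserting conformance could take roughly the following shape: parse a JSONL artifact line by line and report which lines fail. This is a minimal sketch under stated assumptions. validateJsonl, LineError and looksLikeAuditRecord are hypothetical names, the sample lines are invented for illustration, and the example check covers only three audit.jsonl fields to keep it short.

```typescript
interface LineError {
  line: number; // 1-based line number within the JSONL text
  reason: string;
}

// Checks each non-blank line of a JSONL document against a caller-supplied
// predicate and collects the lines that fail to parse or fail the check.
function validateJsonl(text: string, isValid: (record: unknown) => boolean): LineError[] {
  const errors: LineError[] = [];
  text.split("\n").forEach((raw, index) => {
    if (raw.trim() === "") return; // tolerate blank lines and a trailing newline
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch (_err) {
      errors.push({ line: index + 1, reason: "invalid JSON" });
      return;
    }
    if (!isValid(parsed)) {
      errors.push({ line: index + 1, reason: "record does not match schema" });
    }
  });
  return errors;
}

// Deliberately small example check: only three audit.jsonl fields.
function looksLikeAuditRecord(record: unknown): boolean {
  if (typeof record !== "object" || record === null) return false;
  const r = record as Record<string, unknown>;
  return (
    typeof r.ts === "number" &&
    typeof r.host === "string" &&
    typeof r.decision === "string"
  );
}

// Invented sample data: one conforming line, one mistyped field, one non-JSON line.
const sampleLog = [
  '{"ts": 1700000000.123, "host": "example.com", "decision": "TCP_TUNNEL"}',
  '{"ts": "not-a-number", "host": "example.com", "decision": "TCP_DENIED"}',
  "not json",
].join("\n");

const problems = validateJsonl(sampleLog, looksLikeAuditRecord);
for (const p of problems) {
  console.log(`line ${p.line}: ${p.reason}`);
}
```

In CI, the same function could run over artifacts produced by an integration test and fail the job when the returned list is non-empty. For audit.jsonl this after-the-fact check may be the more practical option, since Squid itself writes the lines.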
3. Publish schemas as artifacts
- Include schema files in releases (e.g., schemas/token-usage.v1.schema.json)
- Embed schema version in each JSONL record (e.g., "_schema": "token-usage/v1") or in a companion .schema.json file alongside the JSONL
- Document schema evolution policy (additive-only for minor versions, breaking changes = new major)
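On the consumer side, the embedded version could be used as sketched below. This assumes the per-record "_schema" variant and a tag of the form name/vN carrying only the major version, as in the example above. parseSchemaTag and canRead are hypothetical helpers, not existing AWF functions.

```typescript
interface SchemaTag {
  name: string; // e.g. "token-usage"
  major: number; // e.g. 1
}

// Parses tags shaped like "token-usage/v1"; returns null for anything else.
function parseSchemaTag(tag: unknown): SchemaTag | null {
  if (typeof tag !== "string") return null;
  const match = /^([a-z][a-z0-9-]*)\/v(\d+)$/.exec(tag);
  if (match === null) return null;
  return { name: match[1], major: Number(match[2]) };
}

// A consumer built against one major version accepts any record carrying that
// major. Under an additive-only policy, fields added in later minor revisions
// are simply ignored, while a different major is refused.
function canRead(record: unknown, expectedName: string, supportedMajor: number): boolean {
  if (typeof record !== "object" || record === null) return false;
  const tag = parseSchemaTag((record as Record<string, unknown>)["_schema"]);
  return tag !== null && tag.name === expectedName && tag.major === supportedMajor;
}

// Invented example records.
const v1Record = { _schema: "token-usage/v1", model: "example-model", resolved_model: "example" };
const v2Record = { _schema: "token-usage/v2", model: "example-model" };

console.log(canRead(v1Record, "token-usage", 1)); // true: same major, extra field ignored
console.log(canRead(v2Record, "token-usage", 1)); // false: breaking major bump
```

Refusing an unknown major outright, rather than attempting a best-effort parse, keeps a breaking change from being misread silently.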
Benefits
- Auditability: compliance tools can validate that AWF logs contain expected fields
- Extensibility: new fields (e.g., resolved_model for aliasing, rate_limit_applied) can be added with confidence that consumers handle unknowns gracefully
- Interoperability: gh-aw, external SIEM tools, and research scripts can rely on a stable contract
- Regression detection: CI tests validate writers never emit non-conforming records
Current Writers
| File | Writer | Location |
| --- | --- | --- |
| token-usage.jsonl | api-proxy token tracker | containers/api-proxy/token-tracker.js:34 |
| audit.jsonl | Squid logformat | src/squid-config.ts:603-609 |
| events.jsonl | Copilot CLI (external) | Consumed via --session-state-dir |