fix: provider spec conformance audit — 29 violations fixed, 50+ drift tests added#168

Merged: jpr5 merged 9 commits into main from blitz/conformance-audit on May 8, 2026

@jpr5 (Contributor) commented May 8, 2026

Summary

Systematic spec conformance audit of all 21 aimock providers, run with 60+ specialized agents (MSAL). Found and fixed 29 spec violations across 10 providers. Added 50+ new drift tests covering 9 previously untested providers, plus error-shape tests for 5 existing providers and reasoning/thinking tests for 4 of them.

Phase 0: Drift test gap closure (18 agents)

  • 9 new drift test files: images, speech, transcription, moderation, ElevenLabs, fal.ai (shapes + queue lifecycle), video, rerank
  • Error shape tests for OpenAI Chat, Responses, Claude, Gemini, Cohere
  • Reasoning/thinking drift tests for OpenAI Chat, Responses, Claude, Gemini
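The drift-test harness itself is not part of this description. As a rough sketch of what an SDK shape comparison with negative assertions can look like, the TypeScript below flattens a JSON payload into path/type pairs and diffs it against a reference. shapeOf and diffShapes are hypothetical helpers, not aimock APIs, and the reference payload stands in for an SDK-derived fixture.

```typescript
// Hypothetical shape descriptor: maps a JSON path to a type name.
type Shape = Record<string, string>;

// Recursively flatten a JSON value into "path -> type" pairs.
// Arrays are described by their first element under the "[]" path segment.
function shapeOf(value: unknown, path = "$", out: Shape = {}): Shape {
  if (value === null) {
    out[path] = "null";
  } else if (Array.isArray(value)) {
    out[path] = "array";
    if (value.length > 0) shapeOf(value[0], `${path}[]`, out);
  } else if (typeof value === "object") {
    out[path] = "object";
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      shapeOf(child, `${path}.${key}`, out);
    }
  } else {
    out[path] = typeof value;
  }
  return out;
}

// Compare a mock's shape against a reference shape.
// missing:    paths the reference has but the mock lacks
// extra:      paths the mock emits but the reference does not (negative assertions)
// mismatched: same path, different type
function diffShapes(reference: Shape, mock: Shape) {
  const missing = Object.keys(reference).filter((p) => !(p in mock));
  const extra = Object.keys(mock).filter((p) => !(p in reference));
  const mismatched = Object.keys(reference).filter(
    (p) => p in mock && mock[p] !== reference[p],
  );
  return { missing, extra, mismatched };
}
```

Asserting that extra is empty is the negative assertion: it flags fields the real API never sends, such as the spurious document field fixed for Rerank in this PR.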

Phase 1-3: Per-provider conformance audit (33 agents)

Every handler audited for: request conversion, non-streaming shape, streaming shape, tool calls, error format.

Phase 4: Fix round (7 agents)

Request conversion fixes:

  • Responses API: max_output_tokens and response_format silently dropped
  • Gemini: maxOutputTokens, topP, topK dropped; spurious functionCall.id
  • Cohere: structured content, native v2 tool format, temperature, and max_tokens now handled
  • Ollama: tool_calls on assistant messages, images, and system on generate requests now handled
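To illustrate the silently-dropped-field class of bug, here is a minimal sketch of a Gemini generationConfig conversion. The Gemini field names (temperature, topP, topK, maxOutputTokens) are the public ones; InternalRequest and convertGemini are hypothetical and are not aimock's actual types.

```typescript
// Subset of a Gemini generateContent request body (only the fields discussed here).
interface GeminiGenerationConfig {
  temperature?: number;
  topP?: number;
  topK?: number;
  maxOutputTokens?: number;
}
interface GeminiRequest {
  generationConfig?: GeminiGenerationConfig;
}

// Hypothetical provider-neutral request shape, for illustration only.
interface InternalRequest {
  temperature?: number;
  top_p?: number;
  top_k?: number;
  max_tokens?: number;
}

// Copy every sampling field through. Any field omitted here would be
// silently dropped, which is the class of bug the audit found.
function convertGemini(req: GeminiRequest): InternalRequest {
  const cfg: GeminiGenerationConfig = req.generationConfig ?? {};
  const out: InternalRequest = {};
  if (cfg.temperature !== undefined) out.temperature = cfg.temperature;
  if (cfg.topP !== undefined) out.top_p = cfg.topP;
  if (cfg.topK !== undefined) out.top_k = cfg.topK;
  if (cfg.maxOutputTokens !== undefined) out.max_tokens = cfg.maxOutputTokens;
  return out;
}
```

Only fields the client actually sent are copied, so absent settings stay absent rather than being defaulted.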

Response shape fixes:

  • OpenAI Chat: error responses missing param: null, leaking internal status
  • Moderation: missing illicit and illicit/violent categories
  • Transcription: verbose response always emitting empty words/segments
  • Search: missing Tavily fields (query, images, response_time, answer)
  • Rerank: spurious document field
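For the OpenAI Chat error fix, a sketch of the error envelope with param always present (null when no parameter applies). The openAIError builder is hypothetical; accepting only the public fields is one way to keep internal state out of the response body, not necessarily how aimock does it.

```typescript
// OpenAI-style error envelope: param and code are always present, null when not applicable.
interface OpenAIErrorBody {
  error: {
    message: string;
    type: string;
    param: string | null;
    code: string | null;
  };
}

// Hypothetical builder. It accepts only public fields, so internal details
// cannot end up in the JSON body; the HTTP status belongs on the response itself.
function openAIError(
  message: string,
  type: string,
  param: string | null = null,
  code: string | null = null,
): OpenAIErrorBody {
  return { error: { message, type, param, code } };
}
```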

WebSocket fixes:

  • Realtime WS (9 violations): ID prefixes (evt- → event_), missing session/response fields, previous_item_id tracking, output item status
  • Gemini Live WS (4 violations): config alias, standalone turnComplete, completed httpToGrpc mapper
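A simplified sketch of two of the Realtime fixes: event IDs carrying the event_ prefix, and previous_item_id tracking on conversation.item.created. The counter-based ID suffix, the trimmed item object, and ConversationTracker are assumptions for illustration, not aimock code.

```typescript
// Assumption: a counter-based suffix; only the "event_" prefix matters here.
let eventCounter = 0;
function nextEventId(): string {
  eventCounter += 1;
  return `event_${eventCounter.toString().padStart(6, "0")}`;
}

// Trimmed conversation.item.created event (real items carry more fields).
interface ItemCreatedEvent {
  event_id: string;
  type: "conversation.item.created";
  previous_item_id: string | null;
  item: { id: string; status: string };
}

// Hypothetical per-session tracker: each created item points at the item
// before it; the first item in the conversation gets previous_item_id: null.
class ConversationTracker {
  private lastItemId: string | null = null;

  itemCreated(itemId: string): ItemCreatedEvent {
    const event: ItemCreatedEvent = {
      event_id: nextEventId(),
      type: "conversation.item.created",
      previous_item_id: this.lastItemId,
      item: { id: itemId, status: "completed" },
    };
    this.lastItemId = itemId;
    return event;
  }
}
```

Keeping the tracker per session means ordering is reconstructed from the server's own history rather than from whatever the client last sent.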

Anthropic thinking:

  • signature field + signature_delta event added to Claude Messages and Bedrock invoke thinking blocks
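A simplified sketch of where the signature lands: in streaming, one signature_delta arrives after the thinking_delta events and just before content_block_stop; in non-streaming responses, the thinking block carries a signature field. Event payloads are trimmed to the relevant fields, and the signature string is an opaque placeholder, since a mock cannot produce a real one.

```typescript
// Trimmed Anthropic Messages streaming events for one thinking block.
type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: "thinking"; thinking: string } }
  | { type: "content_block_delta"; index: number; delta: { type: "thinking_delta"; thinking: string } }
  | { type: "content_block_delta"; index: number; delta: { type: "signature_delta"; signature: string } }
  | { type: "content_block_stop"; index: number };

// Emit thinking text in chunks, then a single signature_delta before the block closes.
function thinkingBlockEvents(index: number, chunks: string[], signature: string): StreamEvent[] {
  const events: StreamEvent[] = [
    { type: "content_block_start", index, content_block: { type: "thinking", thinking: "" } },
  ];
  for (const chunk of chunks) {
    events.push({ type: "content_block_delta", index, delta: { type: "thinking_delta", thinking: chunk } });
  }
  events.push({ type: "content_block_delta", index, delta: { type: "signature_delta", signature } });
  events.push({ type: "content_block_stop", index });
  return events;
}

// Non-streaming counterpart: the block carries signature alongside the full thinking text.
function thinkingBlock(chunks: string[], signature: string) {
  return { type: "thinking" as const, thinking: chunks.join(""), signature };
}
```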

CR: 13-agent MSAL review

Found 4 additional bugs (previous_item_id tracking, incomplete gRPC mapper, test false positive, Anthropic signature). All fixed.

Closes audit plan: https://www.notion.so/35a3aa38185281cc8001c5e863a098dd

Test plan

  • pnpm test — 2873 passed
  • npx tsc --noEmit — clean
  • 50+ new drift tests with SDK shape comparisons
  • Negative assertions throughout
  • 13-agent CR converged (4 findings fixed; confirmation implicit in the combined verification run)

@pkg-pr-new (Bot) commented May 8, 2026

npm i https://pkg.pr.new/@copilotkit/aimock@168

commit: f5ad360

@jpr5 jpr5 merged commit 37d3335 into main May 8, 2026
22 checks passed
@jpr5 jpr5 deleted the blitz/conformance-audit branch May 8, 2026 21:55