Why This Exists
Agent runs are often flaky and expensive:
- Non-determinism: The same prompt can produce different outputs.
- Slow feedback: Integration tests spend 90% of their time waiting on network calls.
- Cost: Repeated debugging burns tokens and money.
Cassette turns "It failed yesterday" into a replayable, immutable artifact.
Enterprise Use Case: Reliable Code Generation
(See examples/node-red-generator.ts)
This tool is designed for platforms like FlowFuse or Node-RED where AI agents generate executable code.
Agent Cassette provides a Regression Testing Harness that ensures:
- Strict Schema Validation: Agents must output valid JSON structures (e.g., correct `wires` and `coordinates`).
- Semantic Safety: If an agent generates unsafe code (e.g., missing `return msg`), the system detects it and swaps in a safe fallback.
- Deterministic Replay: We record a "Golden Run" of a complex flow generation. CI pipelines can replay this instantly (0 cost) to prove that model upgrades (e.g., GPT-4o → GPT-5) don't break the JSON schema.
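
The first two guarantees can be illustrated with a minimal TypeScript sketch. This is not Agent Cassette's actual API: the `FlowNode` shape, the `x`/`y` coordinate fields, and the names `isValidNode`, `returnsMsg`, `sanitizeNode`, and `parseAgentOutput` are all assumptions made for illustration, and the "unsafe code" check is a deliberately naive text match rather than real static analysis.

```typescript
// Hypothetical node shape, loosely modeled on a Node-RED flow entry.
// None of these names are taken from Agent Cassette itself.
interface FlowNode {
  id: string;
  type: string;
  x: number; // canvas coordinates
  y: number;
  wires: string[][]; // one list of target node ids per output port
  func?: string; // body of a function node, if any
}

// Assumed fallback: a pass-through body that forwards the message unchanged.
const SAFE_FALLBACK_BODY = "return msg;";

// Schema check: required fields exist and have the expected types.
function isValidNode(value: unknown): value is FlowNode {
  if (typeof value !== "object" || value === null) return false;
  const n = value as Record<string, unknown>;
  const wires = n.wires;
  return (
    typeof n.id === "string" &&
    typeof n.type === "string" &&
    typeof n.x === "number" &&
    typeof n.y === "number" &&
    Array.isArray(wires) &&
    wires.every(
      (port) => Array.isArray(port) && port.every((t) => typeof t === "string")
    )
  );
}

// Semantic check (naive): does the body contain a `return msg` statement?
function returnsMsg(body: string): boolean {
  return /\breturn\s+msg\b/.test(body);
}

// Replace the body of a function node that never returns msg.
function sanitizeNode(node: FlowNode): FlowNode {
  if (node.type !== "function") return node;
  if (node.func !== undefined && returnsMsg(node.func)) return node;
  return { ...node, func: SAFE_FALLBACK_BODY };
}

// Parse raw agent output, reject schema violations, then apply fallbacks.
function parseAgentOutput(raw: string): FlowNode[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || !parsed.every(isValidNode)) {
    throw new Error("Agent output failed schema validation");
  }
  return parsed.map(sanitizeNode);
}
```

In a replay-based test, the `raw` string would come from a recorded run rather than a live model call, so the same checks could be applied to every model version without spending tokens.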