- Test through public APIs, not private methods (prefixed with `_`) or helpers — validates actual user-facing behavior and prevents brittle tests tied to implementation details
- Prefer feature-centric parametrized test files (e.g. `test_multimodal_tool_returns.py`) over appending to monolithic `test_<provider>.py` files — the legacy per-provider files are large and hard for agents to navigate; new features should get their own test file with a `Case` class and parametrized providers
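A feature-centric file along those lines might look like the sketch below. Everything here is hypothetical (the `Case` fields, model names, and `build_tool_return_parts` stand in for the real feature under test); it only illustrates the shape, assuming pytest:

```python
"""Sketch of a feature-centric test file, e.g. test_multimodal_tool_returns.py.
All names below are illustrative stand-ins, not real pydantic-ai APIs."""
from dataclasses import dataclass

import pytest


@dataclass
class Case:
    """One provider-specific scenario for the feature under test."""
    provider: str
    model_name: str
    expected_part_count: int


CASES = [
    Case(provider='openai', model_name='gpt-4o', expected_part_count=2),
    Case(provider='anthropic', model_name='claude-3-5-sonnet-latest', expected_part_count=2),
    Case(provider='google', model_name='gemini-1.5-pro', expected_part_count=2),
]


def build_tool_return_parts(case: Case) -> list[dict]:
    """Stand-in for the feature under test; a real file would exercise the model."""
    return [{'provider': case.provider, 'kind': 'text'},
            {'provider': case.provider, 'kind': 'image'}]


@pytest.mark.parametrize('case', CASES, ids=lambda c: c.provider)
def test_multimodal_tool_returns(case: Case):
    parts = build_tool_return_parts(case)
    assert len(parts) == case.expected_part_count
    assert all(p['provider'] == case.provider for p in parts)
```

Adding a provider then means appending one `Case`, not copying a test body into a monolithic per-provider file.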
- Use `snapshot()` for complex structured outputs (objects, message sequences, API responses, nested dicts) — catches unexpected changes more reliably than field-by-field assertions; use `IsStr` and similar matchers for variable values
- Assert the core aspect of the change being introduced — use whatever means necessary: patching clients to inspect request payloads, tapping into pydantic-ai internals, snapshot comparisons. Snapshots are valuable for catching structural drift in objects and message arrays, but only use `result.all_messages()` or output assertions when the structure demonstrates behavior you care about keeping consistent
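One way to "patch the client to inspect request payloads" is a stdlib `wraps` spy, as in this minimal sketch (`FakeClient` and `send_prompt` are hypothetical stand-ins for a real provider client and the code under test):

```python
"""Sketch: assert the core aspect of a change via the outgoing request payload,
not the final output. FakeClient/send_prompt are illustrative stand-ins."""
from unittest.mock import patch


class FakeClient:
    def post(self, payload: dict) -> dict:
        return {'status': 'ok'}


def send_prompt(client: FakeClient, prompt: str) -> dict:
    # Code under test: suppose the change being introduced adds `max_tokens`.
    return client.post({'prompt': prompt, 'max_tokens': 1024})


def test_request_payload_includes_max_tokens():
    client = FakeClient()
    # Spy on the client method while still delegating to the real implementation.
    with patch.object(client, 'post', wraps=client.post) as spy:
        send_prompt(client, 'hi')
    payload = spy.call_args.args[0]
    assert payload['max_tokens'] == 1024  # the core aspect of the change
```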
- Test both positive and negative cases for optional capabilities (model features, server features, streaming) — ensures features work when supported AND fail gracefully when absent
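A minimal sketch of covering both sides of an optional capability (all names here are hypothetical stand-ins, not a real model API):

```python
"""Positive AND negative coverage for an optional capability (streaming).
FakeModel/stream_text are illustrative stand-ins."""


class FakeModel:
    def __init__(self, supports_streaming: bool):
        self.supports_streaming = supports_streaming


def stream_text(model: FakeModel, prompt: str):
    if not model.supports_streaming:
        raise NotImplementedError('streaming is not supported by this model')
    yield from prompt.split()


def test_streaming_supported():
    chunks = list(stream_text(FakeModel(supports_streaming=True), 'hello world'))
    assert chunks == ['hello', 'world']


def test_streaming_unsupported_fails_gracefully():
    try:
        list(stream_text(FakeModel(supports_streaming=False), 'hello'))
    except NotImplementedError as exc:
        assert 'not supported' in str(exc)
    else:
        raise AssertionError('expected NotImplementedError')
```

Only testing the unsupported path would pass even if the feature were never implemented; only testing the supported path would miss the missing error message.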
- Ensure test assertions match test names and docstrings — tests without proper assertions or that verify opposite behavior create false positives
- Test MCP against real `tests.mcp_server` instance, not mocks — extend test server with helper tools to expose runtime context (instructions, client info, session state)
- Remove stale test docstrings, comments, and historical provider bug notes when behavior changes
<!-- braindump: rules extracted from PR review patterns -->
# tests/ Guidelines
## Testing
<!-- rule:177 -->
- Test through public APIs, not private methods (prefixed with `_`) or helpers — Prevents brittle tests tied to implementation details, reduces maintenance burden when refactoring internals, and validates actual user-facing behavior rather than isolated units
<!-- rule:173 -->
- Maintain 1:1 correspondence between test files and source modules (`test_{module}.py`) — consolidate related tests instead of splitting by feature, config, or test type — Prevents test suite fragmentation and makes tests easier to locate by matching source structure; use fixtures/markers to distinguish test types within the file
<!-- rule:86 -->
- Use `snapshot()` for complex structured outputs (objects, message sequences, API responses, nested dicts, span attributes) — prevents brittle field-by-field assertions and improves test maintainability — Snapshot testing catches unexpected changes in complex structures more reliably than manual assertions, and `IsStr` matchers handle variable values gracefully
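The idea behind `snapshot()` plus `IsStr` can be shown with a stdlib-only stand-in. The real suite uses the inline-snapshot and dirty-equals libraries; `AnyStr` below is a hand-rolled matcher mimicking `IsStr`, and the response shape is invented for illustration:

```python
"""Why whole-structure comparison beats field-by-field asserts.
AnyStr is a minimal stand-in for dirty-equals' IsStr."""


class AnyStr:
    """Matches any string, for fields that vary per run (ids, timestamps)."""
    def __eq__(self, other):
        return isinstance(other, str)


response = {
    'id': 'resp_abc123',          # varies per run
    'model': 'gpt-4o',
    'usage': {'input_tokens': 12, 'output_tokens': 3},
    'parts': [{'kind': 'text', 'content': 'hello'}],
}

# One whole-structure assertion: any added, removed, or renamed field fails
# loudly, unlike field-by-field asserts that silently ignore unchecked keys.
assert response == {
    'id': AnyStr(),
    'model': 'gpt-4o',
    'usage': {'input_tokens': 12, 'output_tokens': 3},
    'parts': [{'kind': 'text', 'content': 'hello'}],
}
```

With `snapshot()` the expected literal is also auto-updated on intentional changes, which manual assertions can't do.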
<!-- rule:318 -->
- Use `pytest-vcr` cassettes (not mocks) in `tests/models/` — records real HTTP interactions for deterministic replay, captures both success and error cases — Ensures integration tests validate real API behavior without live calls on every run, making tests faster and preventing flakiness from network issues or rate limits
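For context, a recorded cassette is a YAML file of request/response pairs. The fragment below is illustrative only (endpoint, headers, and bodies are invented); the field layout follows vcrpy's serializer:

```yaml
interactions:
- request:
    body: '{"messages": [{"role": "user", "content": "hello"}], "model": "gpt-4o"}'
    headers:
      content-type:
      - application/json
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: '{"choices": [{"message": {"role": "assistant", "content": "hi"}}]}'
    headers:
      content-type:
      - application/json
    status:
      code: 200
      message: OK
version: 1
```

Because the error case is just another recorded interaction, a cassette with a 429 or 500 response tests retry and error-mapping paths deterministically.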
<!-- rule:334 -->
- Assert meaningful behavior in tests, not just code execution or type checks — validates correctness and data flow — Prevents false confidence from tests that pass without verifying actual functionality works as intended
<!-- rule:194 -->
- In agent/model/stream tests, assert on final output AND snapshot `result.all_messages()` — validates complete execution trace, not just end result — Catches regressions in tool calls, intermediate steps, and message flow that final output assertions miss
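The shape of such a test, sketched with plain stand-ins (the `result.output` / `result.all_messages()` names mimic pydantic-ai's result API, but the objects below are invented for illustration):

```python
"""Assert the final output AND the full message trace.
FakeResult/run_agent are illustrative stand-ins for an agent run."""
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    output: str
    _messages: list[dict] = field(default_factory=list)

    def all_messages(self) -> list[dict]:
        return self._messages


def run_agent(prompt: str) -> FakeResult:
    # Stand-in for agent.run_sync(...): a tool call happens mid-run.
    return FakeResult(
        output='4',
        _messages=[
            {'role': 'user', 'content': prompt},
            {'role': 'tool-call', 'tool': 'calculator', 'args': '2+2'},
            {'role': 'tool-return', 'content': '4'},
            {'role': 'assistant', 'content': '4'},
        ],
    )


def test_agent_run():
    result = run_agent('what is 2+2?')
    assert result.output == '4'  # the end result
    # Whole-trace assertion (a snapshot() in the real suite): catches dropped,
    # duplicated, or reordered tool calls that the output assertion alone misses.
    assert result.all_messages() == [
        {'role': 'user', 'content': 'what is 2+2?'},
        {'role': 'tool-call', 'tool': 'calculator', 'args': '2+2'},
        {'role': 'tool-return', 'content': '4'},
        {'role': 'assistant', 'content': '4'},
    ]
```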
<!-- rule:363 -->
- Test through real APIs, not mocks — mock only slow/external dependencies outside your control — Improves refactoring safety, documents real usage patterns, and catches integration issues — use lightweight local infrastructure (test servers, in-memory DBs) for systems you control (provider APIs, Temporal workflows, frameworks) in files like `test_{provider}.py`; reserve mocks for third-party HTTP APIs and unreliable external services
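"Lightweight local infrastructure" can be as small as a throwaway stdlib HTTP server, so the client code is exercised over a real socket instead of a mocked method. This sketch uses only invented names; it is not the project's actual test harness:

```python
"""A disposable local HTTP server standing in for a provider API."""
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class FakeProviderHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({'model': 'fake-model', 'content': 'hello'}).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


def fetch_completion(base_url: str) -> dict:
    """Code under test: a real HTTP round-trip, no mocked client."""
    with urlopen(f'{base_url}/v1/complete') as resp:
        return json.loads(resp.read())


def test_fetch_completion_against_local_server():
    server = HTTPServer(('127.0.0.1', 0), FakeProviderHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        data = fetch_completion(f'http://127.0.0.1:{server.server_port}')
        assert data == {'model': 'fake-model', 'content': 'hello'}
    finally:
        server.shutdown()
```

Unlike a mock, this exercises URL construction, serialization, and HTTP handling, yet runs in milliseconds with no network dependency.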
<!-- rule:11 -->
- Parametrize tests across all providers that support the feature (or at minimum OpenAI, Anthropic, Google) — catches provider-specific regressions and ensures cross-provider compatibility — Prevents breaking unchanged providers when modifying shared model logic, and surfaces integration issues across different provider APIs before they reach production
<!-- rule:385 -->
- Ensure test assertions match test names and docstrings — prevents false confidence in test coverage and catches actual regressions — Tests without proper assertions or that verify opposite behavior create false positives and fail to catch bugs they claim to prevent
<!-- rule:89 -->
- Test both positive and negative cases for optional capabilities (model features, server features, streaming) — ensures features work when supported AND fail gracefully when absent — Prevents false confidence from tests that only check unsupported cases, catching both implementation bugs and missing error handling
<!-- rule:630 -->
- Test MCP against real `tests.mcp_server` instance, not mocks — extend test server with helper tools to expose runtime context (instructions, client info, session state) — Verifies actual data flow and integration behavior rather than just testing mock interfaces, catching real-world issues that mocks would miss
## General
<!-- rule:463 -->
- Remove stale test docstrings, comments, and historical provider bug notes when behavior changes — Outdated test documentation misleads developers about what's actually being tested and why