|
| 1 | +# Testing Guidelines - Strands Python SDK |
| 2 | + |
| 3 | +> **IMPORTANT**: When writing tests, you **MUST** follow the guidelines in this document. They keep tests consistent, maintainable, and resilient to unrelated changes. |
| 4 | +
|
| 5 | +This document is the authoritative testing reference for the Python SDK. For general development guidance, see [AGENTS.md](../AGENTS.md). |
| 6 | + |
| 7 | +## Test Layout |
| 8 | + |
| 9 | +- **Unit tests** mirror `src/strands/` exactly under `tests/strands/` — `tests/strands/agent/test_agent.py` tests `src/strands/agent/agent.py`. Do not put unit tests anywhere else. |
| 10 | +- **Integration tests** live in `tests_integ/`, organized by feature area, and run only in CI (they need real provider credentials). Whole-message-dict assertions in integ tests are especially brittle — see [Assertions](#assertions). |
| 11 | +- **Test files are prefixed `test_`**; test functions are prefixed `test_`. |
| 12 | + |
| 13 | +```bash |
| 14 | +hatch test # Run unit tests |
| 15 | +hatch test -c # Run with coverage |
| 16 | +hatch test tests/strands/agent/ # Run a specific directory |
| 17 | +hatch run test-integ # Run integration tests |
| 18 | +hatch test --all # All Python versions (3.10–3.14) |
| 19 | +``` |
| 20 | + |
| 21 | +## Test Fixtures Quick Reference |
| 22 | + |
| 23 | +Reusable fixtures live in `tests/fixtures/`; shared async/AWS helpers are session-scoped fixtures in `tests/conftest.py`. **Reuse these — do not hand-roll a mock model, hook recorder, or tool.** If you find yourself writing one, check here first. |
| 24 | + |
| 25 | +| Fixture / helper | Location | When to use | |
| 26 | +| --- | --- | --- | |
| 27 | +| `MockedModelProvider` | `tests/fixtures/mocked_model_provider.py` | Drive the agent loop with a pre-defined sequence of responses (optionally with per-response `Usage` for metrics paths) instead of a real provider | |
| 28 | +| `MockHookProvider` | `tests/fixtures/mock_hook_provider.py` | Record hook invocations and assert which lifecycle events fired | |
| 29 | +| `MockMultiAgentHookProvider` | `tests/fixtures/mock_multiagent_hook_provider.py` | Same, for multi-agent lifecycle events | |
| 30 | +| `MockAgentTool` | `tests/fixtures/mock_agent_tool.py` | A stand-in `AgentTool` when a tool is incidental to the test | |
| 31 | +| `MockedSessionRepository` | `tests/fixtures/mock_session_repository.py` | In-memory `SessionRepository` for session/persistence tests | |
| 32 | +| `agenerator` | `tests/conftest.py` | Wrap a list as an async generator (feed a `MockedModelProvider.stream` or any `async for`) | |
| 33 | +| `alist` | `tests/conftest.py` | Collect an async iterable into a list (`events = await alist(agent.stream_async(...))`) | |
| 34 | +| `generate` | `tests/conftest.py` | Drive a sync generator to exhaustion, returning `(yielded_events, return_value)` | |
| 35 | +| `moto_env`, `moto_mock_aws` | `tests/conftest.py` | Mock AWS credentials/services with `moto` — never hit real AWS in a unit test | |
| 36 | + |
| 37 | +## Async Tests |
| 38 | + |
| 39 | +Pytest runs in **strict asyncio mode** (the default — `asyncio_mode` is not set). Every coroutine test **MUST** carry an explicit marker: |
| 40 | + |
| 41 | +```python |
| 42 | +@pytest.mark.asyncio |
| 43 | +async def test_streams_tool_use(agenerator): |
| 44 | + ... |
| 45 | +``` |
| 46 | + |
| 47 | +A coroutine test without the marker is silently skipped, not run. Consume agent/model streams with the `alist` fixture rather than re-implementing the `async for` collection loop. |
| 48 | + |
| 49 | +## Assertions |
| 50 | + |
| 51 | +**Name the assertion pair `tru_<noun>` / `exp_<noun>`.** This is the one sanctioned exception to the "name every variable for its content" rule — the matched prefixes make arrange/assert pairs scannable. Compute the actual value into `tru_<noun>`, define the expected value as `exp_<noun>`, then compare: |
| 52 | + |
| 53 | +```python |
| 54 | +tru_result = agent("hello") |
| 55 | +exp_result = {"role": "assistant", "content": [{"text": "hi"}]} |
| 56 | +assert tru_result == exp_result |
| 57 | +``` |
| 58 | + |
| 59 | +The right granularity depends on **who controls the expected shape**: |
| 60 | + |
| 61 | +**When you control the shape, assert the whole object — not field-by-field.** A single equality on the whole structure documents the expected shape and catches unexpected extra fields. Mask only genuinely volatile fields (timestamps, generated ids, metadata) with `unittest.mock.ANY`: |
| 62 | + |
| 63 | +```python |
| 64 | +tru_message = result.message |
| 65 | +exp_message = {"role": "assistant", "content": [{"text": "abc"}], "metadata": unittest.mock.ANY} |
| 66 | +assert tru_message == exp_message |
| 67 | +``` |
| 68 | + |
| 69 | +**When the shape is an externally-evolving type, assert only the fields the test is about.** This is the natural exception to the rule above: a type like `Message` grows fields the test doesn't care about, so asserting the entire dict couples the test to unrelated churn. If the test is about the text or role, assert just that (`assert result.message["role"] == "assistant"`) — pinning the whole `Message` shape is a recurring source of breakage in `tests_integ/`. |
| 70 | + |
| 71 | +## Test Organization |
| 72 | + |
| 73 | +- **Name tests `test_<method>_<description>`.** The test module already names the subject, so the class/subject should be omitted: `test__init__default_model_id`, `test_update_config_validation_warns_on_unknown_keys`. Add a subject prefix (`test_<subject>_<method>_<description>`) **only** when one module covers several subjects and the bare method name would be ambiguous. |
| 74 | +- **Default to flat, module-level functions.** Most of the suite is flat. |
| 75 | +- **Use a `class Test<Subject>` only when tests share class-scoped setup or fixtures** — not merely to group related cases (a module already groups them). Inside a class, the method name can drop the redundant subject since the class supplies it. |
| 76 | +- **Parametrize repetitive cases** with `@pytest.mark.parametrize` instead of copy-pasting a test body across inputs. |
| 77 | +- **Keep tests focused and independent**; import packages at the top of the file. |
| 78 | + |
| 79 | +## Comments in Tests |
| 80 | + |
| 81 | +The evergreen-comment rule applies here too: a regression test should **link the issue and describe the behavior it now guarantees**, not narrate what the code used to do. Prefer `# guards against unbounded retry on a None tool name (#642)` over `# we used to hang on None here`. |
0 commit comments