This guide explains how to add behavioral tests for an agent that doesn't have test coverage yet.
- The agent is deployed and exposes the standard
/chat/completionsand/healthendpoints - You know what tools the agent has (check its
src/directory for@tooldecorators or tool definitions) - Test dependencies are installed:
uv pip install -e ".[test]"
mkdir -p agents/<framework>/<agent>/tests/behavioral/fixturesFor example, the LangGraph ReAct agent tests live at:
agents/langgraph/react_agent/tests/behavioral/
├── conftest.py
├── test_tool_usage.py
├── test_response_quality.py
├── test_cost_latency.py
├── test_reliability.py
└── fixtures/
└── golden_queries.yaml
The conftest defines fixtures specific to your agent. Because agent tests live under agents/ (a separate directory tree from tests/behavioral/), pytest's conftest discovery won't find the shared fixtures. You must define http_client and eval_config locally. At minimum you need:
agent_url— reads from a new env var specific to your agenthttp_client— must be redefined locally (not inherited fromtests/behavioral/conftest.py)eval_config— resolves the path totests/behavioral/configs/thresholds.yamlrelative to the repo root (adjust.parents[N]based on your agent's directory depth)known_tools— lists the tools in the agent's schema (used by hallucination detection)agent_thresholds— pulls from the sharedeval_configfixturerun_eval— overrides the root fixture to add MLflow trace enrichment
load_golden() helper: Import the shared loader from harness.fixtures and create a thin wrapper that binds fixtures_dir to Path(__file__).parent / "fixtures":
from pathlib import Path
from typing import Any
from harness.fixtures import load_golden as _load_golden_from
FIXTURES_DIR = Path(__file__).parent / "fixtures"
def load_golden(category: str | None = None) -> list[dict[str, Any]]:
return _load_golden_from(FIXTURES_DIR, category)See existing agent implementations for working examples:
agents/langgraph/react_agent/tests/behavioral/conftest.pyagents/vanilla_python/openai_responses_agent/tests/behavioral/conftest.pyagents/autogen/mcp_agent/tests/behavioral/conftest.pyagents/crewai/websearch_agent/tests/behavioral/conftest.pyagents/langgraph/agentic_rag/tests/behavioral/conftest.py
Add a section for your agent in tests/behavioral/configs/thresholds.yaml:
my_agent:
tool_selection_accuracy: 0.85
response_coherence_accuracy: 0.75
max_latency_p95: 10.0
pass_at_k: 8Create fixtures/golden_queries.yaml with test inputs for your agent:
queries:
- query: "A question that should trigger tool_a"
expected_tools: ["tool_a"]
expected_elements: ["keyword_from_tool_output"]
difficulty: easy
category: factual
- query: "Hello"
expected_tools: []
expected_elements: []
difficulty: easy
category: greetingThere are four standard test files. Each one uses pytest markers to describe what is being tested, not which agent:
| File | Marker | What it tests |
|---|---|---|
test_tool_usage.py |
@pytest.mark.<agent> |
Correct tool selection, no hallucinated tools, valid args |
test_response_quality.py |
@pytest.mark.<agent> |
Plan coherence, completeness, multi-tool synthesis |
test_cost_latency.py |
@pytest.mark.<agent> |
Response time within threshold |
test_reliability.py |
@pytest.mark.<agent>, @pytest.mark.slow |
pass@k consistency over repeated runs |
See the existing implementations for reference:
agents/langgraph/react_agent/tests/behavioral/(single tool:search)agents/vanilla_python/openai_responses_agent/tests/behavioral/(two tools:search_price,search_reviews)agents/autogen/mcp_agent/tests/behavioral/(two tools via MCP:add,sub)agents/crewai/websearch_agent/tests/behavioral/(single tool:Web Search)agents/langgraph/agentic_rag/tests/behavioral/(single tool:retriever)
Add your agent marker to pyproject.toml under the agent markers section:
markers = [
# Agent markers
"langgraph_react: ...",
"vanilla_python: ...",
"my_agent: My Agent description (tool_a + tool_b)",
...
]Add your agent's URL env var to the table in the Behavioral Tests section of the root README.md.
# Check tests are discovered
pytest agents/<framework>/<agent>/tests/behavioral/ --collect-only
# Run against a deployed agent
MY_AGENT_URL=https://my-agent.example.com pytest agents/<framework>/<agent>/tests/behavioral/ -v