@@ -40,6 +40,183 @@ uv run pre-commit install
uv run pre-commit run --all-files
```

## Testing the `trace-claude-code` plugin

Bash test suite for the hook scripts. Tests run the hooks against a
stubbed `curl`, capture the resulting HTTP requests, and assert on the
inferred span tree.

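To make the stubbing idea concrete, here is a minimal sketch of how a `curl` shell function could capture requests and return canned responses. The names `CAPTURE_FILE` and `STUB_RULES` and the simplified option parsing are assumptions for illustration and are not taken from `helpers/curl_stub.sh`; only the `stub_response_for PATTERN STATUS BODY` call shape comes from the test example in this section.

```bash
#!/bin/bash
# Hypothetical sketch of a curl stub; the real helpers/curl_stub.sh may differ.

CAPTURE_FILE="$(mktemp)"   # assumed: one captured request per line (URL, tab, body)
STUB_RULES=()              # assumed: "pattern|status|body" entries

# Register a canned response for URLs matching a glob pattern.
stub_response_for() {
  STUB_RULES+=("$1|$2|$3")
}

# Shadow the real curl: record the URL and request body, then answer from STUB_RULES.
curl() {
  local url="" data="" rule pattern status body
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -d|--data|--data-raw) data="$2"; shift 2 ;;
      -X|-H|-o|-w) shift 2 ;;   # options that take a value; ignored here
      -*) shift ;;              # flags without a value, e.g. -s
      *) url="$1"; shift ;;
    esac
  done
  printf '%s\t%s\n' "$url" "$data" >> "$CAPTURE_FILE"

  for rule in "${STUB_RULES[@]}"; do
    IFS='|' read -r pattern status body <<< "$rule"
    # The unquoted $pattern is matched as a glob.
    if [[ "$url" == $pattern ]]; then
      printf '%s\n' "$body"
      [ "$status" -lt 400 ]   # non-zero exit on HTTP errors, like `curl --fail`
      return
    fi
  done
  return 22   # no rule matched; 22 is curl's exit code for HTTP errors under --fail
}
```

Because shell functions take precedence over executables on `PATH`, any hook script sourced or run in the same shell would call this function instead of the real `curl`. A stub that must survive into child processes would need `export -f curl` or a fake binary on `PATH`; which approach the real harness uses is not stated here.
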
### Running

```sh
# From the repo root:
make test

# Or run a specific test file:
bash plugins/trace-claude-code/test/run_tests.sh test_e2e
bash plugins/trace-claude-code/test/run_tests.sh test_replay test_queue
```

### Layout

```
plugins/trace-claude-code/test/
├── helpers/
│   ├── assert.sh       # describe / it / assert_eq / assert_contains, color output
│   ├── harness.sh      # setup_test_env, teardown_test_env, run_hook
│   ├── curl_stub.sh    # curl() shell function that captures requests + returns canned responses
│   ├── fixtures.sh     # builders for hook input JSON (fixture_session_start, etc.)
│   ├── span_tree.sh    # all_spans, span_count_by_type, span_by_name, children_of, ...
│   └── replay.sh       # replay_session, describe_fixture
├── fixtures/
│   └── sessions/       # captured Claude sessions used by test_replay.sh
├── test_*.sh           # one file per area
├── record_session.sh   # CLI to prep a fixture directory for capturing
└── run_tests.sh        # entry point
```

### Writing a test

Each `test_*.sh` follows this pattern:

```bash
#!/bin/bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/helpers/assert.sh"
source "$SCRIPT_DIR/helpers/harness.sh"

describe "my feature"

t_my_test_body() {
  # setup_test_env has already created an isolated $HOME and stubbed curl
  stub_response_for "*/v1/project_logs/*/insert" 200 '{"row_ids":["row_1"]}'

  run_hook session_start.sh "$(fixture_session_start "s1" "/tmp/x")"

  assert_eq "$(span_count_by_type task)" "1"
}

it "does the thing" t_my_test_body
```

Key conventions:

- `describe "..."` is a section header (purely visual).
- `it "name" function_name` runs `function_name` between `setup_test_env`
  and `teardown_test_env`, then prints a ✓ or ✗.
- Assertions (`assert_eq`, `assert_contains`, `assert_failure`, ...) record
  failures into the current test but do **not** abort. Multiple assertions
  per test are fine.
- Hooks run synchronously in tests because `setup_test_env` sets
  `BRAINTRUST_SYNC_QUEUE=true`. Span queue tests opt out of this when needed.

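The conventions above can be sketched as a small runner. This is a hedged illustration of the record-but-do-not-abort behavior, not the contents of `helpers/assert.sh`: the variables `TEST_FAILURES` and `FAILED_TESTS` are hypothetical, and `setup_test_env` / `teardown_test_env` are reduced to placeholders.

```bash
#!/bin/bash
# Hypothetical sketch of a non-aborting test runner; helpers/assert.sh may differ.

TEST_FAILURES=()   # assumed: failure messages recorded for the current test
FAILED_TESTS=0     # assumed: number of tests with at least one failure

setup_test_env()    { :; }   # placeholder for the real environment setup
teardown_test_env() { :; }   # placeholder for the real cleanup

# Print a section header; it has no effect on test results.
describe() { printf '\n%s\n' "$1"; }

# Record a failure instead of exiting, so later assertions still run.
assert_eq() {
  local actual="$1" expected="$2"
  if [ "$actual" != "$expected" ]; then
    TEST_FAILURES+=("expected '$expected', got '$actual'")
  fi
}

# Run one test function between setup and teardown, then report the outcome.
it() {
  local name="$1" fn="$2" msg
  TEST_FAILURES=()
  setup_test_env
  "$fn"
  teardown_test_env
  if [ "${#TEST_FAILURES[@]}" -eq 0 ]; then
    printf '  ✓ %s\n' "$name"
  else
    printf '  ✗ %s\n' "$name"
    for msg in "${TEST_FAILURES[@]}"; do printf '      %s\n' "$msg"; done
    FAILED_TESTS=$((FAILED_TESTS + 1))
  fi
}
```

With this shape, a test body that makes two failing assertions produces a single ✗ line followed by both failure messages, which is why several assertions per test are cheap.
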
### Capturing a real session as a test fixture

The hooks support recording every invocation to disk when the env var
`BRAINTRUST_RECORD_DIR` is set. The recorded data can then be replayed
in a test.

#### 1. Prepare a fixture directory

```sh
plugins/trace-claude-code/test/record_session.sh my-fixture
```

This prints a `BRAINTRUST_RECORD_DIR` value pointing at
`test/fixtures/sessions/my-fixture/`.

#### 2. Run Claude Code with recording on

```sh
export BRAINTRUST_RECORD_DIR=/abs/path/to/test/fixtures/sessions/my-fixture
claude
# ... use Claude Code normally ...
```

While `BRAINTRUST_RECORD_DIR` is set:

- Every hook invocation appends one NDJSON record to
  `events.ndjson` containing `{ts, hook, payload}`.
- The `stop_hook` also copies the referenced transcript file into
  `transcripts/<session_id>.jsonl`.

You do not need to modify hook scripts or set anything else; the recorder
runs inside the existing hooks.

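As a rough sketch of what the recording step amounts to, the function below appends one `{ts, hook, payload}` record per call and bundles the transcript for `stop_hook`. The function name `record_event`, its argument order, and the timestamp format are assumptions for illustration; the real hooks may implement recording differently.

```bash
#!/bin/bash
# Hypothetical sketch of the recording step; the real hooks may differ.

# record_event HOOK_NAME PAYLOAD_JSON [TRANSCRIPT_PATH SESSION_ID]
record_event() {
  local hook="$1" payload="$2" transcript="${3:-}" session_id="${4:-}"
  # Recording is opt-in: do nothing unless BRAINTRUST_RECORD_DIR is set.
  [ -n "${BRAINTRUST_RECORD_DIR:-}" ] || return 0
  mkdir -p "$BRAINTRUST_RECORD_DIR/transcripts"

  # NDJSON needs one record per line. Valid JSON never has raw newlines
  # inside strings, so stripping newlines only removes formatting.
  payload="$(printf '%s' "$payload" | tr -d '\n')"
  printf '{"ts":"%s","hook":"%s","payload":%s}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$hook" "$payload" \
    >> "$BRAINTRUST_RECORD_DIR/events.ndjson"

  # Bundle the transcript so the fixture is self-contained.
  if [ "$hook" = "stop_hook" ] && [ -f "$transcript" ]; then
    cp "$transcript" "$BRAINTRUST_RECORD_DIR/transcripts/$session_id.jsonl"
  fi
}
```

Copying the transcript at record time matters because the original file lives outside the repository and may not exist on the machine that later replays the fixture.
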
#### 3. Inspect the fixture

```sh
plugins/trace-claude-code/test/record_session.sh --describe my-fixture
```

Output:

```
Fixture: .../test/fixtures/sessions/my-fixture
Events: 14
Hook counts:
  post_tool_use: 8
  session_end: 1
  session_start: 1
  stop_hook: 3
  user_prompt_submit: 1
Transcripts: 1
```

#### 4. Replay it in a test

```bash
t_replay_my_fixture() {
  stub_response_for "*/v1/project_logs/*/insert" 200 '{"row_ids":["row_1"]}'

  local n
  n=$(replay_session "$SCRIPT_DIR/fixtures/sessions/my-fixture")
  assert_success "$?"
  assert_eq "$n" "14"

  # Now assert on the span tree the hooks produced
  assert_eq "$(span_count_by_type tool)" "8"
  assert_eq "$(span_count_by_type llm)" "3"
}

it "my real-world fixture produces the expected spans" t_replay_my_fixture
```

The replayer:

- Reads `events.ndjson` line by line in order.
- For `stop_hook` events, rewrites `payload.transcript_path` to point at
  the bundled transcript so the replayed hook can read it.
- Invokes the matching hook script via `run_hook` with the recorded
  payload.

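The three steps above could look roughly like the loop below. It is a sketch under stated assumptions, not the code in `helpers/replay.sh`: it requires `jq`, it assumes a hook named `X` maps to the script `X.sh`, it assumes the payload carries a `session_id` field, and `run_hook` is replaced by a stand-in that only logs to the hypothetical `REPLAY_LOG` file.

```bash
#!/bin/bash
# Hypothetical sketch of the replay loop; helpers/replay.sh may differ. Requires jq.

# Stand-in for the harness helper so the sketch runs on its own.
# It logs the script name and the transcript path the hook would see.
run_hook() {
  printf '%s %s\n' "$1" "$(jq -r '.transcript_path // "-"' <<< "$2")" \
    >> "${REPLAY_LOG:-/dev/null}"
}

# replay_session FIXTURE_DIR: replay every event in order, print the event count.
replay_session() {
  local dir="$1" line hook payload session_id count=0
  [ -f "$dir/events.ndjson" ] || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue   # skip blank lines
    hook="$(jq -r '.hook' <<< "$line")"
    payload="$(jq -c '.payload' <<< "$line")"
    if [ "$hook" = "stop_hook" ]; then
      # Point the payload at the transcript bundled with the fixture.
      session_id="$(jq -r '.session_id' <<< "$payload")"
      payload="$(jq -c --arg p "$dir/transcripts/$session_id.jsonl" \
        '.transcript_path = $p' <<< "$payload")"
    fi
    run_hook "$hook.sh" "$payload"
    count=$((count + 1))
  done < "$dir/events.ndjson"
  printf '%s\n' "$count"
}
```

Printing the count on stdout matches how the test above reads `n=$(replay_session ...)` and compares it with the number of recorded events.
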
#### When to use replay vs. synthetic fixtures

- **Synthetic fixtures** (`fixture_session_start`, etc.): fast to write,
  test specific scenarios in isolation, no real Claude needed.
- **Replayed fixtures**: high-fidelity regression tests of real-world
  interactions. Use them when you want to lock in behavior on a specific
  pattern of hooks you saw in the wild (e.g. a session with parallel
  tool calls, or a long multi-turn conversation).

### Span-tree queries

The captured HTTP requests are parsed to extract the inserted spans. Available helpers:

| Function | Returns |
| --- | --- |
| `all_spans` | JSON array of every span sent to any `/insert` endpoint |
| `span_count` | total number of spans |
| `span_count_by_type "tool"` | count of spans with `span_attributes.type == "tool"` |
| `spans_named "^Turn "` | array of spans whose name matches the regex |
| `span_by_name "^Turn 1$"` | first matching span (or `null`) |
| `span_by_type "llm"` | first span of that type |
| `span_by_id "..."` | span with the given `span_id` |
| `children_of "<span_id>"` | array of spans whose first parent is the given id |
| `is_child_of "<child_id>" "<parent_id>"` | exit 0 if true |

Except for `is_child_of`, which reports its result through the exit status,
all helpers print JSON on stdout; pipe the output through `jq` to drill further.

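To illustrate how such queries can be expressed, here is a hedged sketch of three of the helpers written as `jq` filters over a canned span array. The sample spans, the `span_parents` field used for the "first parent" lookup, and the `span_attributes.name` field are assumptions for illustration; `helpers/span_tree.sh` may be implemented differently.

```bash
#!/bin/bash
# Hypothetical sketch; helpers/span_tree.sh may differ. Requires jq.

# Stand-in for the real helper: a canned span array instead of parsed HTTP captures.
all_spans() {
  cat <<'JSON'
[
  {"span_id": "root", "span_parents": [], "span_attributes": {"name": "Session", "type": "task"}},
  {"span_id": "t1", "span_parents": ["root"], "span_attributes": {"name": "Turn 1", "type": "llm"}},
  {"span_id": "r1", "span_parents": ["t1"], "span_attributes": {"name": "Read", "type": "tool"}}
]
JSON
}

# Count spans whose span_attributes.type equals the argument.
span_count_by_type() {
  all_spans | jq --arg t "$1" '[.[] | select(.span_attributes.type == $t)] | length'
}

# First span whose name matches the regex, or null when nothing matches.
span_by_name() {
  all_spans | jq --arg re "$1" 'map(select(.span_attributes.name | test($re))) | first'
}

# Spans whose first parent is the given span id.
children_of() {
  all_spans | jq --arg id "$1" 'map(select(.span_parents[0] == $id))'
}
```

Helpers compose through command substitution. For instance, `children_of "$(span_by_name '^Turn 1$' | jq -r '.span_id')" | jq -r '.[].span_attributes.name'` would list the names of the spans directly under the first turn.
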
# Updating the plugin

After making changes: