
Commit 37bba72

test: add 29 E2E trace tests for issues #571-575 (#593)
* test: add 29 E2E trace tests for worker, threading, tools, workspace, and routines (#571-575)

Add comprehensive E2E test coverage across five test files:
- e2e_worker_coverage (7 tests): parallel tool calls, error feedback, unknown tools, invalid params, rate limiting, iteration limits, planning mode
- e2e_thread_scheduling (3 tests + 2 deferred): multi-turn state, undo/redo, concurrent dispatch
- e2e_builtin_tool_coverage (8 tests): time parse/diff/invalid, routine CRUD/history, job create/status/list/cancel, HTTP replay
- e2e_workspace_coverage (6 tests): chunked search, multi-doc search, hybrid search, directory tree, document lifecycle, identity in system prompt
- e2e_routine_heartbeat (5 tests): cron triggers, event matching, cooldown enforcement, heartbeat findings, empty checklist skip

Infrastructure: extend TestRig with database/workspace/trace_llm accessors, register job and routine tools by default, add with_extra_tools() for custom stub tools.

Includes 24 JSON trace fixtures across worker/, threading/, tools/, and workspace/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use 6-field cron format in routine_create_list fixture

The cron 0.13 crate accepts both 6 and 7 fields, but the routine_create tool documents 6-field format. Align the fixture to match.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: eliminate vacuous passes and silently-skipped assertions in E2E tests

- job_create_status: replace job_status (needs dynamic UUID) with list_jobs, assert both succeed via completed() not just started()
- job_list_cancel: keep cancel_job but explicitly assert it fails with invalid canned job_id "latest", verify create_job + list_jobs succeed
- unknown_tool_name: add !is_empty() guard before .all() to prevent vacuous pass on empty iterator
- workspace tests: change `if let Some(ws)` to `.expect()` so assertions are never silently skipped when workspace/trace_llm is available

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add template substitution to TraceLlm for dynamic tool result forwarding

Add {{call_id.json_path}} template syntax to trace fixtures, enabling tool results from one step to flow into subsequent steps' arguments. TraceLlm extracts variables from Role::Tool messages (stripping the safety layer's <tool_output> XML wrapper and unescaping entities) and substitutes them in canned tool_call arguments before returning.

This fixes job_create_status and job_list_cancel tests to properly test job_status and cancel_job with real dynamic UUIDs from create_job, instead of using invalid canned IDs that silently failed.

Also adds tool result content assertions to job_create_status to verify the actual tool output contains expected data (job_id, title).

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback on E2E tests

- undo_redo_cycle: assert exactly 3 turns instead of >= 2
- tool_error_feedback: use tempfile::tempdir() instead of hardcoded /tmp path, patch fixture path at runtime for CI portability
- worker_timeout → iteration_limit: rename to accurately describe what's tested
- post_plan_work_remaining → simple_echo_flow: rename, test doesn't exercise planning
- identity_in_system_prompt: seed IDENTITY.md before test, assert system prompt contains the seeded content instead of just checking Role::System exists

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: strengthen workspace E2E test assertions per PR review

- write_chunk_search: assert memory_search was called and returned payment/architecture-related results
- multi_document_search: assert memory_search was called for cross-document search
- hybrid_search_with_embeddings: assert both memory_write and memory_search were called to confirm write-then-search pipeline
- directory_tree: assert tree output contains expected alpha/beta project paths

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
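The template-forwarding mechanism described in the `feat:` entry can be sketched in a few lines of std-only Rust. This is not the TraceLlm code from the commit: the function names (`unwrap_tool_output`, `json_field`, `resolve`, `substitute`) are hypothetical, the exact `<tool_output ...>` wrapper and its entity escaping are assumed, and `json_field` looks up only a single top-level key with a naive string scan, where a real implementation would parse the JSON and walk a dotted path.

```rust
use std::collections::HashMap;

/// Strip an assumed `<tool_output ...>...</tool_output>` wrapper (any
/// attributes on the opening tag are skipped) and decode the five
/// predefined XML entities.
fn unwrap_tool_output(raw: &str) -> String {
    const CLOSE: &str = "</tool_output>";
    let s = raw.trim();
    let inner = if s.starts_with("<tool_output") && s.ends_with(CLOSE) {
        // Content starts after the first '>' (end of the opening tag).
        let start = s.find('>').map_or(0, |i| i + 1);
        s.get(start..s.len() - CLOSE.len()).unwrap_or("")
    } else {
        s
    };
    // Decode `&amp;` last so "&amp;lt;" yields "&lt;" rather than "<".
    inner
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        .replace("&amp;", "&")
}

/// Naive lookup of one top-level key in a flat JSON object. Handles string
/// values without escaped quotes and bare scalars (numbers, bools) only.
fn json_field(json: &str, key: &str) -> Option<String> {
    let needle = format!("\"{key}\"");
    let after_key = &json[json.find(needle.as_str())? + needle.len()..];
    let value = after_key.trim_start().strip_prefix(':')?.trim_start();
    if let Some(s) = value.strip_prefix('"') {
        Some(s[..s.find('"')?].to_string())
    } else {
        let end = value.find(|c: char| c == ',' || c == '}').unwrap_or(value.len());
        Some(value[..end].trim().to_string())
    }
}

/// Resolve `call_id.key` against unwrapped tool results stored by call id.
fn resolve(name: &str, results: &HashMap<String, String>) -> Option<String> {
    let (call_id, key) = name.split_once('.')?;
    json_field(results.get(call_id)?, key)
}

/// Replace every `{{call_id.key}}` placeholder in `template`; placeholders
/// that cannot be resolved are copied through unchanged.
fn substitute(template: &str, results: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(open) = rest.find("{{") {
        out.push_str(&rest[..open]);
        let after = &rest[open + 2..];
        let Some(close) = after.find("}}") else {
            // Unterminated placeholder: keep the remainder verbatim.
            out.push_str(&rest[open..]);
            return out;
        };
        match resolve(after[..close].trim(), results) {
            Some(value) => out.push_str(&value),
            None => out.push_str(&rest[open..open + 2 + close + 2]),
        }
        rest = &after[close + 2..];
    }
    out.push_str(rest);
    out
}

fn main() {
    // Hypothetical create_job result as it might appear in a Role::Tool
    // message (wrapper attributes and escaping are assumptions).
    let raw = "<tool_output name=\"create_job\">{&quot;job_id&quot;: &quot;1234-abcd&quot;}</tool_output>";
    let mut results = HashMap::new();
    results.insert("call_cj_1".to_string(), unwrap_tool_output(raw));

    // Canned arguments from a fixture, before forwarding.
    let args = r#"{"job_id": "{{call_cj_1.job_id}}"}"#;
    println!("{}", substitute(args, &results)); // {"job_id": "1234-abcd"}
}
```

In this sketch placeholders are resolved lazily against stored results, so a fixture can reference any key of an earlier call's output without declaring it up front, and an unresolvable placeholder stays visible in the arguments instead of silently becoming an empty string.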
1 parent 2df9602 commit 37bba72

31 files changed

Lines changed: 3073 additions & 4 deletions

tests/e2e_builtin_tool_coverage.rs

Lines changed: 332 additions & 0 deletions
@@ -0,0 +1,332 @@
//! E2E trace tests: builtin tool coverage (#573).
//!
//! Covers time (parse, diff, invalid), routine (create, list, update, delete,
//! history), job (create, status, list, cancel), and HTTP replay.

#[cfg(feature = "libsql")]
mod support;

#[cfg(feature = "libsql")]
mod tests {
    use std::time::Duration;

    use crate::support::test_rig::TestRigBuilder;
    use crate::support::trace_llm::LlmTrace;

    // -----------------------------------------------------------------------
    // Test 1: time_parse_and_diff
    // -----------------------------------------------------------------------

    #[tokio::test]
    async fn time_parse_and_diff() {
        let trace = LlmTrace::from_file(concat!(
            env!("CARGO_MANIFEST_DIR"),
            "/tests/fixtures/llm_traces/tools/time_parse_diff.json"
        ))
        .expect("failed to load time_parse_diff.json");

        let rig = TestRigBuilder::new()
            .with_trace(trace.clone())
            .build()
            .await;

        rig.send_message("Parse a time and compute a diff").await;
        let responses = rig.wait_for_responses(1, Duration::from_secs(15)).await;

        rig.verify_trace_expects(&trace, &responses);

        // Time tool should have been called twice (parse + diff).
        let started = rig.tool_calls_started();
        let time_count = started.iter().filter(|n| n.as_str() == "time").count();
        assert!(
            time_count >= 2,
            "Expected >= 2 time tool calls, got {time_count}"
        );

        rig.shutdown();
    }

    // -----------------------------------------------------------------------
    // Test 2: time_parse_invalid
    // -----------------------------------------------------------------------

    #[tokio::test]
    async fn time_parse_invalid() {
        let trace = LlmTrace::from_file(concat!(
            env!("CARGO_MANIFEST_DIR"),
            "/tests/fixtures/llm_traces/tools/time_parse_invalid.json"
        ))
        .expect("failed to load time_parse_invalid.json");

        let rig = TestRigBuilder::new()
            .with_trace(trace.clone())
            .build()
            .await;

        rig.send_message("Parse an invalid timestamp").await;
        let responses = rig.wait_for_responses(1, Duration::from_secs(15)).await;

        rig.verify_trace_expects(&trace, &responses);

        // The time tool call should have failed (invalid timestamp).
        let completed = rig.tool_calls_completed();
        let time_results: Vec<_> = completed
            .iter()
            .filter(|(name, _)| name == "time")
            .collect();
        assert!(!time_results.is_empty(), "Expected time tool to be called");
        assert!(
            time_results.iter().any(|(_, ok)| !ok),
            "Expected at least one failed time call: {time_results:?}"
        );

        rig.shutdown();
    }

    // -----------------------------------------------------------------------
    // Test 3: routine_create_list
    // -----------------------------------------------------------------------

    #[tokio::test]
    async fn routine_create_list() {
        let trace = LlmTrace::from_file(concat!(
            env!("CARGO_MANIFEST_DIR"),
            "/tests/fixtures/llm_traces/tools/routine_create_list.json"
        ))
        .expect("failed to load routine_create_list.json");

        let rig = TestRigBuilder::new()
            .with_trace(trace.clone())
            .build()
            .await;

        rig.send_message("Create a daily routine and list all routines")
            .await;
        let responses = rig.wait_for_responses(1, Duration::from_secs(15)).await;

        rig.verify_trace_expects(&trace, &responses);

        // Both routine_create and routine_list should have succeeded.
        let completed = rig.tool_calls_completed();
        assert!(
            completed.iter().any(|(n, ok)| n == "routine_create" && *ok),
            "routine_create should succeed: {completed:?}"
        );
        assert!(
            completed.iter().any(|(n, ok)| n == "routine_list" && *ok),
            "routine_list should succeed: {completed:?}"
        );

        rig.shutdown();
    }

    // -----------------------------------------------------------------------
    // Test 4: routine_update_delete
    // -----------------------------------------------------------------------

    #[tokio::test]
    async fn routine_update_delete() {
        let trace = LlmTrace::from_file(concat!(
            env!("CARGO_MANIFEST_DIR"),
            "/tests/fixtures/llm_traces/tools/routine_update_delete.json"
        ))
        .expect("failed to load routine_update_delete.json");

        let rig = TestRigBuilder::new()
            .with_trace(trace.clone())
            .build()
            .await;

        rig.send_message("Create, update, and delete a routine")
            .await;
        let responses = rig.wait_for_responses(1, Duration::from_secs(15)).await;

        rig.verify_trace_expects(&trace, &responses);

        let started = rig.tool_calls_started();
        assert!(
            started.contains(&"routine_create".to_string()),
            "routine_create not started"
        );
        assert!(
            started.contains(&"routine_update".to_string()),
            "routine_update not started"
        );
        assert!(
            started.contains(&"routine_delete".to_string()),
            "routine_delete not started"
        );

        rig.shutdown();
    }

    // -----------------------------------------------------------------------
    // Test 5: routine_history
    // -----------------------------------------------------------------------

    #[tokio::test]
    async fn routine_history() {
        let trace = LlmTrace::from_file(concat!(
            env!("CARGO_MANIFEST_DIR"),
            "/tests/fixtures/llm_traces/tools/routine_history.json"
        ))
        .expect("failed to load routine_history.json");

        let rig = TestRigBuilder::new()
            .with_trace(trace.clone())
            .build()
            .await;

        rig.send_message("Create a routine and check its history")
            .await;
        let responses = rig.wait_for_responses(1, Duration::from_secs(15)).await;

        rig.verify_trace_expects(&trace, &responses);

        let started = rig.tool_calls_started();
        assert!(
            started.contains(&"routine_create".to_string()),
            "routine_create missing"
        );
        assert!(
            started.contains(&"routine_history".to_string()),
            "routine_history missing"
        );

        rig.shutdown();
    }

    // -----------------------------------------------------------------------
    // Test 6: job_create_status
    // -----------------------------------------------------------------------
    // Uses {{call_cj_1.job_id}} template to forward the dynamic UUID from
    // create_job's result into job_status's arguments.

    #[tokio::test]
    async fn job_create_status() {
        let trace = LlmTrace::from_file(concat!(
            env!("CARGO_MANIFEST_DIR"),
            "/tests/fixtures/llm_traces/tools/job_create_status.json"
        ))
        .expect("failed to load job_create_status.json");

        let rig = TestRigBuilder::new()
            .with_trace(trace.clone())
            .build()
            .await;

        rig.send_message("Create a job and check its status").await;
        let responses = rig.wait_for_responses(1, Duration::from_secs(15)).await;

        rig.verify_trace_expects(&trace, &responses);

        // Both tools should have succeeded.
        let completed = rig.tool_calls_completed();
        assert!(
            completed.iter().any(|(n, ok)| n == "create_job" && *ok),
            "create_job should succeed: {completed:?}"
        );
        assert!(
            completed.iter().any(|(n, ok)| n == "job_status" && *ok),
            "job_status should succeed: {completed:?}"
        );

        // Verify tool results contain expected content.
        let results = rig.tool_results();
        let create_result = results
            .iter()
            .find(|(n, _)| n == "create_job")
            .expect("create_job result missing");
        assert!(
            create_result.1.contains("job_id"),
            "create_job should return a job_id: {:?}",
            create_result.1
        );
        let status_result = results
            .iter()
            .find(|(n, _)| n == "job_status")
            .expect("job_status result missing");
        assert!(
            status_result.1.contains("Test analysis job"),
            "job_status should return the job title: {:?}",
            status_result.1
        );

        rig.shutdown();
    }

    // -----------------------------------------------------------------------
    // Test 7: job_list_cancel
    // -----------------------------------------------------------------------
    // Uses {{call_cj_lc.job_id}} template to forward the dynamic UUID from
    // create_job into cancel_job.

    #[tokio::test]
    async fn job_list_cancel() {
        let trace = LlmTrace::from_file(concat!(
            env!("CARGO_MANIFEST_DIR"),
            "/tests/fixtures/llm_traces/tools/job_list_cancel.json"
        ))
        .expect("failed to load job_list_cancel.json");

        let rig = TestRigBuilder::new()
            .with_trace(trace.clone())
            .build()
            .await;

        rig.send_message("Create a job, list jobs, then cancel it")
            .await;
        let responses = rig.wait_for_responses(1, Duration::from_secs(15)).await;

        rig.verify_trace_expects(&trace, &responses);

        // All three tools should have succeeded.
        let completed = rig.tool_calls_completed();
        assert!(
            completed.iter().any(|(n, ok)| n == "create_job" && *ok),
            "create_job should succeed: {completed:?}"
        );
        assert!(
            completed.iter().any(|(n, ok)| n == "list_jobs" && *ok),
            "list_jobs should succeed: {completed:?}"
        );
        assert!(
            completed.iter().any(|(n, ok)| n == "cancel_job" && *ok),
            "cancel_job should succeed: {completed:?}"
        );

        rig.shutdown();
    }

    // -----------------------------------------------------------------------
    // Test 8: http_get_with_replay
    // -----------------------------------------------------------------------

    #[tokio::test]
    async fn http_get_with_replay() {
        let trace = LlmTrace::from_file(concat!(
            env!("CARGO_MANIFEST_DIR"),
            "/tests/fixtures/llm_traces/tools/http_get_replay.json"
        ))
        .expect("failed to load http_get_replay.json");

        let rig = TestRigBuilder::new()
            .with_trace(trace.clone())
            .build()
            .await;

        rig.send_message("Make an http GET request").await;
        let responses = rig.wait_for_responses(1, Duration::from_secs(15)).await;

        rig.verify_trace_expects(&trace, &responses);

        // HTTP tool should have succeeded with the replayed exchange.
        let completed = rig.tool_calls_completed();
        assert!(
            completed.iter().any(|(n, ok)| n == "http" && *ok),
            "http tool should succeed: {completed:?}"
        );

        rig.shutdown();
    }
}
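The `!time_results.is_empty()` guard in `time_parse_invalid` and the `unknown_tool_name` fix in the commit message both rest on Rust's empty-iterator semantics: `Iterator::all` returns `true` for an empty iterator and `Iterator::any` returns `false`, so an `.all()` assertion over zero recorded tool calls passes vacuously. A minimal std-only sketch; `non_empty_all` is a hypothetical helper, and the `(String, bool)` pairs only assume the shape implied by the `|(n, ok)|` closures used with `tool_calls_completed()`.

```rust
/// Hypothetical helper: true only if `items` is non-empty AND every item
/// satisfies `pred`. `Iterator::all` alone is vacuously true on an empty
/// slice, so an assertion built on it passes even if nothing was recorded.
fn non_empty_all<T>(items: &[T], pred: impl Fn(&T) -> bool) -> bool {
    !items.is_empty() && items.iter().all(|x| pred(x))
}

fn main() {
    // Assumed shape: (tool name, succeeded) pairs.
    let none: Vec<(String, bool)> = Vec::new();

    // Vacuous truth: `all` over nothing is true, `any` over nothing is false.
    assert!(none.iter().all(|(_, ok)| !ok));
    assert!(!none.iter().any(|(_, ok)| !ok));

    // With the guard, an empty result list no longer counts as a pass.
    assert!(!non_empty_all(&none, |(_, ok)| !ok));

    let calls = vec![("nonexistent_tool".to_string(), false)];
    assert!(non_empty_all(&calls, |(_, ok)| !ok));

    println!("all checks passed");
}
```

By contrast, the `.any(...)` assertions used throughout this file already fail on an empty list, so their `is_empty()` guards mainly produce a clearer failure message rather than preventing a false pass.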
