Commit 6d74274

fix: default EvalHub adapter to stream=false for reliable tool scoring
Non-streaming responses include tool_invocations/tool_calls in the JSON body. Streaming relies on delta.tool_calls, which not all agents emit (e.g. AutoGen uses a custom mcp.tool_usage SSE event). Defaulting stream to false ensures tool scorers work for all agents out of the box; jobs can still opt in to streaming via job parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
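A minimal sketch of why the two paths differ, assuming an OpenAI-style chat completion shape for both the JSON body and the streamed chunks. The function names and the `tool_invocations` entry shape (`{"name": ...}`) are hypothetical and not taken from the EvalHub adapter. A parser that reads only `delta.tool_calls` silently drops a named SSE event such as `mcp.tool_usage`, while the non-streaming body exposes the calls directly.

```python
import json
from typing import Any

# Hypothetical helpers illustrating the two extraction paths. Response shapes
# are assumptions: an OpenAI-style chat completion body, plus an optional
# top-level "tool_invocations" list as mentioned in the commit message.


def tool_names_from_body(body: dict[str, Any]) -> list[str]:
    """Collect tool names from a non-streaming JSON response body."""
    names: list[str] = []
    # Assumed shape: {"tool_invocations": [{"name": ...}, ...]}
    for inv in body.get("tool_invocations", []):
        names.append(inv["name"])
    # OpenAI-style shape: choices[].message.tool_calls[].function.name
    for choice in body.get("choices", []):
        for call in choice.get("message", {}).get("tool_calls") or []:
            names.append(call["function"]["name"])
    return names


def tool_names_from_sse(raw: str) -> list[str]:
    """Collect tool names from an SSE stream, reading only delta.tool_calls.

    Events with a custom name (e.g. "event: mcp.tool_usage") are skipped,
    which is the failure mode described in the commit message.
    """
    names: list[str] = []
    for block in raw.strip().split("\n\n"):
        event, data = "message", []  # SSE default event type is "message"
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        payload = "\n".join(data)
        if event != "message" or not payload or payload == "[DONE]":
            continue
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            for call in choice.get("delta", {}).get("tool_calls") or []:
                name = call.get("function", {}).get("name")
                if name:
                    names.append(name)
    return names


body = {"choices": [{"message": {"tool_calls": [{"function": {"name": "search"}}]}}]}
stream = (
    'event: mcp.tool_usage\ndata: {"tool": "search"}\n\n'
    'data: {"choices": [{"delta": {"content": "done"}}]}\n\n'
    "data: [DONE]\n\n"
)
print(tool_names_from_body(body))  # ['search']
print(tool_names_from_sse(stream))  # [] (the custom event is ignored)
```

The same tool call is visible in the JSON body but invisible to a delta-only stream parser, which is the gap the new `stream=False` default sidesteps.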
1 parent 71169b5 commit 6d74274

1 file changed

Lines changed: 1 addition & 1 deletion

evals/evalhub_adapter/config.py

```diff
@@ -96,7 +96,7 @@ class AgenticEvalParams:
     timeout_seconds: float = 30.0
     verify_ssl: bool = True
     fixtures_path: str = "fixtures"
-    stream: bool = True
+    stream: bool = False
 
     # MLflow trace enrichment (reads tool calls from agent-side traces)
     mlflow_tracking_uri: str | None = None
```