|`final_response`| string or null | The agent's final response |
|`intermediate_steps`| object | Steps between user input and final response |

The `intermediate_steps` object contains:

| Field | Type | Description |
|---|---|---|
|`tool_calls`| array | Tools the agent called |
|`tool_responses`| array | Responses the agent received from tools |
@@ -159,6 +170,24 @@ Each invocation contains:
|`per_invocation_scores`| no | Per-turn scores (same order as input invocations) |
|`details`| no | Arbitrary metadata for debugging |

### Protocol Versioning

The `protocol_version` field uses `"MAJOR.MINOR"` format (currently `"1.0"`). This allows the CLI and SDK to evolve independently while maintaining compatibility:
- **Additive only** -- new fields may be added to `EvalInput` or `EvalResult`; existing fields are never removed or renamed within the same major version.
- **Defaults required** -- every new field must have a default value. Older deserializers silently ignore unknown fields (Pydantic's default behavior), so an evaluator built against an older SDK will still work with a newer CLI.
- **MINOR bumps** -- additive changes (new optional fields). No action required by evaluator authors.
- **MAJOR bumps** -- breaking changes (removed fields, type changes). The SDK's `@evaluator` decorator will log a warning if it sees a major version it does not recognize.
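The first two rules can be illustrated with a minimal sketch. It uses standard-library dataclasses as a stand-in for the SDK's Pydantic models, so it is not the SDK's actual code. The class names, the `invocations` field name, and the `locale` field are assumptions made up for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class InputV10:
    """Hypothetical reader model built against protocol 1.0."""
    protocol_version: str = "1.0"
    invocations: list = field(default_factory=list)  # assumed field name


@dataclass
class InputV11(InputV10):
    """Hypothetical 1.1 model: one new field, added with a default."""
    locale: str = "en"  # invented field, for illustration only


def load_tolerant(cls, payload: dict[str, Any]):
    """Build `cls` from `payload`, dropping keys the class does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


# Older reader, newer payload: the unknown `locale` key is ignored.
old_reader = load_tolerant(
    InputV10, {"protocol_version": "1.1", "invocations": [], "locale": "de"}
)

# Newer reader, older payload: the missing `locale` falls back to its default.
new_reader = load_tolerant(InputV11, {"protocol_version": "1.0", "invocations": []})
```

Both directions succeed only because the change is additive and the new field has a default; removing or renaming a field would break one of the two loads.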

The CLI and SDK are **independent packages**. Install them at whatever versions you need:
```bash
pip install agentevals                # CLI -- may speak protocol 1.1
pip install agentevals-evaluator-sdk  # SDK -- may speak protocol 1.0
```
As long as the major version matches, they are compatible.
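The compatibility rule can be sketched as a small check. The helper names `parse_protocol_version` and `is_compatible` are hypothetical and not part of the CLI or SDK. The sketch assumes only the `"MAJOR.MINOR"` string format described above.

```python
def parse_protocol_version(version: str) -> tuple[int, int]:
    """Split a "MAJOR.MINOR" string into two integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(ours: str, theirs: str) -> bool:
    """Two protocol versions are compatible when their major parts match."""
    return parse_protocol_version(ours)[0] == parse_protocol_version(theirs)[0]
```

For example, `is_compatible("1.0", "1.1")` is true and `is_compatible("1.0", "2.0")` is false. Parsing each part as an integer, rather than treating the version as a float, keeps `"1.10"` distinct from `"1.1"`.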
## Writing Evaluators in Other Languages
You don't need the Python SDK. Any program that reads JSON from stdin and writes JSON to stdout works.
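As a rough sketch of that contract, the script below uses only the Python standard library and no SDK; the same shape carries over to any other language. The top-level `invocations` and `score` keys are assumptions, since the full schema is not shown in this section. `final_response`, `per_invocation_scores`, and `details` come from the tables above. The scoring rule is a placeholder.

```python
import json
import sys


def evaluate(eval_input: dict) -> dict:
    """Score each invocation 1.0 if it has a final response, else 0.0."""
    invocations = eval_input.get("invocations", [])  # assumed key name
    per_turn = [
        1.0 if inv.get("final_response") is not None else 0.0
        for inv in invocations
    ]
    overall = sum(per_turn) / len(per_turn) if per_turn else 0.0
    return {
        "score": overall,  # assumed key name
        "per_invocation_scores": per_turn,
        "details": {"turns": len(per_turn)},
    }


def run(stdin, stdout) -> None:
    """Read one JSON document from stdin and write one JSON result to stdout."""
    json.dump(evaluate(json.load(stdin)), stdout)
```

Ending the script with `run(sys.stdin, sys.stdout)` would turn it into a standalone evaluator process. Anything meant for humans should go to stderr so that stdout carries only the JSON result.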