feat(test_case): make trace_dict public for post-hoc agentic evaluation by tiffanychum · Pull Request #2600 · confident-ai/deepeval

tiffanychum · 2026-04-04T18:02:44Z

Summary

LLMTestCase._trace_dict → public trace_dict field — the private attribute made it impossible to pass a pre-recorded trace at construction time, so the four agentic trace metrics (TaskCompletionMetric, StepEfficiencyMetric, PlanQualityMetric, PlanAdherenceMetric) only worked with @observe at runtime.
Alias support — serialization_alias="traceDict" + validation_alias=AliasChoices("traceDict", "trace_dict") so both snake_case and camelCase work in model_validate and JSON round-trips (consistent with all other fields on LLMTestCase).
All internal usages updated — 6 assignment sites in evaluate/execute.py and all 4 metric files updated from ._trace_dict to .trace_dict; the runtime @observe path is unchanged.
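
The alias behaviour described above can be illustrated with a minimal sketch on a stand-in Pydantic v2 model. `SketchTestCase`, its reduced field set, and the placeholder trace shape are assumptions for illustration; this is not the actual `LLMTestCase` definition, only the alias pattern the summary names.

```python
from typing import Any, Dict, Optional

from pydantic import AliasChoices, BaseModel, Field


class SketchTestCase(BaseModel):
    """Hypothetical stand-in for LLMTestCase with only the fields needed here."""

    input: str
    actual_output: Optional[str] = None
    # A public field can be supplied at construction time, unlike a PrivateAttr.
    trace_dict: Optional[Dict[str, Any]] = Field(
        default=None,
        serialization_alias="traceDict",
        validation_alias=AliasChoices("traceDict", "trace_dict"),
    )


# Placeholder trace; the real trace schema is not specified in this PR text.
saved_trace = {"name": "agent", "children": []}

# Both spellings are accepted during validation.
snake = SketchTestCase.model_validate({"input": "hi", "trace_dict": saved_trace})
camel = SketchTestCase.model_validate({"input": "hi", "traceDict": saved_trace})

# Direct construction works because "trace_dict" is one of the alias choices.
direct = SketchTestCase(input="hi", trace_dict=saved_trace)

# Serialising by alias emits camelCase, which validates again on the way back in.
dumped = camel.model_dump(by_alias=True)
restored = SketchTestCase.model_validate(dumped)
```

Listing both spellings in `AliasChoices` is what keeps keyword construction with `trace_dict=` working; a single camelCase validation alias on its own would stop Pydantic from matching the snake_case name unless the model were configured to populate by field name.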

Motivation

The non-trace evaluation path in task_completion.py is already marked:

# TODO: Deprecate this soon

But until now there was no way to reach the trace path without @observe at runtime. This PR closes that gap, enabling:

Offline / batch evaluation from saved logs or databases
CI trace replay without re-running the agent
Third-party pipelines where you can't decorate the agent code
Post-mortem analysis of production traces
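
As a rough sketch of the offline case, saved records can be turned into constructor keyword arguments without touching the agent code. The record layout (`input`, `actual_output`, `trace` keys, one JSON object per line) and the helper name `iter_saved_cases` are assumptions made for illustration; the trace itself is passed through as an opaque dict.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_saved_cases(lines: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield constructor kwargs for each saved record (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        yield {
            "input": record["input"],
            "actual_output": record["actual_output"],
            # Passed through untouched; the schema is whatever was recorded.
            "trace_dict": record["trace"],
        }


# Hypothetical log lines standing in for a saved file or a database export.
log_lines = [
    json.dumps({"input": "book a flight", "actual_output": "booked", "trace": {"name": "agent"}}),
    "",
    json.dumps({"input": "cancel it", "actual_output": "cancelled", "trace": {"name": "agent"}}),
]

cases = list(iter_saved_cases(log_lines))
```

With the public field from this PR, each yielded dict is intended to be usable as `LLMTestCase(**kwargs)` and then handed to one of the trace metrics; that step is left out here because it needs deepeval and a configured evaluation model.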

Changes

| File | What changed |
| --- | --- |
| `deepeval/test_case/llm_test_case.py` | `_trace_dict: PrivateAttr` → `trace_dict: Field(...)` with aliases |
| `deepeval/evaluate/execute.py` | 6 internal assignments updated to public field |
| `deepeval/metrics/task_completion/task_completion.py` | `_trace_dict` → `trace_dict` (4 references) |
| `deepeval/metrics/step_efficiency/step_efficiency.py` | `_trace_dict` → `trace_dict` (6 references) |
| `deepeval/metrics/plan_quality/plan_quality.py` | `_trace_dict` → `trace_dict` (4 references) |
| `deepeval/metrics/plan_adherence/plan_adherence.py` | `_trace_dict` → `trace_dict` (8 references) |
| `tests/test_core/test_test_case/test_single_turn.py` | Updated assertions + new `test_trace_dict_constructor_and_alias` test |
| `examples/tracing/test_posthoc_evaluation.py` | New example showing post-hoc evaluation from a saved trace |

Test plan

python -m pytest tests/test_core/test_test_case/test_single_turn.py -v — all existing tests pass, new alias test passes
python examples/tracing/test_posthoc_evaluation.py — post-hoc evaluation runs end-to-end with a pre-recorded trace dict
Existing @observe runtime path: no behaviour change (assignments in execute.py write to the same field)

…evaluation

`LLMTestCase._trace_dict` was a `PrivateAttr`, making it impossible to pass a pre-recorded trace at construction time. This meant the four agentic trace metrics (TaskCompletion, StepEfficiency, PlanQuality, PlanAdherence) could only be used with `@observe` at runtime, not from saved logs, CI replay, or third-party pipelines.

Changes:
- `LLMTestCase._trace_dict` → public `trace_dict` field (`Field(...)`) with `serialization_alias="traceDict"` and `validation_alias=AliasChoices("traceDict", "trace_dict")` so both snake_case and camelCase work in `model_validate` / JSON round-trips.
- All internal assignments in `evaluate/execute.py` (6 sites) updated from `._trace_dict =` to `.trace_dict =`; the runtime `@observe` path is unchanged.
- All four agentic metrics updated from `test_case._trace_dict` to `test_case.trace_dict`.
- Unit tests in `test_single_turn.py` updated and a new `test_trace_dict_constructor_and_alias` test added.
- New example added at `examples/tracing/test_posthoc_evaluation.py` showing post-hoc evaluation from a pre-recorded trace dict.

Fixes the post-hoc evaluation gap noted in the TODO comments in task_completion.py: the non-trace path is already marked `# TODO: Deprecate this soon`; this PR makes the trace path accessible without runtime instrumentation.

vercel · 2026-04-04T18:02:51Z

@tiffanychum is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

tiffanychum wants to merge 1 commit into confident-ai:main from tiffanychum:feat/public-trace-dict-for-posthoc-eval
