Fix TypeError: unhashable type in ToolCorrectnessMetric for unhashable tool outputs (#2815) #2822

Open
HarperZ9 wants to merge 1 commit into confident-ai:main from HarperZ9:fix/toolcall-unhashable-2815

Conversation

@HarperZ9

Summary

Fixes #2815. ToolCorrectnessMetric raises TypeError: unhashable type during component-level evaluation (evals_iterator()) when a tool call's output, or a value nested in its input_parameters, is an arbitrary unhashable object, e.g. a LangChain ToolMessage or a pydantic model that defines __eq__ without __hash__.

Root cause

ToolCorrectnessMetric._calculate_non_exact_match_score puts ToolCall instances into a set, which calls ToolCall.__hash__. That routes input_parameters and output through _make_hashable (deepeval/test_case/llm_test_case.py). Its final branch returned any non-collection object unchanged, assuming it was a primitive hashable type:

else:
    # For primitive hashable types (str, int, float, bool, etc.)
    return obj

When obj is unhashable, hash((self.name, input_params_hashable, output_hashable)) then raises TypeError.
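The failure mode can be reproduced without deepeval at all, since a tuple is hashable only when every element is. The following is a minimal sketch; `Unhashable` and `try_hash` are hypothetical names used for illustration, not part of the library.

```python
class Unhashable:
    """Stand-in for an object that defines __eq__ without __hash__."""

    def __eq__(self, other):
        # Defining __eq__ alone makes Python set __hash__ to None.
        return isinstance(other, Unhashable)


def try_hash(value):
    """Return hash(value), or the TypeError message if hashing fails."""
    try:
        return hash(value)
    except TypeError as exc:
        return str(exc)


# Same shape as the (name, input_params, output) tuple described above.
ok = try_hash(("tool", (), "plain string output"))
failed = try_hash(("tool", (), Unhashable()))

print(type(ok).__name__)  # an int hash for the all-hashable tuple
print(failed)             # the TypeError message for the unhashable element
```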

Fix

Fall back to a stable repr() for any value that is not hashable. Primitives and already-hashable objects are returned unchanged, so existing behavior is preserved; only the previously-crashing path changes.

else:
    try:
        hash(obj)
    except TypeError:
        return repr(obj)
    return obj
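To show where this branch sits, here is a simplified, self-contained sketch of a recursive helper with the same fallback. `make_hashable` and the `ToolMessage` dataclass below are hypothetical stand-ins; the real `_make_hashable` in deepeval may handle collections differently.

```python
from dataclasses import dataclass


def make_hashable(obj):
    """Hypothetical, simplified stand-in for deepeval's _make_hashable."""
    if isinstance(obj, dict):
        # Sort by repr so that key order does not affect the result.
        pairs = ((k, make_hashable(v)) for k, v in obj.items())
        return tuple(sorted(pairs, key=repr))
    if isinstance(obj, (list, tuple)):
        return tuple(make_hashable(item) for item in obj)
    if isinstance(obj, set):
        return frozenset(make_hashable(item) for item in obj)
    # Final branch, as in the fix: keep hashable values, repr() the rest.
    try:
        hash(obj)
    except TypeError:
        return repr(obj)
    return obj


@dataclass  # eq=True without frozen=True sets __hash__ to None
class ToolMessage:
    content: str


params = {"x": ToolMessage("hi"), "n": [1, 2]}
key = make_hashable(params)
same_key = make_hashable({"n": [1, 2], "x": ToolMessage("hi")})
```

In this sketch `key` and `same_key` compare equal because a dataclass has a value-based `__repr__`. The fallback is only as stable as the object's `__repr__`: a class that keeps the default `object.__repr__` embeds its memory address, so two distinct instances would map to different strings.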

Reproduction (before the fix)

from deepeval.test_case import ToolCall

class ToolMessage:          # any object that is unhashable
    __hash__ = None

hash(ToolCall(name="t", output=ToolMessage()))           # TypeError: unhashable type
hash(ToolCall(name="t", input_parameters={"x": ToolMessage()}))  # TypeError

After the fix, both calls return an int, and the tool calls can be placed in a set.

Tests

Adds test_tool_call_hashing_with_unhashable_types in tests/test_core/test_test_case/test_single_turn.py, covering an unhashable object both as output and nested in input_parameters. No API key required.

$ pytest tests/test_core/test_test_case/test_single_turn.py -q
37 passed

black --check and the existing hashing tests are clean.


Disclosure: authored with AI assistance (Claude), reviewed and verified before submitting.

…nfident-ai#2815)

ToolCorrectnessMetric places ToolCall instances into a set during
component-level evaluation. ToolCall.__hash__ routes input_parameters and
output through _make_hashable, whose final branch returned arbitrary objects
unchanged, assuming they were primitive hashable types. When the output (or a
value nested in input_parameters) is an unhashable object, such as a LangChain
ToolMessage or a pydantic model that defines __eq__ without __hash__, hashing
raised `TypeError: unhashable type`, breaking evals_iterator() on agent traces.

Fall back to a stable repr() for any value that is not hashable. Primitives and
already-hashable objects are returned unchanged. Adds a regression test
covering an unhashable object both as the output and nested in input_parameters.

Closes confident-ai#2815

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 30, 2026

Someone is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

@HarperZ9

Author

A quick note on the red checks, so the CI status is not misleading:

  • test and all integration jobs (langchain, openai, crewai, llama-index, agentcore, strands, pydantic-ai, openai-agents) pass. The new regression test runs under test.
  • lint (black) fails on pre-existing drift in main, not on this PR. The job reports "47 files would be reformatted"; none of them are the two files this PR touches (deepeval/test_case/llm_test_case.py and tests/test_core/test_test_case/test_single_turn.py), both of which black leaves unchanged. The black.yml workflow has been failing on main since around June 10, independent of this change. I left the 47 unrelated files alone to keep this PR scoped to the fix.
  • Vercel is the fork-deploy auth check, not a code check.

Happy to rebase or adjust if you handle the black drift separately.


Development

Successfully merging this pull request may close these issues.

ToolCorrectnessMetric gives an error when used in Component level eval using evals_iterator()
