
ArgumentCorrectnessMetric always fails to render: 'stringified_tools_called' is undefined #2817

Description

@safina57

Describe the bug
ArgumentCorrectnessMetric is unusable in 4.0.7: every call raises a template render error before any LLM request is made. The generate_verdicts template references {{ stringified_tools_called }}, but the metric only passes tools_called to _get_prompt, and nothing in the codebase builds stringified_tools_called. Since templates render with strict-undefined, the render fails for every input. There is also no evaluation_template parameter on the metric, so there is no supported user-side workaround.

To Reproduce
Steps to reproduce the behavior:

  1. pip install deepeval==4.0.7
  2. Run the following (it fails offline: the error is raised at template render, before the API call, so no valid API key is needed):
```python
import asyncio
from deepeval.metrics import ArgumentCorrectnessMetric
from deepeval.test_case import LLMTestCase, ToolCall

metric = ArgumentCorrectnessMetric(model="gpt-4.1")
test_case = LLMTestCase(
  input="When did Trump first raise tariffs?",
  actual_output="Trump first raised tariffs in 2018.",
  tools_called=[ToolCall(name="WebSearch", input_parameters={"q": "Trump tariffs year"})],
)
asyncio.run(metric.a_measure(test_case))
```
  3. See the error:
```
deepeval.templates.resolver.MetricTemplateInterpolationError:
Missing variable during template render: 'stringified_tools_called' is undefined
```

Expected behavior
The metric renders its generate_verdicts prompt and returns a score.

Root cause

  • deepeval/metrics/argument_correctness/argument_correctness.py: _a_generate_verdicts / _generate_verdicts call self._get_prompt("generate_verdicts", input=..., tools_called=tools_called, multimodal=...).
  • The generate_verdicts template (deepeval/metrics/argument_correctness/templates/generate_verdicts.txt and the compiled deepeval/templates/metrics/templates.json) references {{ stringified_tools_called }}.
  • stringified_tools_called is built nowhere (grep finds it only in the templates); resolve_template renders with strict-undefined, so the render raises.
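The failure mode can be reproduced without deepeval. The sketch below is a minimal stand-in that assumes "strict-undefined" means any `{{ name }}` placeholder without a matching keyword argument raises. `render_strict`, `TemplateInterpolationError` and the template text are hypothetical and are not deepeval's resolver, its exception class, or the real prompt. The `multimodal` argument is not modelled.

```python
import re


class TemplateInterpolationError(Exception):
    """Hypothetical stand-in for deepeval's MetricTemplateInterpolationError."""


# Matches {{ name }} placeholders, with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_strict(template: str, **variables) -> str:
    """Substitute {{ name }} placeholders and raise on any name that was not passed."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise TemplateInterpolationError(
                f"Missing variable during template render: '{name}' is undefined"
            )
        return str(variables[name])

    return _PLACEHOLDER.sub(substitute, template)


# Hypothetical, heavily simplified version of the generate_verdicts template.
TEMPLATE = "Input:\n{{ input }}\n\nTools called:\n{{ stringified_tools_called }}\n"

# Mirrors the metric's call: only `input` and `tools_called` are supplied.
try:
    render_strict(TEMPLATE, input="When did Trump first raise tariffs?", tools_called=["WebSearch"])
except TemplateInterpolationError as err:
    print(err)  # prints: Missing variable during template render: 'stringified_tools_called' is undefined
```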

Suggested fix (either works)

  1. Template fix (minimal): change generate_verdicts to use {{ tools_called }}: it is already passed, and ToolCall.__repr__ renders the formatted block the template's own example expects. Update both the .txt source and the compiled templates.json.
  2. Code fix: build stringified_tools_called (e.g. repr(tools_called)) in _a_generate_verdicts / _generate_verdicts and pass it to _get_prompt.
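Both suggested fixes can be sketched with a toy strict renderer. `render_strict` and `FakeToolCall` are hypothetical stand-ins, not deepeval APIs, and the real `ToolCall.__repr__` produces a differently formatted block. The one-line templates are also not the real generate_verdicts prompt. The sketch only shows that either change supplies every name the template asks for.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Matches {{ name }} placeholders, with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_strict(template: str, **variables: Any) -> str:
    """Hypothetical strict renderer: an unknown {{ name }} raises KeyError."""
    return _PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for deepeval's ToolCall (repr format is made up)."""

    name: str
    input_parameters: Dict[str, Any] = field(default_factory=dict)

    def __repr__(self) -> str:
        return f"ToolCall(name={self.name!r}, input_parameters={self.input_parameters!r})"


tools_called: List[FakeToolCall] = [FakeToolCall("WebSearch", {"q": "Trump tariffs year"})]

# Fix 1: the template references {{ tools_called }}, which is already passed.
fixed_template = "Tools called:\n{{ tools_called }}"
print(render_strict(fixed_template, input="...", tools_called=tools_called))

# Fix 2: keep the template and build stringified_tools_called in code.
original_template = "Tools called:\n{{ stringified_tools_called }}"
print(
    render_strict(
        original_template,
        input="...",
        tools_called=tools_called,
        stringified_tools_called=repr(tools_called),
    )
)
```

For a list, `str()` uses each element's `__repr__`, so under this toy renderer both fixes produce the same prompt text.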

Happy to open a PR.

Desktop (please complete the following information):

  • deepeval version: 4.0.7
  • Python version: 3.14
