Describe the bug
ArgumentCorrectnessMetric is unusable in 4.0.7: every call raises a template render error before any LLM request is made. The generate_verdicts template references {{ stringified_tools_called }}, but the metric only passes tools_called to _get_prompt, and nothing in the codebase builds stringified_tools_called. Since templates render with strict-undefined, the render fails for every input. There is also no evaluation_template parameter on the metric, so there is no supported user-side workaround.
To Reproduce
Steps to reproduce the behavior:
- pip install deepeval==4.0.7
- Run the following (this fails offline: the error is raised at template render, before the API call, so no valid API key is needed):
import asyncio

from deepeval.metrics import ArgumentCorrectnessMetric
from deepeval.test_case import LLMTestCase, ToolCall

metric = ArgumentCorrectnessMetric(model="gpt-4.1")
test_case = LLMTestCase(
    input="When did Trump first raise tariffs?",
    actual_output="Trump first raised tariffs in 2018.",
    tools_called=[ToolCall(name="WebSearch", input_parameters={"q": "Trump tariffs year"})],
)
asyncio.run(metric.a_measure(test_case))
- See error:
deepeval.templates.resolver.MetricTemplateInterpolationError:
Missing variable during template render: 'stringified_tools_called' is undefined
Expected behavior
The metric renders its generate_verdicts prompt and returns a score.
Root cause
- deepeval/metrics/argument_correctness/argument_correctness.py: _a_generate_verdicts / _generate_verdicts call self._get_prompt("generate_verdicts", input=..., tools_called=tools_called, multimodal=...).
- The generate_verdicts template (deepeval/metrics/argument_correctness/templates/generate_verdicts.txt and the compiled deepeval/templates/metrics/templates.json) references {{ stringified_tools_called }}.
- stringified_tools_called is built nowhere (grep finds it only in the templates); resolve_template renders with strict-undefined, so the render raises.
Suggested fix (either works)
- Template fix (minimal): change generate_verdicts to use {{ tools_called }}: it is already passed, and ToolCall.__repr__ renders the formatted block the template's own example expects. Update both the .txt source and the compiled templates.json.
- Code fix: build stringified_tools_called (e.g. repr(tools_called)) in _a_generate_verdicts / _generate_verdicts and pass it to _get_prompt.
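To make the failure mode and the two fixes above concrete, here is a minimal standalone sketch that does not import deepeval. Everything in it is an assumption for illustration: render_strict and MissingVariableError are hypothetical stand-ins for resolve_template and the library's interpolation error, the template is a one-line excerpt rather than the real generate_verdicts text, and a plain dict stands in for ToolCall.

```python
import re


class MissingVariableError(KeyError):
    """Hypothetical stand-in for the library's interpolation error."""


_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_strict(template: str, **variables: object) -> str:
    """Render {{ name }} placeholders; raise if a referenced name was not passed."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise MissingVariableError(f"'{name}' is undefined")
        return str(variables[name])

    return _PLACEHOLDER.sub(substitute, template)


# Hypothetical one-line excerpt; the real template is longer.
TEMPLATE = "Tools called:\n{{ stringified_tools_called }}"

# A plain dict stands in for a ToolCall instance here.
tools_called = [{"name": "WebSearch", "input_parameters": {"q": "Trump tariffs year"}}]

# Current behaviour: only tools_called is passed, so a strict render raises.
try:
    render_strict(TEMPLATE, tools_called=tools_called)
    failed = False
except MissingVariableError:
    failed = True

# Code fix: build the string form and pass it under the name the template expects.
code_fixed = render_strict(
    TEMPLATE,
    tools_called=tools_called,
    stringified_tools_called=repr(tools_called),
)

# Template fix: reference the variable that is already passed.
template_fixed = render_strict(
    TEMPLATE.replace("stringified_tools_called", "tools_called"),
    tools_called=tools_called,
)
```

In this sketch both fixes yield the same rendered text, because rendering a list through str() gives the same result as repr(). Whether that also holds for the real ToolCall objects depends on ToolCall.__repr__, as noted in the template fix above.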
Happy to open a PR.
Desktop (please complete the following information):
- deepeval version: 4.0.7
- Python version: 3.14