`ArgumentCorrectnessMetric` always fails to render: 'stringified_tools_called' is undefined

**Describe the bug**
`ArgumentCorrectnessMetric` is unusable in 4.0.7: every call raises a template render error before any LLM request is made. The `generate_verdicts` template references `{{ stringified_tools_called }}`, but the metric only passes `tools_called` to `_get_prompt`, and nothing in the codebase builds `stringified_tools_called`. Since templates render with strict-undefined, the render fails for every input. There is also no `evaluation_template` parameter on the metric, so there is no supported user-side workaround.

**To Reproduce**
Steps to reproduce the behavior:
1. `pip install deepeval==4.0.7`
2. Run the following (it fails offline: the error is raised at template render, before any API call, so no valid API key is needed):

```python
import asyncio
from deepeval.metrics import ArgumentCorrectnessMetric
from deepeval.test_case import LLMTestCase, ToolCall

metric = ArgumentCorrectnessMetric(model="gpt-4.1")
test_case = LLMTestCase(
    input="When did Trump first raise tariffs?",
    actual_output="Trump first raised tariffs in 2018.",
    tools_called=[
        ToolCall(name="WebSearch", input_parameters={"q": "Trump tariffs year"})
    ],
)
# Raises while rendering the generate_verdicts prompt, before any LLM request.
asyncio.run(metric.a_measure(test_case))
```

3. See error:

```
deepeval.templates.resolver.MetricTemplateInterpolationError:
Missing variable during template render: 'stringified_tools_called' is undefined
```

**Expected behavior**
The metric renders its `generate_verdicts` prompt and returns a score.

**Root cause**
- `deepeval/metrics/argument_correctness/argument_correctness.py`: `_a_generate_verdicts` / `_generate_verdicts` call `self._get_prompt("generate_verdicts", input=..., tools_called=tools_called, multimodal=...)`.
- The `generate_verdicts` template (`deepeval/metrics/argument_correctness/templates/generate_verdicts.txt` and the compiled `deepeval/templates/metrics/templates.json`) references `{{ stringified_tools_called }}`.
- `stringified_tools_called` is built nowhere (grep finds it only in the templates); `resolve_template` renders with strict-undefined, so the render raises.
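
To illustrate the mechanism without installing deepeval, here is a minimal stdlib-only sketch of strict-undefined rendering. `render_strict` and `MissingVariableError` are hypothetical stand-ins written for this issue, not deepeval's `resolve_template` or its exception class, and the template string is a shortened placeholder rather than the real `generate_verdicts` text. It only mirrors the call shape described above: the template asks for `stringified_tools_called`, while the caller supplies `tools_called`.

```python
import re


class MissingVariableError(Exception):
    """Raised when a template references a name that was not supplied."""


# Matches placeholders of the form {{ name }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_strict(template: str, **variables: object) -> str:
    """Substitute {{ name }} placeholders, failing on any unknown name."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Strict behaviour: an unknown name is an error, not an empty string.
            raise MissingVariableError(f"'{name}' is undefined")
        return str(variables[name])

    return _PLACEHOLDER.sub(substitute, template)


# Shortened placeholder template (assumption), referencing the name
# that the metric never passes.
template = "Input: {{ input }}\nTools called:\n{{ stringified_tools_called }}"

try:
    # Same keyword names the metric passes today: input and tools_called.
    render_strict(template, input="When did Trump first raise tariffs?", tools_called="[...]")
except MissingVariableError as exc:
    error_message = str(exc)
    print("render failed:", error_message)
```

Because the missing name depends only on the template and the keyword names, not on the test case contents, the failure is the same for every input.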

**Suggested fix (either works)**
1. Template fix (minimal): change `generate_verdicts` to use `{{ tools_called }}`: it is already passed, and `ToolCall.__repr__` renders the formatted block the template's own example expects. Update both the `.txt` source and the compiled `templates.json`.
2. Code fix: build `stringified_tools_called` (e.g. `repr(tools_called)`) in `_a_generate_verdicts` / `_generate_verdicts` and pass it to `_get_prompt`.
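
As a rough check that either option closes the gap, the sketch below compares the names a template references against the keyword names passed at render time. Everything in it is illustrative: `FakeToolCall` is a hypothetical stand-in for deepeval's `ToolCall` (its dataclass `repr` is not the formatted block the real `__repr__` produces), the template text is a shortened placeholder, and `prompt_kwargs_option_2` is a made-up helper showing the shape of the code fix, not the actual patch.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for deepeval's ToolCall (assumption)."""

    name: str
    input_parameters: dict = field(default_factory=dict)


def referenced_names(template: str) -> set:
    """Return every {{ name }} placeholder used in the template."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))


def prompt_kwargs_option_2(input_text: str, tools_called: list) -> dict:
    """Option 2: also pass a stringified copy under the name the template expects."""
    return {
        "input": input_text,
        "tools_called": tools_called,
        "stringified_tools_called": repr(tools_called),
    }


# Shortened placeholder for the generate_verdicts template (assumption).
ORIGINAL_TEMPLATE = "Input: {{ input }}\nTools called:\n{{ stringified_tools_called }}"
# Option 1: point the template at the variable that is already passed.
PATCHED_TEMPLATE = ORIGINAL_TEMPLATE.replace("stringified_tools_called", "tools_called")

tools = [FakeToolCall("WebSearch", {"q": "Trump tariffs year"})]
question = "When did Trump first raise tariffs?"
current_kwargs = {"input": question, "tools_called": tools}

# Names the template needs that the caller does not provide.
missing_before = referenced_names(ORIGINAL_TEMPLATE) - set(current_kwargs)
missing_option_1 = referenced_names(PATCHED_TEMPLATE) - set(current_kwargs)
missing_option_2 = referenced_names(ORIGINAL_TEMPLATE) - set(
    prompt_kwargs_option_2(question, tools)
)

print(missing_before, missing_option_1, missing_option_2)
```

A check along these lines could also serve as a regression test, so that a template variable without a matching keyword is caught before release.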

Happy to open a PR.

**Desktop (please complete the following information):**
- deepeval version: 4.0.7
- Python version: 3.14

`ArgumentCorrectnessMetric` always fails to render: 'stringified_tools_called' is undefined #2817
