```python
for golden in dataset.evals_iterator(metrics=[task_completion]):
    trip_planner_agent(golden.input)
```
There are **SEVEN** optional parameters when creating a `TaskCompletionMetric`:
- [Optional] `async_mode`: a boolean which when set to `True`, enables [concurrent execution within the `measure()` method.](/docs/metrics-introduction#measuring-a-metric-in-async) Defaulted to `True`.
- [Optional] `verbose_mode`: a boolean which when set to `True`, prints the intermediate steps used to calculate said metric to the console, as outlined in the [How Is It Calculated](#how-is-it-calculated) section. Defaulted to `False`.
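As a minimal sketch, assuming all other optional parameters are left at their defaults, the two flags above can be toggled when constructing the metric:

```python
from deepeval.metrics import TaskCompletionMetric

# A minimal sketch: toggling the two optional parameters described above,
# with every other optional parameter kept at its default value.
task_completion = TaskCompletionMetric(
    async_mode=True,    # concurrent execution inside measure() (the default)
    verbose_mode=True,  # print intermediate calculation steps to the console
)
```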
### End-to-End
You can also run the `TaskCompletionMetric` as a standalone metric for [end-to-end](/docs/evaluation-end-to-end-llm-evals) evaluation, though this is not the recommended way to use it, since the full trace is required for a thorough evaluation.
```python
from deepeval import evaluate
from deepeval.test_case import LLMTestCase, ToolCall
from deepeval.metrics import TaskCompletionMetric

metric = TaskCompletionMetric(
    threshold=0.7,
    model="gpt-4.1",
    include_reason=True
)
test_case = LLMTestCase(
    input="Plan a 3-day itinerary for Paris with cultural landmarks and local cuisine.",
    actual_output=(
        "Day 1: Eiffel Tower, dinner at Le Jules Verne. "
        "Day 2: Louvre Museum, lunch at Angelina Paris. "
        "Day 3: Montmartre, evening at a wine bar."
    ),
    tools_called=[
        ToolCall(
            name="Itinerary Generator",
            description="Creates travel plans based on destination and duration.",
        )
    ],
)

evaluate(test_cases=[test_case], metrics=[metric])
```
To use the `TaskCompletionMetric` for end-to-end evaluation, you'll have to provide the following arguments when creating an [`LLMTestCase`](/docs/evaluation-test-cases#llm-test-case):
- `input`
- `actual_output`
- `tools_called`
Read the [How Is It Calculated](#how-is-it-calculated) section below to learn how test case parameters are used for metric calculation.
:::caution
This is not recommended and will soon be deprecated, as a single test case does not represent the full execution of an agent.
:::
To learn more about how the `evals_iterator` works, [click here.](/docs/evaluation-end-to-end-llm-evals#e2e-evals-for-tracing)
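For reference, here is a hedged, self-contained sketch of the recommended component-level setup. The `trip_planner_agent` stub and its golden are illustrative assumptions, not part of this page's full example:

```python
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.metrics import TaskCompletionMetric
from deepeval.tracing import observe

task_completion = TaskCompletionMetric(threshold=0.7)

# A hypothetical traced agent; a real one would call tools and an LLM.
@observe()
def trip_planner_agent(query: str) -> str:
    return f"Here is a 3-day itinerary for: {query}"

dataset = EvaluationDataset(goldens=[
    Golden(input="Plan a 3-day itinerary for Paris."),
])

# evals_iterator runs each golden through the loop body and evaluates the
# resulting trace with the supplied metrics.
for golden in dataset.evals_iterator(metrics=[task_completion]):
    trip_planner_agent(golden.input)
```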