Issue #1611 Fixes by PradyMagal · Pull Request #1638 · confident-ai/deepeval

PradyMagal · 2025-05-28T23:41:47Z

Fix TruthfulQA compatibility with AnthropicModel

Issue

TruthfulQA benchmark was incompatible with AnthropicModel, always failing with:
ValueError: Evaluation LLM outputted an invalid JSON. Please use a better evaluation model.

Root Cause

TruthfulQA attempted structured output (JSON schemas) with AnthropicModel
AnthropicModel responded with natural language explanations instead of JSON
JSON parsing failed with a ValueError, but the handler only caught TypeError
Benchmark crashed instead of falling back to text-based prompting
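
As a minimal illustration of the third point (a sketch using only the standard library, not deepeval's actual parsing code), the snippet below shows that a JSON parse failure is a ValueError and therefore passes straight through a handler that only names TypeError. The sample reply string is made up for the example.

```python
import json

# A natural-language reply of the kind described above (hypothetical text).
reply = "The best answer is option 2, because ..."

try:
    json.loads(reply)
except TypeError:
    # Not reached: a malformed JSON string does not raise TypeError.
    outcome = "caught by TypeError"
except ValueError as err:
    # json.JSONDecodeError is a subclass of ValueError, so it lands here.
    outcome = f"caught by ValueError ({type(err).__name__})"

print(outcome)
```

With only the `except TypeError` branch present, the same error would propagate out of the try block, which matches the crash reported in the issue.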

Line 175 (predict method):

# Before
except TypeError:

# After  
except (TypeError, ValueError, AttributeError):

Line 235 (batch_predict method):

# Before
except TypeError:

# After
except (TypeError, ValueError, AttributeError):
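
The widened except clause can be sketched in isolation as follows. This is a simplified stand-in rather than the benchmark's real code: `Answer`, `PlainModel`, `ChattyModel`, `RawStringModel`, and this `predict` function are hypothetical names invented for the example, and each model class triggers one of the three exception types named in the change.

```python
import json
from dataclasses import dataclass


@dataclass
class Answer:
    """Hypothetical stand-in for a structured-output schema."""
    answer: str


class PlainModel:
    """Hypothetical model whose generate() has no schema parameter."""
    def generate(self, prompt):
        return "2"


class ChattyModel:
    """Hypothetical model that accepts a schema but replies in prose."""
    def generate(self, prompt, schema=None):
        text = "I believe the answer is 2."
        if schema is None:
            return text
        try:
            return schema(**json.loads(text))
        except json.JSONDecodeError:
            raise ValueError("Evaluation LLM outputted an invalid JSON.")


class RawStringModel:
    """Hypothetical model that ignores the schema and returns a plain string."""
    def generate(self, prompt, schema=None):
        return "2"


def predict(model, prompt):
    try:
        # Structured path: expects an object with an `answer` attribute.
        res = model.generate(prompt, schema=Answer)
        return str(res.answer)
    except (TypeError, ValueError, AttributeError):
        # Fallback path: plain text prompting without a schema.
        return model.generate(prompt + "\nOutput only the answer.")


print(predict(PlainModel(), "Q"))      # TypeError: unexpected keyword argument
print(predict(ChattyModel(), "Q"))     # ValueError: reply is not valid JSON
print(predict(RawStringModel(), "Q"))  # AttributeError: str has no .answer
```

Catching all three in one clause keeps a single fallback path, at the cost of also swallowing unrelated ValueError or AttributeError exceptions raised inside `generate`.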

Result

✅ TruthfulQA now works with AnthropicModel
✅ Maintains backward compatibility with existing models
✅ Graceful fallback to text-based prompting when structured output fails
✅ More robust error handling for any model without structured output support

Testing

Confirmed working with:

from deepeval.benchmarks import TruthfulQA
from deepeval.benchmarks.tasks import TruthfulQATask
from deepeval.benchmarks.modes import TruthfulQAMode
from deepeval.models import AnthropicModel

benchmark = TruthfulQA(
    tasks=[TruthfulQATask.ADVERTISING, TruthfulQATask.FICTION],
    mode=TruthfulQAMode.MC2
)
benchmark.evaluate(model=AnthropicModel())

Output:
Filter: 100%|████████████████████████████████████████████████████████████████████████████| 817/817 [00:00<00:00, 24678.95 examples/s]
Processing Fiction: 100%|████████████████████████████████████████████████████████████████████████████| 30/30 [05:04<00:00, 10.14s/it]
TruthfulQA Task Accuracy (task=Fiction): 82.43333333333334
Overall TruthfulQA Accuracy: 82.43333333333334

Got it working: I replicated the code from the issue and it runs well. The handler needed to catch ValueError, which is what the JSON parsing failure raises.

vercel · 2025-05-28T23:41:50Z

@PradyMagal is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

penguine-ip · 2025-05-29T02:18:51Z

hey @PradyMagal thanks! But can you please undo poetry lock and all the adds to the toml file? We just spent a whole week cutting down deepeval's package size in prod

PradyMagal · 2025-05-29T17:03:37Z

Poetry Lock has been undone, same with the additions to the toml file

Let me know if there's anything I missed

penguine-ip · 2025-05-29T17:16:22Z

@PradyMagal perfect, thanks!

Issue confident-ai#1611 Fixes

9b9fbd9

Got it working, replicated code from issue and it runs well. Needed to account for ValueError which catches the JSON Parsing issues

Remove pandas and datasets dependencies to reduce package size

565abf5

penguine-ip merged commit d4876b8 into confident-ai:main May 29, 2025
0 of 4 checks passed

penguine-ip merged 2 commits into confident-ai:main from PradyMagal:main
