confident-ai / deepeval Public

Fork 1.5k
Star 16.1k

Pull requests: confident-ai/deepeval

Labels 17 Milestones 1

78 Open 1,955 Closed

Pull requests list

Add single-turn, multi-turn, arena and multimodal metrics to typescript

#2745 opened Jun 10, 2026 by A-Vamshi Collaborator

feat(contextual-precision): add RetrievedContextData source grouping and fix weighted precision score

#2743 opened Jun 10, 2026 by Ruthwik-Data

feat: inspect and bound GEval retrieval context prompts

#2742 opened Jun 10, 2026 by RitwijParmar Contributor

fix(metrics): reject list params containing only empty strings (#2248)

#2739 opened Jun 9, 2026 by kadiryonak

[codex] Add source grouping parity for contextual precision

#2737 opened Jun 9, 2026 by cat0825

fix: add deepseek-v4 models, fix calculate_cost, and improve error ha…

#2736 opened Jun 9, 2026 by Comui520

docs: add CometAPI custom provider example for GPTModel

#2733 opened Jun 8, 2026 by nuthalapativarun Contributor

Add regression testing on CI/CD

#2731 opened Jun 8, 2026 by A-Vamshi Collaborator

[codex] Add contextual precision grouping

#2729 opened Jun 8, 2026 by cat0825

[codex] Fix flaky retry test run dedupe

#2728 opened Jun 8, 2026 by cat0825 • Draft

feat(examples): add TWZRD Agent Intel MCP server evaluation example

#2727 opened Jun 6, 2026 by twzrd-sol

fix(dataset): assign each golden its index as _dataset_rank, not the dataset size

#2726 opened Jun 5, 2026 by bymle

Add BGPT REFUTE benchmark

#2725 opened Jun 5, 2026 by connerlambden

Fix gemini costs

#2724 opened Jun 5, 2026 by A-Vamshi Collaborator

Fix LiteLLM cost calculation (was 666× over for common models)

#2723 opened Jun 4, 2026 by Aarkin7 Contributor

Add optional juryeval integration for LLM-as-Judge metrics

#2715 opened May 31, 2026 by py-ai-dev

fix: loosen opentelemetry version constraint to allow 1.x and 2.x

#2712 opened May 29, 2026 by nuthalapativarun Contributor

feat: add return_details param to compare() for per-test-case verdicts

#2711 opened May 29, 2026 by nuthalapativarun Contributor

fix: use correct confinement instructions for ARC questions with numeric labels

#2710 opened May 29, 2026 by nuthalapativarun Contributor

fix: restore deepeval.evaluate subpackage binding shadowed by _expose_public_api

#2706 opened May 29, 2026 by nuthalapativarun Contributor

4 tasks done

feat: expose score_breakdown in StepEfficiencyMetric

#2705 opened May 29, 2026 by nuthalapativarun Contributor

3 tasks

feat: add penalize_ambiguous_claims to HallucinationMetric

#2704 opened May 29, 2026 by nuthalapativarun Contributor

3 tasks

feat: add o3 and o3-2025-04-16 model constants

#2702 opened May 29, 2026 by nuthalapativarun Contributor

2 tasks

test(metrics): add overlapping-chunk test fixtures for ContextualPrecisionMetric

#2692 opened May 26, 2026 by Ruthwik-Data

Fix misuse, MCP use, and tool correctness docs

#2691 opened May 25, 2026 by krishna-dhulipalla
