feat: Add FaithfulnessEvaluator by jjbuck · Pull Request #27 · strands-agents/evals

jjbuck · 2025-11-04T17:20:22Z

Description

Add FaithfulnessEvaluator to assess whether agent responses conflict with conversation history
Implement 5-level scoring system (Not At All, Not Generally, Neutral/Mixed, Generally Yes, Completely Yes)
Add faithfulness prompt templates (v0) with evaluation guidelines
Include example usage and comprehensive unit tests
Update naming of low-level data types (turn -> trace, conversation -> session) to align better with OTEL conventions.
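To make the five-level scale concrete, here is a minimal Python sketch of how such labels could be parsed and mapped to a number. Only the five label strings come from this PR. The `FaithfulnessLevel` enum, the `parse_level` and `level_to_score` helpers, and the evenly spaced 0.0 to 1.0 mapping are illustrative assumptions, not the code added in `faithfulness_evaluator.py`.

```python
from enum import Enum


class FaithfulnessLevel(Enum):
    """The five labels named in the PR description, ordered worst to best."""

    NOT_AT_ALL = "Not At All"
    NOT_GENERALLY = "Not Generally"
    NEUTRAL_MIXED = "Neutral/Mixed"
    GENERALLY_YES = "Generally Yes"
    COMPLETELY_YES = "Completely Yes"


# Enum members iterate in definition order, so this list is worst to best.
_ORDERED_LEVELS = list(FaithfulnessLevel)


def parse_level(label: str) -> FaithfulnessLevel:
    """Match a judge's label to a level, ignoring case and outer whitespace."""
    normalized = label.strip().lower()
    for level in FaithfulnessLevel:
        if level.value.lower() == normalized:
            return level
    raise ValueError(f"Unknown faithfulness label: {label!r}")


def level_to_score(level: FaithfulnessLevel) -> float:
    """Assumed mapping: evenly spaced scores from 0.0 (worst) to 1.0 (best)."""
    return _ORDERED_LEVELS.index(level) / (len(_ORDERED_LEVELS) - 1)
```

Under this assumed mapping, `level_to_score(parse_level("Generally Yes"))` returns 0.75. The evaluator in the PR may use different numeric values or a pass/fail threshold.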

Related Issues

feat: add GoalSuccessRateEvaluator for conversation goal tracking #22 (goal success rate)
Implement trace-based evaluation with helpfulness evaluator. #16 (helpfulness)

Documentation PR

N/A

Type of Change

New feature ("Faithfulness Evaluator")
Refactor: update naming of low-level data types (turn -> trace, conversation -> session) to align better with OTEL conventions.
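As a rough illustration of the renamed vocabulary, the sketch below models a session as an ordered list of traces, where a faithfulness check would compare one response against the traces that came before it. The `Trace` and `Session` classes, their fields, and `history_before` are hypothetical stand-ins for this example. They are not the actual `strands_evals` data types.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """One exchange with the agent (the unit previously called a "turn")."""

    user_input: str
    agent_response: str


@dataclass
class Session:
    """An ordered group of traces (previously called a "conversation")."""

    session_id: str
    traces: list[Trace] = field(default_factory=list)

    def history_before(self, index: int) -> list[Trace]:
        """Return the traces that precede the trace at `index`."""
        return self.traces[:index]


# Hypothetical usage: the second response would be judged against the first trace.
session = Session(
    session_id="demo",
    traces=[
        Trace("What is the refund window?", "Refunds are accepted within 30 days."),
        Trace("Can I return it after 45 days?", "No, the window is 30 days."),
    ],
)
history = session.history_before(1)
```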

Testing

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jjbuck · 2025-11-05T22:13:01Z

@poshinchen just rebased on the tip of main (which now includes #25).

src/strands_evals/evaluators/helpfulness_evaluator.py

src/strands_evals/evaluators/faithfulness_evaluator.py

- Add FaithfulnessEvaluator to assess whether agent responses conflict with conversation history
- Implement 5-level scoring system (Not At All, Not Generally, Neutral/Mixed, Generally Yes, Completely Yes)
- Add faithfulness prompt templates (v0) with evaluation guidelines
- Include example usage and comprehensive unit tests
- Update low-level data types (turn -> trace, conversation -> session)

jjbuck · 2025-11-06T19:01:19Z

@poshinchen addressed your comments above in the latest edits.

jjbuck temporarily deployed to auto-approve November 4, 2025 17:20 — with GitHub Actions Inactive

jjbuck requested a review from poshinchen November 4, 2025 17:22

jjbuck force-pushed the feature/faithfulness_evaluator branch from 91d6b18 to db136bf Compare November 5, 2025 22:12

jjbuck temporarily deployed to auto-approve November 5, 2025 22:12 — with GitHub Actions Inactive

poshinchen reviewed Nov 6, 2025

src/strands_evals/evaluators/helpfulness_evaluator.py (resolved)

poshinchen reviewed Nov 6, 2025

src/strands_evals/evaluators/helpfulness_evaluator.py (outdated, resolved)

poshinchen reviewed Nov 6, 2025

src/strands_evals/evaluators/faithfulness_evaluator.py (resolved)

poshinchen reviewed Nov 6, 2025

src/strands_evals/evaluators/faithfulness_evaluator.py (resolved)

jjbuck force-pushed the feature/faithfulness_evaluator branch from db136bf to d358a12 Compare November 6, 2025 19:00

jjbuck requested a review from poshinchen November 6, 2025 19:01

jjbuck temporarily deployed to auto-approve November 6, 2025 19:01 — with GitHub Actions Inactive

poshinchen approved these changes Nov 6, 2025

jjbuck merged commit 3cac0f0 into strands-agents:main Nov 6, 2025
12 checks passed

This was referenced Nov 7, 2025

feat: ToolSelectionAccuracyEvaluator #29

Merged

feat: ToolParameterAccuracyEvaluator #34

Merged

smeetd159 mentioned this pull request Nov 18, 2025

feat: Add Harmfulness EValuator #45

Closed

7 tasks

smeetd159 mentioned this pull request Dec 2, 2025

feat: Add Harmfulness Evaluator #57

Merged

7 tasks

jjbuck merged 1 commit into strands-agents:main from jjbuck:feature/faithfulness_evaluator
