feat: add ActorSimulator for multi-turn conversation evaluation by jjbuck · Pull Request #28 · strands-agents/evals

jjbuck · 2025-11-06T16:58:17Z

Description

Introduces the ActorSimulator framework for simulating realistic actors (typically users) in multi-turn conversations with agents under test. This enables systematic evaluation of conversational agents through synthetic user interactions.

Key capabilities:

Generic ActorSimulator class configurable with arbitrary system prompts
from_case_for_user_simulator() factory method that conditions the simulator to act as a user, based on a given Case
Automatic profile generation from test cases using LLM inference
Built-in goal completion assessment tool for conversation evaluation
Support for custom tools and behaviors
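To make the multi-turn flow concrete, here is a minimal sketch of a simulated-user loop with a goal completion check. It does not use the strands_evals API: `ScriptedActor`, `run_conversation`, `toy_agent`, and `goal_keyword` are hypothetical names invented for illustration, and the keyword match only stands in for the LLM-based goal completion tool described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ScriptedActor:
    """Hypothetical stand-in for a simulated user; a real simulator would call an LLM."""

    system_prompt: str  # kept for illustration only; this scripted actor never reads it
    script: List[str]   # canned user messages, replayed in order
    goal_keyword: str   # hypothetical: the goal counts as met once the agent says this
    _turn: int = field(default=0, init=False)

    def next_message(self, agent_reply: str) -> str:
        # Replay the scripted messages, repeating the last one if the script runs out.
        message = self.script[min(self._turn, len(self.script) - 1)]
        self._turn += 1
        return message

    def goal_completed(self, agent_reply: str) -> bool:
        # Crude keyword check standing in for an LLM-based assessment.
        return self.goal_keyword.lower() in agent_reply.lower()


def run_conversation(
    actor: ScriptedActor,
    agent: Callable[[str], str],
    first_message: str,
    max_turns: int = 5,
) -> Tuple[List[Tuple[str, str]], bool]:
    """Alternate user and agent turns until the goal is met or max_turns is reached."""
    transcript: List[Tuple[str, str]] = []
    user_message = first_message
    for _ in range(max_turns):
        agent_reply = agent(user_message)
        transcript.append((user_message, agent_reply))
        if actor.goal_completed(agent_reply):
            return transcript, True
        user_message = actor.next_message(agent_reply)
    return transcript, False


def toy_agent(message: str) -> str:
    # Toy agent under test: issues the refund only once it sees an order number.
    if "order 123" in message.lower():
        return "Your refund for order 123 has been issued."
    return "Could you share your order number?"


actor = ScriptedActor(
    system_prompt="You are a customer who wants a refund.",
    script=["It is order 123."],
    goal_keyword="refund for order 123",
)
transcript, done = run_conversation(actor, toy_agent, "I want my money back.")
```

The loop stops as soon as the goal check passes, so the transcript length doubles as a rough measure of how many turns the agent needed.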

Design principles:

Generic base class (ActorSimulator) with specialized factory methods
Clear separation: __init__() for generic construction, factory for specialization
Optional task_description in Case metadata (handles vague initial queries)
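The split between generic construction and factory specialization can be sketched as follows. This is an illustration of the pattern only, not the code in this PR: `SketchCase`, `SketchActor`, and `for_user` are hypothetical names, and the prompt is built from a fixed template here, whereas the PR describes generating a profile with LLM inference.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchCase:
    """Hypothetical, simplified test case; not the library's Case type."""

    input: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class SketchActor:
    """Hypothetical actor showing a generic constructor plus a specialized factory."""

    def __init__(self, system_prompt: str, tools: Optional[List[Any]] = None):
        # Generic construction: no assumption about which role the actor plays.
        self.system_prompt = system_prompt
        self.tools = list(tools or [])

    @classmethod
    def for_user(cls, case: SketchCase) -> "SketchActor":
        # Specialization: derive a user-style prompt from the case.
        # task_description is optional, so fall back to the initial query when absent.
        task = case.metadata.get("task_description") or case.input
        prompt = f"You are a user talking to an assistant. Your goal: {task}"
        return cls(system_prompt=prompt)


# A vague opening query is disambiguated by the optional metadata field.
vague = SketchCase(input="hi", metadata={"task_description": "Book a flight to Paris"})
user_actor = SketchActor.for_user(vague)

# Without task_description, the initial query itself serves as the goal.
plain_actor = SketchActor.for_user(SketchCase(input="Reset my password"))

# The generic constructor still supports non-user roles.
judge_actor = SketchActor("You are a strict reviewer.")
```

Keeping role-specific logic in the factory means the constructor stays usable for any actor, while the user-simulation defaults live in one place.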

Related Issues

N/A

Documentation PR

N/A

Type of Change

New feature

Testing

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

src/strands_evals/types/simulation/actor.py

src/strands_evals/simulation/profiles/__init__.py

src/strands_evals/simulation/actor_simulator.py

src/strands_evals/simulation/README.md

src/strands_evals/simulation/tools/goal_completion.py

jjbuck temporarily deployed to auto-approve November 6, 2025 16:58 — with GitHub Actions Inactive

jjbuck force-pushed the feature/simulator branch from 250cc50 to 25f1357 November 6, 2025 16:59

jjbuck requested a review from poshinchen November 6, 2025 16:59

jjbuck temporarily deployed to auto-approve November 6, 2025 17:00 — with GitHub Actions Inactive

jjbuck force-pushed the feature/simulator branch from 25f1357 to 70ddd83 November 6, 2025 17:18

jjbuck temporarily deployed to auto-approve November 6, 2025 17:18 — with GitHub Actions Inactive

poshinchen reviewed Nov 7, 2025

src/strands_evals/types/simulation/actor.py (resolved)

poshinchen reviewed Nov 7, 2025

src/strands_evals/simulation/profiles/__init__.py (resolved)

poshinchen reviewed Nov 7, 2025

src/strands_evals/simulation/actor_simulator.py (resolved)

jjbuck force-pushed the feature/simulator branch from 70ddd83 to 36de188 November 11, 2025 00:10

jjbuck temporarily deployed to auto-approve November 11, 2025 00:10 — with GitHub Actions Inactive

jjbuck requested a review from poshinchen November 11, 2025 00:10

jjbuck force-pushed the feature/simulator branch from 36de188 to 01592ab November 11, 2025 04:07

jjbuck temporarily deployed to auto-approve November 11, 2025 04:07 — with GitHub Actions Inactive