feat: add TaskResultStore for caching and replaying task execution results #176

Merged
afarntrog merged 4 commits into strands-agents:main from
afarntrog:task_result_store
Mar 24, 2026

Conversation

@afarntrog
Contributor

Description

Add a TaskResultStore protocol for caching task execution results across experiment runs. This makes it possible to skip expensive task re-execution (e.g., LLM calls) when iterating on evaluators, by loading cached EvaluationData from a pluggable storage backend. In addition, refactor the multi-hundred-line _worker method.

  • Introduce TaskResultStore protocol with load/save methods that implementations can back with any storage (files, S3, databases, etc.)
  • Accept an optional task_result_store parameter in run_evaluations and run_evaluations_async
  • When a store is provided, load cached results before executing the task and save results after execution
  • Validate that all cases have unique, non-None names when a store is provided
  • Refactor _worker by extracting _execute_task and _run_evaluator into standalone methods for readability
  • Export TaskResultStore from the top-level strands_evals package
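
The load-before-execute, save-after-execute flow and the case-name validation listed above can be sketched as follows. This is a minimal illustration, not the code from this PR: the names ResultStore, InMemoryStore, validate_case_names, and execute_with_store are hypothetical, and the load/save signatures (keyed by case name, returning None on a cache miss) are assumptions, since the actual protocol signatures are not shown in this description.

```python
from typing import Any, Callable, List, Optional, Protocol


class ResultStore(Protocol):
    """Hypothetical stand-in for the TaskResultStore protocol (signatures assumed)."""

    def load(self, case_name: str) -> Optional[Any]:
        """Return the cached result for case_name, or None on a cache miss."""
        ...

    def save(self, case_name: str, result: Any) -> None:
        """Persist result under case_name."""
        ...


class InMemoryStore:
    """Dict-backed implementation, similar in spirit to a test-only store."""

    def __init__(self) -> None:
        self._data = {}

    def load(self, case_name: str) -> Optional[Any]:
        return self._data.get(case_name)

    def save(self, case_name: str, result: Any) -> None:
        self._data[case_name] = result


def validate_case_names(names: List[Optional[str]]) -> None:
    """Reject missing or duplicate names, since names act as cache keys."""
    if any(name is None for name in names):
        raise ValueError("every case needs a name when a store is provided")
    if len(set(names)) != len(names):
        raise ValueError("case names must be unique when a store is provided")


def execute_with_store(
    case_name: str,
    task: Callable[[str], Any],
    store: Optional[ResultStore] = None,
) -> Any:
    """Return the cached result if present; otherwise run the task and cache it."""
    if store is not None:
        cached = store.load(case_name)
        if cached is not None:
            return cached  # cache hit: the task is not called
    result = task(case_name)
    if store is not None:
        store.save(case_name, result)
    return result
```

Without a store, execute_with_store simply runs the task, which mirrors the stated goal that default behavior stays unchanged. Names serve as cache keys in this sketch, which is why missing or duplicate names are rejected up front.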

Related Issues

#94

Documentation PR

Type of Change

New feature

Testing

Added 5 unit tests in tests/strands_evals/test_experiment.py under a new TestTaskResultStore class covering:

  • Task execution and result saving on first run with an empty store
  • Cached result loading on subsequent runs (task not called)
  • Validation that cases must have names when a store is provided
  • Validation that case names must be unique when a store is provided
  • Default behavior without a store remains unchanged

Uses a DictTaskResultStore in-memory implementation for test isolation.

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Introduce TaskResultStore abstraction to enable caching and reuse of
task results across experiment runs. Refactor Experiment class to
extract task execution logic into a dedicated method
with retry, tracing, and optional result store integration. Add
case name validation to ensure uniqueness when using a result store.
@afarntrog afarntrog changed the title Task result store feat: Add TaskResultStore for caching and replaying task execution results Mar 23, 2026
@afarntrog afarntrog changed the title feat: Add TaskResultStore for caching and replaying task execution results feat: add TaskResultStore for caching and replaying task execution results Mar 23, 2026
Contributor

@poshinchen poshinchen left a comment

Is the next step implementing the local store (save and load json)?

@afarntrog
Contributor Author

Is the next step implementing the local store (save and load json)?

Yes
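
A local JSON-backed store along the lines discussed in this exchange might look like the sketch below. It is an illustration only, not the planned implementation: the class name JsonFileStore, the one-file-per-case layout, and the load/save signatures are all assumptions, and it presumes results are JSON-serializable and case names are safe to use as file names.

```python
import json
from pathlib import Path
from typing import Any, Optional


class JsonFileStore:
    """Hypothetical local store that keeps one JSON file per case name."""

    def __init__(self, directory: str) -> None:
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, case_name: str) -> Path:
        # Assumption: case names are unique and valid as file names.
        return self._dir / f"{case_name}.json"

    def load(self, case_name: str) -> Optional[Any]:
        """Return the stored result, or None if nothing was saved for this case."""
        path = self._path(case_name)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def save(self, case_name: str, result: Any) -> None:
        """Write the result as JSON, overwriting any earlier file for this case."""
        self._path(case_name).write_text(json.dumps(result), encoding="utf-8")
```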

Rename the TaskResultStore class and module to EvaluationDataStore to
better reflect its purpose of storing evaluation data rather than just
task results. Updates all references including module file, class name,
parameter names, docstrings, error messages, and tests.
poshinchen previously approved these changes Mar 24, 2026
@afarntrog afarntrog merged commit dae3c55 into strands-agents:main Mar 24, 2026
22 of 23 checks passed
2 participants