fix(async,ci): prevent coroutine re-await & stabilize async evals by BloggerBust · Pull Request #2068 · confident-ai/deepeval

BloggerBust · 2025-09-18T21:08:09Z

- Track and gather only tasks created on the current loop during evals to avoid “cannot reuse already awaited coroutine” and cross-loop errors.
- Introduce coerce_to_task() to normalize Task/Future/coroutine/awaitable inputs.
- Refactor EvaluationTasks to store asyncio Futures, return copies from get_tasks(), and provide a clear_tasks() that cancels pending tasks before clearing.
- Call loop.shutdown_asyncgens() in the tracing worker before the loop is closed.
- Replace blocking time.sleep() with await asyncio.sleep() in the async test helper.
- CI: add a maintainer-only full test workflow with secrets gating and concurrency.
- Ruff cleanups.
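
For readers unfamiliar with the re-await problem, here is a minimal sketch of what a helper like coerce_to_task() could look like. It is an illustration under assumptions, not the code from this PR: the signature, the optional `loop` parameter, and the `answer()` demo coroutine are hypothetical. The point it shows is that a coroutine object can only be awaited once, while a Task wrapping it can be awaited repeatedly once it has finished.

```python
import asyncio
import inspect


def coerce_to_task(obj, loop=None):
    """Hypothetical sketch: return a Future/Task for any awaitable input."""
    loop = loop or asyncio.get_running_loop()
    if isinstance(obj, asyncio.Future):
        # asyncio.Task is a subclass of Future, so both are returned as-is.
        return obj
    if inspect.iscoroutine(obj):
        # Schedule the coroutine exactly once on the given loop.
        return loop.create_task(obj)
    if inspect.isawaitable(obj):
        # Wrap a generic awaitable in a coroutine so it can be scheduled.
        async def _await_it():
            return await obj

        return loop.create_task(_await_it())
    raise TypeError(f"expected an awaitable, got {type(obj).__name__}")


async def answer():
    await asyncio.sleep(0)
    return 42


async def main():
    coro = answer()
    task = coerce_to_task(coro)
    same = coerce_to_task(task)  # already a Task: returned unchanged
    first = await task
    second = await task  # awaiting a finished Task again is safe
    return first, second, same is task


result = asyncio.run(main())
```

The standard library's asyncio.ensure_future() covers similar ground; a dedicated helper mainly makes the choice of loop and the error message explicit.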

vercel · 2025-09-18T21:08:13Z

@BloggerBust is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

- Implemented logging to track per-task and gather timeouts separately.
- Added task metadata collection to help debug stalled tasks.
- Enhanced logging with suggestions for increasing timeout values.
- Introduced `DEEPEVAL_PER_TASK_TIMEOUT_SECONDS` and `DEEPEVAL_TASK_GATHER_BUFFER_SECONDS` for configuration of task and gather timeouts.
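
The split between a per-task timeout and a gather timeout can be sketched roughly as follows. Only the two environment variable names come from the commit message. The default values, the `read_timeouts()` and `run_with_timeouts()` helpers, and the choice of "per-task timeout plus buffer" as the gather budget are assumptions made for illustration.

```python
import asyncio
import os


def read_timeouts():
    # Variable names are from the commit message; the defaults are assumed.
    per_task = float(os.getenv("DEEPEVAL_PER_TASK_TIMEOUT_SECONDS", "60"))
    buffer = float(os.getenv("DEEPEVAL_TASK_GATHER_BUFFER_SECONDS", "5"))
    return per_task, buffer


async def run_with_timeouts(coros, per_task, gather_buffer):
    async def guarded(index, coro):
        # Per-task timeout: one slow task is reported on its own.
        try:
            return await asyncio.wait_for(coro, timeout=per_task)
        except asyncio.TimeoutError:
            return f"task {index} timed out after {per_task}s"

    tasks = [asyncio.create_task(guarded(i, c)) for i, c in enumerate(coros)]
    try:
        # Gather timeout: the per-task budget plus a buffer (assumed policy).
        return await asyncio.wait_for(
            asyncio.gather(*tasks), timeout=per_task + gather_buffer
        )
    except asyncio.TimeoutError:
        # Cancel whatever is still pending and wait for it to unwind.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        raise


async def fast():
    await asyncio.sleep(0)
    return "ok"


async def slow():
    await asyncio.sleep(1)
    return "never returned"


async def main():
    # Explicit small values keep the demo quick; read_timeouts() would
    # supply the configured values in real use.
    return await run_with_timeouts([fast(), slow()], 0.05, 0.05)


results = asyncio.run(main())
```

Reporting the two timeouts separately makes it possible to tell a single stalled task apart from a run that is slow overall.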

Add _snapshot_tasks() and use it to:
- Take a baseline snapshot of tasks on the current loop before yielding goldens.
- After gather, detect leftover tasks not in the baseline or created_tasks, then log, cancel, and drain them (when DEEPEVAL_DEBUG_ASYNC is enabled).

Also remove the redundant `asyncio.create_task = loop.create_task` assignment before trace evaluation, as we already restore the original create_task earlier. This reduces hangs caused by tasks leaking across iterations/loops and ensures only tasks created in the current run are awaited.
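
The baseline-and-leftover approach described above can be sketched as follows. This is not the PR's _snapshot_tasks() implementation. The `snapshot_tasks()` and `cancel_and_drain()` helpers and the demo tasks are hypothetical, and the sketch omits the logging and the DEEPEVAL_DEBUG_ASYNC check.

```python
import asyncio


def snapshot_tasks(loop):
    """Return the unfinished tasks on `loop`, excluding the calling task."""
    current = asyncio.current_task(loop)
    return {t for t in asyncio.all_tasks(loop) if t is not current}


async def cancel_and_drain(tasks):
    # Cancel each leftover, then await them so the cancellations complete.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)


async def main():
    loop = asyncio.get_running_loop()
    baseline = snapshot_tasks(loop)  # tasks that existed before this run

    # Tasks this run knows about and intends to await.
    created = [asyncio.create_task(asyncio.sleep(0, result="tracked"))]
    # A task that leaks: created during the run but never tracked.
    stray = asyncio.create_task(asyncio.sleep(60), name="stray")

    results = await asyncio.gather(*created)

    # Anything still pending that is neither baseline nor tracked is a leak.
    leftovers = snapshot_tasks(loop) - baseline - set(created)
    names = sorted(t.get_name() for t in leftovers)
    await cancel_and_drain(leftovers)
    return results, names, stray.cancelled()


outcome = asyncio.run(main())
```

Subtracting the baseline keeps the cleanup from cancelling tasks that belong to the surrounding application rather than to the current evaluation run.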

We now track and gather tasks per loop. The global cache is no longer read and is not needed. The cache posed cross-loop task management risks, which is why we have moved away from it.
- Delete `global_evaluation_tasks` singleton and related imports.
- Stop appending to the global cache in `EvaluationDataset.evaluate()`; still coerce the task to ensure validation/side effects run.
- Remove finalizers that cleared the global cache in `execute.py` looped paths.
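
As a rough illustration of why per-loop tracking is safer than a process-wide cache, the sketch below filters tracked futures by the loop they belong to before gathering. The `LoopLocalTasks` class and its `for_loop()` method are hypothetical. Only the copy-returning get_tasks() and the cancel-before-clear clear_tasks() behaviours are taken from the PR description.

```python
import asyncio


class LoopLocalTasks:
    """Hypothetical stand-in for a per-run task container."""

    def __init__(self):
        self._tasks = []

    def append(self, fut):
        self._tasks.append(fut)

    def get_tasks(self):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._tasks)

    def for_loop(self, loop):
        # Keep only futures bound to the given loop.
        return [t for t in self._tasks if t.get_loop() is loop]

    def clear_tasks(self):
        # Cancel anything still pending before dropping the references.
        for task in self._tasks:
            if not task.done():
                task.cancel()
        self._tasks.clear()


async def main():
    loop = asyncio.get_running_loop()
    other_loop = asyncio.new_event_loop()
    tracker = LoopLocalTasks()

    tracker.append(asyncio.create_task(asyncio.sleep(0, result="a")))
    tracker.append(asyncio.create_task(asyncio.sleep(0, result="b")))
    foreign = other_loop.create_future()  # bound to a different loop
    tracker.append(foreign)

    # Gather only what belongs to the running loop.
    results = await asyncio.gather(*tracker.for_loop(loop))

    copy = tracker.get_tasks()
    copy.clear()  # mutating the copy leaves the tracker untouched
    tracked_before_clear = len(tracker.get_tasks())

    tracker.clear_tasks()
    other_loop.close()
    return results, tracked_before_clear, foreign.cancelled(), len(tracker.get_tasks())


outcome = asyncio.run(main())
```

A container that outlives a single loop can accumulate futures like `foreign` here, which is the kind of cross-loop mix-up the commit message says the global cache risked.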

trevor-cai marked this pull request as draft September 18, 2025 23:35

BloggerBust force-pushed the fix/async-gather-reuse-hang branch 2 times, most recently from 040cce4 to 341c268 Compare September 19, 2025 23:19

trevor-cai marked this pull request as ready for review September 20, 2025 03:08

BloggerBust added 12 commits September 21, 2025 11:08

test: temporarily commented code to verify CI test behavior

f408e69

test(CI): put back finally cleanup block

90acbbe

ci(test_core): ignore test_evaluation/test_end_to_end/test_configs.py

ba46441

This test depends on secrets. It is still run by the maintainer-only workflow.

ci(test_core): ignore test_synthesizer/test_context_generator.py

3259847

Add test_synthesizer/test_context_generator.py to the ignore list when running dev tests without secrets. This test suite requires secrets.

ci(test_core): ignore test_synthesizer/test_conversation_simulator.py

ecce96e

Add test_synthesizer/test_conversation_simulator.py to ignore list when running dev tests without secrets

fix(async,ci): ungate agentic iterator cleanup and skip secret-depend…

120eec5

…ent tests
- Ungate stray task cancel and drain cleanup in agentic iterator.
- Keep detailed logs behind DEEPEVAL_DEBUG_ASYNC.
- Add ignores for tests requiring OpenAI secrets.

fix(async,ci): skip secret-dependent tests

3e6e0fb

- Add ignores for tests requiring OpenAI secrets.

fix(async,ci): skip secret-dependent tests

96e1e10

Add ignores for tests requiring OpenAI secrets.

BloggerBust force-pushed the fix/async-gather-reuse-hang branch from e3db170 to 96e1e10 Compare September 21, 2025 17:08

penguine-ip merged commit 8152c6e into confident-ai:main Sep 22, 2025
2 of 4 checks passed

trevor-cai mentioned this pull request Oct 2, 2025

timeout mechanism #984

Open

penguine-ip merged 12 commits into confident-ai:main from BloggerBust:fix/async-gather-reuse-hang
