fix(async,ci): prevent coroutine re-await & stabilize async evals #2068

Merged
penguine-ip merged 12 commits into confident-ai:main from
BloggerBust:fix/async-gather-reuse-hang
Sep 22, 2025

@BloggerBust
Contributor

  • Track and gather only tasks created on the current loop during evals to avoid “cannot reuse already awaited coroutine” and cross-loop errors.
  • Introduce coerce_to_task() to normalize Task/Future/coroutine/awaitable inputs.
  • Refactor EvaluationTasks to store asyncio Futures, return copies from get_tasks(), and provide a clear_tasks() that cancels pending tasks before clearing.
  • Call loop.shutdown_asyncgens() in the tracing worker before closing the loop.
  • Replace blocking time.sleep() with await asyncio.sleep() in the async test helper.
  • CI: add a maintainer-only full test workflow with secrets gating & concurrency.
  • Ruff cleanups.
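
For readers unfamiliar with the failure mode, here is a minimal sketch of what normalizing awaitables into tasks can look like. The real signature and behavior of coerce_to_task() are not shown in this conversation, so the function below is an illustrative assumption (including the optional `loop` parameter and the cross-loop check), not deepeval's implementation.

```python
import asyncio
import inspect


def coerce_to_task(obj, loop=None):
    """Hypothetical sketch: return a Task/Future for any awaitable input."""
    if loop is None:
        loop = asyncio.get_running_loop()
    if asyncio.isfuture(obj):
        # Tasks are Futures too. Reject ones bound to a different loop.
        if obj.get_loop() is not loop:
            raise RuntimeError("future belongs to a different event loop")
        return obj
    if asyncio.iscoroutine(obj):
        return loop.create_task(obj)
    if inspect.isawaitable(obj):
        # Generic awaitable (an object with __await__): wrap it in a coroutine.
        async def _wrap():
            return await obj

        return loop.create_task(_wrap())
    raise TypeError(f"expected an awaitable, got {type(obj).__name__}")


async def answer():
    await asyncio.sleep(0)
    return 42


async def reawait_bare_coroutine():
    coro = answer()
    await coro
    try:
        await coro  # a coroutine object can only be awaited once
    except RuntimeError as exc:
        return str(exc)


async def await_task_twice():
    task = coerce_to_task(answer())
    first = await task
    second = await task  # a finished Task simply returns its result again
    return first, second, coerce_to_task(task) is task


error_message = asyncio.run(reawait_bare_coroutine())
task_result = asyncio.run(await_task_twice())
print(error_message)
print(task_result)
```

Wrapping each coroutine exactly once and passing the resulting task around is what makes repeated awaits safe; the bare coroutine raises on the second await.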

@vercel

vercel Bot commented Sep 18, 2025

@BloggerBust is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

@trevor-cai trevor-cai marked this pull request as draft September 18, 2025 23:35
@BloggerBust BloggerBust force-pushed the fix/async-gather-reuse-hang branch 2 times, most recently from 040cce4 to 341c268 Compare September 19, 2025 23:19
@trevor-cai trevor-cai marked this pull request as ready for review September 20, 2025 03:08
- Track and gather only tasks created on the current loop during evals to avoid
  “cannot reuse already awaited coroutine” and cross-loop errors.
- Introduce coerce_to_task() to normalize Task/Future/coroutine/awaitable inputs.
- Refactor EvaluationTasks to store asyncio Futures, return copies from get_tasks(),
  and provide a clear_tasks() that cancels pending tasks before clearing.
- Call loop.shutdown_asyncgens() in the tracing worker before closing the loop.
- Replace blocking time.sleep() with await asyncio.sleep() in the async test helper.
- CI: add a maintainer-only full test workflow with secrets gating & concurrency.
- Ruff cleanups.
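
The container behavior described in this commit (copies from get_tasks(), cancel-then-clear in clear_tasks()) can be sketched roughly as follows. `TaskRegistry` is a hypothetical stand-in written for illustration; it is not the actual EvaluationTasks class, whose other methods and fields are not shown here.

```python
import asyncio


class TaskRegistry:
    """Hypothetical stand-in for EvaluationTasks (illustrative only)."""

    def __init__(self):
        self._tasks = []

    def append(self, task):
        self._tasks.append(task)

    def get_tasks(self):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._tasks)

    def clear_tasks(self):
        # Cancel anything still pending before dropping the references.
        for task in self._tasks:
            if not task.done():
                task.cancel()
        self._tasks.clear()


async def demo():
    registry = TaskRegistry()
    pending = asyncio.ensure_future(asyncio.sleep(60))
    finished = asyncio.ensure_future(asyncio.sleep(0, result="ok"))
    await finished
    registry.append(pending)
    registry.append(finished)

    snapshot = registry.get_tasks()
    snapshot.clear()  # only the copy is emptied
    size_after_copy_mutation = len(registry.get_tasks())

    registry.clear_tasks()
    # Let the loop deliver the cancellation to the pending task.
    await asyncio.gather(pending, return_exceptions=True)
    return (
        size_after_copy_mutation,
        pending.cancelled(),
        finished.result(),
        len(registry.get_tasks()),
    )


registry_result = asyncio.run(demo())
print(registry_result)
```

Cancelling before clearing matters because dropping the last reference to a pending task does not stop it; the loop keeps it alive and it can surface later as a stray task.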
- Implemented logging to track per-task and gather timeouts separately.
- Added task metadata collection to help debug stalled tasks.
- Enhanced logging with suggestions for increasing timeout values.
- Introduced `DEEPEVAL_PER_TASK_TIMEOUT_SECONDS` and `DEEPEVAL_TASK_GATHER_BUFFER_SECONDS` to configure the per-task and gather timeouts.
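
A rough sketch of how two separate timeouts might be layered is shown below. Only the two environment variable names come from this commit; the default values, the rule that the gather timeout equals the per-task timeout plus the buffer, and the helper names `read_timeouts` and `run_with_timeouts` are assumptions made for illustration.

```python
import asyncio
import os


def read_timeouts(env=os.environ):
    # Defaults are illustrative placeholders, not deepeval's actual defaults.
    per_task = float(env.get("DEEPEVAL_PER_TASK_TIMEOUT_SECONDS", "5"))
    buffer = float(env.get("DEEPEVAL_TASK_GATHER_BUFFER_SECONDS", "1"))
    # Assumption: the gather deadline is the per-task deadline plus a buffer.
    return per_task, per_task + buffer


async def run_with_timeouts(coros, per_task, gather_timeout):
    async def bounded(index, coro):
        try:
            return await asyncio.wait_for(coro, timeout=per_task)
        except asyncio.TimeoutError:
            # Reported per task so a single stall is easy to spot in logs.
            return f"task {index} timed out after {per_task}s"

    tasks = [asyncio.ensure_future(bounded(i, c)) for i, c in enumerate(coros)]
    try:
        return await asyncio.wait_for(asyncio.gather(*tasks), timeout=gather_timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the gather, which in turn cancels its children.
        return "gather timed out"


configured = read_timeouts(
    {
        "DEEPEVAL_PER_TASK_TIMEOUT_SECONDS": "2",
        "DEEPEVAL_TASK_GATHER_BUFFER_SECONDS": "0.5",
    }
)


async def demo():
    coros = [asyncio.sleep(0, result="fast"), asyncio.sleep(10, result="slow")]
    return await run_with_timeouts(coros, per_task=0.05, gather_timeout=0.5)


timeout_result = asyncio.run(demo())
print(configured)
print(timeout_result)
```

Keeping the two deadlines distinct lets a log message say whether one task stalled or the whole batch overran, which is the distinction this commit's logging draws.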
Add _snapshot_tasks() and use it to:
- Take a baseline snapshot of tasks on the current loop before yielding goldens.
- After gather, detect leftover tasks not in the baseline or created_tasks,
  then log, cancel, and drain them (when DEEPEVAL_DEBUG_ASYNC is enabled).

Also remove the redundant `asyncio.create_task = loop.create_task` assignment
before trace evaluation as we already restore the original create_task earlier.

This reduces hangs caused by tasks leaking across iterations/loops and ensures
only tasks created in the current run are awaited.
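
The baseline-and-diff approach described above can be sketched like this. The body of _snapshot_tasks() is not shown in this conversation, so `snapshot_tasks` below is a hypothetical helper built on asyncio.all_tasks(), and the stray task is created deliberately to simulate a leak.

```python
import asyncio


def snapshot_tasks():
    """Hypothetical helper: unfinished tasks on the running loop, minus the caller."""
    current = asyncio.current_task()
    return {task for task in asyncio.all_tasks() if task is not current}


async def run_iteration():
    baseline = snapshot_tasks()

    created = [asyncio.ensure_future(asyncio.sleep(0, result=i)) for i in range(3)]
    stray = asyncio.ensure_future(asyncio.sleep(60))  # simulated leak, never tracked

    results = await asyncio.gather(*created)

    # Anything alive now that was neither in the baseline nor tracked is a leftover.
    leftovers = snapshot_tasks() - baseline - set(created)
    for task in leftovers:
        task.cancel()
    # Drain so the cancellations are actually processed before returning.
    await asyncio.gather(*leftovers, return_exceptions=True)
    return results, len(leftovers), stray.cancelled()


snapshot_result = asyncio.run(run_iteration())
print(snapshot_result)
```

Subtracting the baseline avoids cancelling tasks that existed before the iteration started, such as background workers owned by other code on the same loop.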
We now track and gather tasks per loop. The global cache is no longer
read and is no longer needed. It posed cross-loop task management
risks, which is why we have moved away from it.

- Delete `global_evaluation_tasks` singleton and related imports.
- Stop appending to the global cache in `EvaluationDataset.evaluate()`;
  still coerce the task to ensure validation/side effects run.
- Remove finalizers that cleared the global cache in
  `execute.py` looped paths.
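
To illustrate the cross-loop risk, the sketch below keeps a module-level list across two asyncio.run() calls. `GLOBAL_CACHE` is a hypothetical name standing in for the kind of singleton this commit deletes; it is not the removed `global_evaluation_tasks` object itself.

```python
import asyncio

GLOBAL_CACHE = []  # illustrative stand-in for a process-wide task cache


async def first_run():
    task = asyncio.ensure_future(asyncio.sleep(0, result="a"))
    GLOBAL_CACHE.append(task)  # the task outlives the loop that created it
    return await task


async def second_run():
    loop = asyncio.get_running_loop()
    # Per-loop tracking: only tasks created here are gathered.
    own = [asyncio.ensure_future(asyncio.sleep(0, result="b"))]
    # Cached tasks from the earlier run are bound to a different, closed loop.
    foreign = [task for task in GLOBAL_CACHE if task.get_loop() is not loop]
    results = await asyncio.gather(*own)
    return results, len(foreign)


first_result = asyncio.run(first_run())
second_result = asyncio.run(second_run())
print(first_result)
print(second_result)
```

Each asyncio.run() call creates a fresh event loop, so any task left in a process-wide cache belongs to a loop that no longer runs; tracking tasks locally avoids ever handing such a task to gather().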
This test depends on secrets. It is still run by the maintainer-only
workflow.
Add test_synthesizer/test_context_generator.py to the ignore list when
running dev tests without secrets. This test suite requires secrets.
Add test_synthesizer/test_conversation_simulator.py to the ignore list
when running dev tests without secrets.
…ent tests

- Ungate stray task cancel and drain cleanup in agentic iterator.
- Keep detailed logs behind DEEPEVAL_DEBUG_ASYNC.
- Add ignores for tests requiring OpenAI secrets.
@BloggerBust BloggerBust force-pushed the fix/async-gather-reuse-hang branch from e3db170 to 96e1e10 Compare September 21, 2025 17:08
@penguine-ip penguine-ip merged commit 8152c6e into confident-ai:main Sep 22, 2025
2 of 4 checks passed
@trevor-cai trevor-cai mentioned this pull request Oct 2, 2025