fix(async,ci): prevent coroutine re-await & stabilize async evals #2068
Merged
penguine-ip merged 12 commits into confident-ai:main on Sep 22, 2025
BloggerBust (Contributor) commented on Sep 18, 2025:
- Track and gather only tasks created on the current loop during evals, to avoid “cannot reuse already awaited coroutine” and cross-loop errors.
- Introduce `coerce_to_task()` to normalize Task/Future/coroutine/awaitable inputs.
- Refactor `EvaluationTasks` to store asyncio Futures, return copies from `get_tasks()`, and provide a `clear_tasks()` that cancels pending tasks before clearing.
- Call `loop.shutdown_asyncgens()` in the tracing worker before closing the loop.
- Replace blocking `time.sleep()` with `await asyncio.sleep()` in the async test helper.
- CI: add a maintainer-only full test workflow with secrets gating and concurrency.
- Ruff cleanups.
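A minimal sketch of how `coerce_to_task()` and `EvaluationTasks` could fit together, assuming only the behaviour listed above. The `append()` method, the error messages, and the choice to reject a Future bound to another loop are hypothetical illustrations, not deepeval's actual code.

```python
import asyncio
import inspect
from typing import Any, Awaitable, List, Union


def coerce_to_task(obj: Union[asyncio.Future, Awaitable[Any]]) -> asyncio.Future:
    """Hypothetical sketch: normalize a Task/Future/coroutine/awaitable into a
    Future on the running loop."""
    loop = asyncio.get_running_loop()
    if isinstance(obj, asyncio.Future):  # asyncio.Task is a Future subclass
        # Assumption: a Future bound to another loop is rejected outright.
        if obj.get_loop() is not loop:
            raise RuntimeError("future belongs to a different event loop")
        return obj
    if asyncio.iscoroutine(obj):
        # Wrap the coroutine exactly once; callers then share the Task and
        # never await the raw coroutine a second time.
        return loop.create_task(obj)
    if inspect.isawaitable(obj):
        # Other awaitables (objects defining __await__) become Tasks too.
        return asyncio.ensure_future(obj)
    raise TypeError(f"expected an awaitable, got {type(obj).__name__}")


class EvaluationTasks:
    """Hypothetical sketch of a per-run store of Futures."""

    def __init__(self) -> None:
        self._tasks: List[asyncio.Future] = []

    def append(self, obj: Union[asyncio.Future, Awaitable[Any]]) -> asyncio.Future:
        # append() is an illustrative name; inputs are normalized first.
        fut = coerce_to_task(obj)
        self._tasks.append(fut)
        return fut

    def get_tasks(self) -> List[asyncio.Future]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._tasks)

    def clear_tasks(self) -> None:
        # Cancel anything still pending, then drop all references.
        for fut in self._tasks:
            if not fut.done():
                fut.cancel()
        self._tasks.clear()


async def main() -> List[int]:
    tasks = EvaluationTasks()

    async def work(x: int) -> int:
        await asyncio.sleep(0)
        return x * 2

    tasks.append(work(1))  # bare coroutine
    tasks.append(asyncio.ensure_future(work(2)))  # already a Task
    results = await asyncio.gather(*tasks.get_tasks())
    tasks.clear_tasks()
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))  # [2, 4]
```

Returning a copy from `get_tasks()` means a caller that clears or reorders its list cannot corrupt the store, and `clear_tasks()` never leaves an orphaned pending task behind.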
Branch force-pushed from 040cce4 to 341c268.
- Implemented logging to track per-task and gather timeouts separately.
- Added task metadata collection to help debug stalled tasks.
- Enhanced logging with suggestions for increasing timeout values.
- Introduced `DEEPEVAL_PER_TASK_TIMEOUT_SECONDS` and `DEEPEVAL_TASK_GATHER_BUFFER_SECONDS` to configure the per-task and gather timeouts.
Add `_snapshot_tasks()` and use it to:
- Take a baseline snapshot of tasks on the current loop before yielding goldens.
- After gather, detect leftover tasks that are in neither the baseline nor `created_tasks`, then log, cancel, and drain them (when `DEEPEVAL_DEBUG_ASYNC` is enabled).

Also remove the redundant `asyncio.create_task = loop.create_task` assignment before trace evaluation, since the original `create_task` is already restored earlier. This reduces hangs caused by tasks leaking across iterations/loops and ensures that only tasks created in the current run are awaited.
We now track and gather tasks per loop. The global cache is no longer read and is not needed; it posed cross-loop task management risks, which is why we moved away from it.
- Delete the `global_evaluation_tasks` singleton and related imports.
- Stop appending to the global cache in `EvaluationDataset.evaluate()`; still coerce the task so that validation/side effects run.
- Remove the finalizers that cleared the global cache in the looped paths of `execute.py`.
This test depends on secrets. It is still run using the maintainer-only workflow.
Add `test_synthesizer/test_context_generator.py` to the ignore list when running dev tests without secrets. This test suite requires secrets.
Add `test_synthesizer/test_conversation_simulator.py` to the ignore list when running dev tests without secrets.
…ent tests
- Ungate the stray-task cancel and drain cleanup in the agentic iterator.
- Keep detailed logs behind `DEEPEVAL_DEBUG_ASYNC`.
- Add ignores for tests requiring OpenAI secrets.
Branch force-pushed from e3db170 to 96e1e10.