test(triage): inject permit acquisition so evaluator tests stop hanging#1524
senamakel wants to merge 4 commits into
…ng on shared gate state

`run_triage_with_arms` called `scheduler_gate::wait_for_capacity()` directly on the local-fallback path, coupling the triage tests to the process-wide 1-slot `LLM_PERMITS` semaphore and to whatever `Policy` some earlier test happened to leave in the gate's `STATE` `OnceLock`. When a sibling test's runtime had already been dropped, a stale `Paused` policy could trap subsequent `wait_for_capacity` callers in the poll loop forever (no sampler to flip it back), and every triage test queued on `BUS_HANDLER_LOCK` showed up as "stuck > 60s".

Factor the permit acquisition into a closure parameter on a private `run_triage_with_arms_inner`. Production paths keep the real gate; a new `#[cfg(test)] run_triage_with_arms_for_test` passes a no-op acquirer so the tests no longer touch the shared semaphore or policy.
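The injection described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `evaluator.rs`: `Outcome`, `run`, `run_inner`, `run_for_test`, the stub `wait_for_capacity`, and the tiny `block_on` executor are all hypothetical stand-ins, and the real functions take the triage arms and return richer types.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the real `LlmPermit` guard.
#[derive(Debug)]
pub struct LlmPermit;

/// Hypothetical outcome type, used only to make the sketch observable.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Cloud,
    Local { held_permit: bool },
}

/// Inner driver: the permit acquirer is a parameter, so each caller
/// decides whether the shared gate is touched at all.
async fn run_inner<F, Fut>(cloud_ok: bool, acquire_permit: F) -> Outcome
where
    F: FnOnce() -> Fut,
    Fut: Future<Output = Option<LlmPermit>>,
{
    if cloud_ok {
        return Outcome::Cloud; // the acquirer is never called on this path
    }
    let permit = acquire_permit().await; // only the local fallback needs it
    Outcome::Local { held_permit: permit.is_some() }
}

/// Stub for the production gate (assumed here to always grant a permit).
async fn wait_for_capacity() -> Option<LlmPermit> {
    Some(LlmPermit)
}

/// Production-style entry point: keeps the real gate.
pub async fn run(cloud_ok: bool) -> Outcome {
    run_inner(cloud_ok, || wait_for_capacity()).await
}

/// Test-style entry point: a no-op acquirer, so no shared state is involved.
pub async fn run_for_test(cloud_ok: bool) -> Outcome {
    run_inner(cloud_ok, || async { None }).await
}

/// Waker that does nothing; enough for futures that never return `Pending`.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling executor so the sketch runs without an async runtime.
pub fn block_on<T>(fut: impl Future<Output = T>) -> T {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    assert_eq!(block_on(run(false)), Outcome::Local { held_permit: true });
    assert_eq!(block_on(run_for_test(false)), Outcome::Local { held_permit: false });
    assert_eq!(block_on(run_for_test(true)), Outcome::Cloud);
}
```

Because the acquirer is an `FnOnce` returning a future, the production wrapper and the test wrapper share one driver body and differ only in the closure they pass.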
Walkthrough
Refactors the triage evaluator to centralize cloud→retry→local in an internal driver accepting an injected async permit-acquisition callback. Adds a `cfg(test)` entry point …
Changes
- Permit Injection and Test Harness
- CI Test Serialization
Estimated code review effort: 3 (Moderate), ~20 minutes
Pre-merge checks: 5 passed
Root cause of the long-standing intermittent hang on the `Rust Core
Tests + Quality` and `Rust Core Coverage` jobs: `cargo test -p openhuman`
runs `#[tokio::test]` tests in parallel, each on its own current-thread
runtime, but several modules share process-wide singletons:
* `scheduler_gate::LLM_PERMITS` — `tokio::sync::Semaphore`
* `scheduler_gate::STATE` — `OnceLock<RwLock<State>>`
* `local_ai::LOCAL_AI_TEST_MUTEX` — `std::sync::Mutex`
* `event_bus::testing::BUS_HANDLER_LOCK` — `tokio::sync::Mutex`
The `tokio::sync::Semaphore` waiter in test A registers a Waker tied to
runtime A. When test B drops the held permit from runtime B and that
runtime is in the process of tearing down, the cross-runtime wake does
not fire reliably — A's spawned future never resumes. On the CI runner
this presents as a hung `cargo test` step; locally on faster hosts the
prior assertion (`try_acquire_llm_permit().expect("must start free")`)
panics first and only manifests as a `FAILED` test.
Running `cargo test -p openhuman -- --test-threads=1` is a one-line
mitigation that eliminates the parallelism and brings the step from
"hangs indefinitely" to a deterministic ~90s on the same suite. The
deeper test-isolation refactor (swap the OnceLocks for resettable
cells, shared serial mutex around every `wait_for_capacity` callsite)
is queued as a follow-up — this commit unblocks CI immediately.
Applied to both `.github/workflows/test.yml` (`Test core crate`) and
`.github/workflows/coverage.yml` (`Rust Core Coverage`).
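To make the shared-singleton hazard concrete, here is a small sketch under stated assumptions. It does not reproduce the cross-runtime waker problem; it only illustrates the narrower point that a process-wide static outlives whichever test last wrote to it. Plain threads stand in for per-test runtimes, and `Policy`, `set_policy`, `current_policy`, and `wait_for_capacity_bounded` are hypothetical names, not the `scheduler_gate` API.

```rust
use std::sync::{OnceLock, RwLock};
use std::thread;

/// Hypothetical policy enum; the real gate's `Policy` may look different.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Policy {
    Normal,
    Paused,
}

/// Process-wide state: initialised once, never dropped, visible to every
/// test that runs in the same binary.
static STATE: OnceLock<RwLock<Policy>> = OnceLock::new();

fn state() -> &'static RwLock<Policy> {
    STATE.get_or_init(|| RwLock::new(Policy::Normal))
}

pub fn set_policy(policy: Policy) {
    *state().write().unwrap() = policy;
}

pub fn current_policy() -> Policy {
    *state().read().unwrap()
}

/// Bounded stand-in for a poll loop. Returns true if a non-paused policy
/// was observed within `max_polls` checks. An unbounded loop would never
/// return while the policy stays `Paused`.
pub fn wait_for_capacity_bounded(max_polls: usize) -> bool {
    (0..max_polls).any(|_| current_policy() != Policy::Paused)
}

fn main() {
    // "Test A" pauses the gate and finishes; nothing resets the static.
    thread::spawn(|| set_policy(Policy::Paused)).join().unwrap();

    // "Test B" runs later on another thread and still sees `Paused`.
    let got_capacity = thread::spawn(|| wait_for_capacity_bounded(1_000))
        .join()
        .unwrap();
    assert!(!got_capacity);

    // Only an explicit reset lets later callers through.
    set_policy(Policy::Normal);
    assert!(wait_for_capacity_bounded(1));
}
```

Running the tests serially does not remove this kind of leftover state; it only removes the concurrent contention, which is why the commit message treats the isolation refactor as separate follow-up work.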
…arallel

The previous commit serialized the entire `cargo test -p openhuman` invocation with `--test-threads=1`, which fixed the lib-test hang but made the whole job ~6x slower because the 13 integration test binaries under `tests/*.rs` were also forced to run serially.

The cross-runtime-semaphore-waker hang is a lib-test phenomenon: integration tests each compile to their own binary with their own process-level statics, so they don't share the problematic singletons. Run the lib serially and let the integration tests keep their default parallel scheduling.

For coverage, `cargo llvm-cov --no-report` accumulates profraw across multiple invocations; a final `cargo llvm-cov report --lcov` emits the merged lcov so the diff-coverage gate sees both lib + integration coverage as before.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/test.yml:
- Around line 93-97: The CI workflow currently runs "cargo test --lib -p
openhuman -- --test-threads=1" and "cargo test --tests -p openhuman" which omit
doctests; update the workflow to exercise doctests by adding the --doc target
(either add a dedicated step running "cargo test -p openhuman --doc" or change
the integration step to "cargo test --tests --doc -p openhuman") so the crate's
doctests are executed in CI.
Files selected for processing (2)
- .github/workflows/coverage.yml
- .github/workflows/test.yml
  run: cargo test --lib -p openhuman -- --test-threads=1
- name: Test core crate — integration tests (parallel)
  # Each `tests/*.rs` integration test is its own binary so doesn't
  # share static state with siblings — safe to keep default parallel.
  run: cargo test --tests -p openhuman
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Rust doctest-like code fences:"
rg -n --type rust '^\s*///\s*```|^\s*//! ?```'
echo
echo "Workflow doc-test invocations:"
rg -n -- '--doc|--doctest|--doctests' .github/workflows/test.yml .github/workflows/coverage.yml
Repository: tinyhumansai/openhuman
Length of output: 1973
🏁 Script executed:
# Try alternative approach with web search or direct verification
rg -n --type rust '^\s*///\s*```|^\s*//! ?```' 2>/dev/null || echo "Could not search codebase"
Repository: tinyhumansai/openhuman
Length of output: 4772
Add doctest coverage to CI workflow.
Doctests are present throughout the openhuman crate but are not exercised by the current CI workflow. Lines 93 and 97 run only --lib and --tests, respectively, which excludes the --doc test variant.
Suggested fix
  - name: Test core crate — integration tests (parallel)
    # Each `tests/*.rs` integration test is its own binary so doesn't
    # share static state with siblings — safe to keep default parallel.
    run: cargo test --tests -p openhuman
+ - name: Test core crate — doctests
+   run: cargo test --doc -p openhuman
Summary
- … (`cloud_5xx_falls_through_to_local_fallback`, `cloud_then_local_failure_returns_deferred`, `double_429_falls_through_to_local_fallback`, `fatal_cloud_error_short_circuits_without_local_attempt`).
- `run_triage_with_arms` calls `scheduler_gate::wait_for_capacity()` directly on the local-fallback path, coupling the tests to the process-wide 1-slot `LLM_PERMITS` semaphore and to any `Policy` left in the gate's global `STATE` `OnceLock` by another test. A stale `Policy::Paused` traps callers in the poll loop with no sampler to flip it back; siblings queued on `BUS_HANDLER_LOCK` then all appear stuck.
- Introduces a private `run_triage_with_arms_inner` that takes the permit acquirer as a closure. Production paths keep `wait_for_capacity`; a new `#[cfg(test)] run_triage_with_arms_for_test` passes a no-op so the tests no longer touch the shared semaphore or policy.

Problem
The four `openhuman::agent::triage::evaluator::tests::*` integration tests reach the local-fallback arm, which calls `scheduler_gate::wait_for_capacity()`. That uses a single-slot semaphore shared across the whole test binary, and reads a `Policy` out of a `OnceLock` that survives the `#[tokio::test]` runtime which initialised it. When a paused policy ends up persisted (sampler task dies with its runtime), every later caller spins forever in `tokio::time::sleep(paused_poll_ms)`. The siblings queue on `BUS_HANDLER_LOCK` and the whole test group shows up as stuck.

Solution
- `evaluator.rs`: factor the body of `run_triage_with_arms` into `run_triage_with_arms_inner(..., acquire_permit: F)` where `F: FnOnce() -> Future<Output = Option<LlmPermit>>`.
- Production paths (`run_triage`, `run_triage_with_arms`) pass `|| scheduler_gate::wait_for_capacity()`; behaviour unchanged.
- `#[cfg(test)] pub async fn run_triage_with_arms_for_test` passes `|| async { None }`. The seven evaluator integration tests now call that.
- The tests no longer touch the `scheduler_gate` globals, so they can't be wedged by a stale `Paused` policy or by a sibling holding the permit.

Verified locally:
`cargo test --lib openhuman::agent::triage::evaluator::tests` → 14 passed in 1.53s (previously: four tests stuck >60s intermittently).

Submission Checklist
- … `diff-cover`) meet the gate enforced by `.github/workflows/coverage.yml`.
- … `docs/TEST-COVERAGE-MATRIX.md` reflect this change.
- … `## Related`.
- … `docs/RELEASE-MANUAL-SMOKE.md`).
- … `Closes #NNN` in the `## Related` section.

Impact
- … `scheduler_gate::wait_for_capacity()` exactly as before.
- … `openhuman_core` lib test binary.

Related
- The `scheduler_gate` global `STATE` / `LLM_PERMITS` shared across the test binary is still a foot-gun for other tests that contend on it; worth a test-only `reset_for_tests()` if more flakes surface.

AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
- `pnpm --filter openhuman-app format:check`
- `pnpm typecheck`
- `cargo test --manifest-path Cargo.toml --lib openhuman::agent::triage::evaluator::tests` → 14 passed in 1.53s
- `cargo check --manifest-path Cargo.toml --tests` → clean

Validation Blocked
- `command:` N/A
- `error:` N/A
- `impact:` N/A

Behavior Changes
Parity Contract
- `run_triage` and the public `run_triage_with_arms` both still call `scheduler_gate::wait_for_capacity()` on the local-fallback path with identical semantics.
- `run_triage_with_arms_inner` is the single source of truth; only the permit acquirer is injected.

Duplicate / Superseded PR Handling