Skip to content

test(triage): inject permit acquisition so evaluator tests stop hanging#1524

Open
senamakel wants to merge 4 commits into
tinyhumansai:mainfrom
senamakel:test/triage-evaluator-permit-injection
Open

test(triage): inject permit acquisition so evaluator tests stop hanging#1524
senamakel wants to merge 4 commits into
tinyhumansai:mainfrom
senamakel:test/triage-evaluator-permit-injection

Conversation

@senamakel
Copy link
Copy Markdown
Member

@senamakel senamakel commented May 12, 2026

Summary

  • Triage evaluator tests sometimes stay stuck for >60s (cloud_5xx_falls_through_to_local_fallback, cloud_then_local_failure_returns_deferred, double_429_falls_through_to_local_fallback, fatal_cloud_error_short_circuits_without_local_attempt).
  • Root cause: run_triage_with_arms calls scheduler_gate::wait_for_capacity() directly on the local-fallback path, coupling the tests to the process-wide 1-slot LLM_PERMITS semaphore and to any Policy left in the gate's global STATE OnceLock by another test. A stale Policy::Paused traps callers in the poll loop with no sampler to flip it back; siblings queued on BUS_HANDLER_LOCK then all appear stuck.
  • Fix: extract a private generic run_triage_with_arms_inner that takes the permit acquirer as a closure. Production paths keep wait_for_capacity; a new #[cfg(test)] run_triage_with_arms_for_test passes a no-op so the tests no longer touch the shared semaphore or policy.

Problem

The four openhuman::agent::triage::evaluator::tests::* integration tests reach the local-fallback arm, which calls scheduler_gate::wait_for_capacity(). That uses a single-slot semaphore shared across the whole test binary, and reads a Policy out of a OnceLock that survives the #[tokio::test] runtime which initialised it. When a paused policy ends up persisted (sampler task dies with its runtime), every later caller spins forever in tokio::time::sleep(paused_poll_ms). The siblings queue on BUS_HANDLER_LOCK and the whole test group shows up as stuck.

Solution

  • evaluator.rs: factor the body of run_triage_with_arms into run_triage_with_arms_inner(..., acquire_permit: F) where F: FnOnce() -> Future<Output = Option<LlmPermit>>.
  • Production callers (run_triage, run_triage_with_arms) pass || scheduler_gate::wait_for_capacity() — behaviour unchanged.
  • New #[cfg(test)] pub async fn run_triage_with_arms_for_test passes || async { None }. The seven evaluator integration tests now call that.
  • Tests no longer depend on scheduler_gate globals, so they can't be wedged by a stale Paused policy or by a sibling holding the permit.

Verified locally: cargo test --lib openhuman::agent::triage::evaluator::tests → 14 passed in 1.53s (previously: four tests stuck >60s intermittently).

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • N/A: behaviour-only change in test wiring; production path is unchanged, no new/removed source lines that would shift the diff-coverage gate. Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml.
  • N/A: no user-visible feature change. Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md reflect this change.
  • N/A: no new feature IDs. All affected feature IDs from the matrix are listed in the PR description under ## Related.
  • N/A: tests only refactor an existing internal call site; no new external dependencies. No new external network dependencies introduced (mock backend used per Testing Strategy).
  • N/A: does not touch release-cut surfaces. Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md).
  • N/A: no linked issue. Linked issue closed via Closes #NNN in the ## Related section.

Impact

  • Runtime/platform: none — production paths still call scheduler_gate::wait_for_capacity() exactly as before.
  • Test reliability: removes a cross-test deadlock surface in the openhuman_core lib test binary.
  • No performance, security, or migration implications.

Related

  • Closes: N/A
  • Follow-up PR(s)/TODOs: longer-term, the scheduler_gate global STATE / LLM_PERMITS shared across the test binary is still a foot-gun for other tests that contend on it; worth a test-only reset_for_tests() if more flakes surface.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: test/triage-evaluator-permit-injection
  • Commit SHA: 4d55e08

Validation Run

  • N/A: no app/ changes. pnpm --filter openhuman-app format:check
  • N/A: no app/ changes. pnpm typecheck
  • Focused tests: cargo test --manifest-path Cargo.toml --lib openhuman::agent::triage::evaluator::tests → 14 passed in 1.53s
  • Rust fmt/check (if changed): cargo check --manifest-path Cargo.toml --tests → clean
  • N/A: no tauri-shell changes. Tauri fmt/check (if changed)

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: none in production; test-only refactor.
  • User-visible effect: none.

Parity Contract

  • Legacy behavior preserved: yes — run_triage and the public run_triage_with_arms both still call scheduler_gate::wait_for_capacity() on the local-fallback path with identical semantics.
  • Guard/fallback/dispatch parity checks: run_triage_with_arms_inner is the single source of truth; only the permit acquirer is injected.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: N/A
  • Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

  • Refactor

    • Centralized the tiered cloud→retry→local fallback flow to simplify logic and reduce duplication.
  • Tests

    • Added a test-only entry point to skip global permit acquisition, preventing test contention and improving reliability.
    • Updated tests to use the new entry, preserving existing assertions.
  • Chores (CI)

    • Split coverage into library vs integration runs and merged reports; serialized core library tests (single test thread) to avoid process-wide deadlocks.

Review Change Stack

…ng on shared gate state

`run_triage_with_arms` called `scheduler_gate::wait_for_capacity()`
directly on the local-fallback path, coupling the triage tests to the
process-wide 1-slot `LLM_PERMITS` semaphore and to whatever `Policy`
some earlier test happened to leave in the gate's `STATE` `OnceLock`.
When a sibling test's runtime had already been dropped, a stale
`Paused` policy could trap subsequent `wait_for_capacity` callers in
the poll loop forever (no sampler to flip it back), and every triage
test queued on `BUS_HANDLER_LOCK` showed up as "stuck > 60s".

Factor the permit acquisition into a closure parameter on a private
`run_triage_with_arms_inner`. Production paths keep the real gate; a
new `#[cfg(test)] run_triage_with_arms_for_test` passes a no-op
acquirer so the tests no longer touch the shared semaphore or policy.
@senamakel senamakel requested a review from a team May 12, 2026 09:59
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

Walkthrough

Refactors the triage evaluator to centralize cloud→retry→local in an internal driver accepting an injected async permit-acquisition callback. Adds a cfg(test) entry point run_triage_with_arms_for_test that injects a no-op permit callback for tests, updates tests to call it, and forces single-threaded core tests in CI workflows.

Changes

Permit Injection and Test Harness

Layer / File(s) Summary
Core permit injection infrastructure
src/openhuman/agent/triage/evaluator.rs
Added imports for LlmPermit and Future. Introduced internal run_triage_with_arms_inner driver and made it generic over an async acquire_permit callback. Local fallback now uses the injected callback instead of calling wait_for_capacity directly.
Production entry point delegation
src/openhuman/agent/triage/evaluator.rs
run_triage delegates to run_triage_with_arms_inner, wiring production scheduler gate capacity acquisition via a closure.
Test-only entry point
src/openhuman/agent/triage/evaluator.rs
Added #[cfg(test)] pub async fn run_triage_with_arms_for_test that injects a callback returning None to bypass the global LLM gate in tests.
Triage test call site updates
src/openhuman/agent/triage/evaluator_tests.rs
Updated integration tests to invoke run_triage_with_arms_for_test in the seven affected test cases (happy path, retries, fallbacks, and error scenarios).

CI Test Serialization

Layer / File(s) Summary
CI workflow single-threading
.github/workflows/coverage.yml, .github/workflows/test.yml
Forced single-threaded test execution for core crate runs by adding --test-threads=1 to library test commands; split integration tests into a separate step and merged coverage via cargo llvm-cov report. Added comments documenting serialization rationale due to process-wide singletons and Tokio runtime isolation issues.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 A callback hops through the permit gate,

Tests skip the queue and do not wait,
Production still asks the scheduler's hand,
CI runs single-threaded across the land,
Hooray — the triage flow is neat and straight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main objective: fixing test hangs in the triage evaluator by injecting permit acquisition through a test-specific function, which is the core change across multiple files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 12, 2026
senamakel added 2 commits May 12, 2026 04:40
Root cause of the long-standing intermittent hang on the `Rust Core
Tests + Quality` and `Rust Core Coverage` jobs: `cargo test -p openhuman`
runs `#[tokio::test]` tests in parallel, each on its own current-thread
runtime, but several modules share process-wide singletons:

  * `scheduler_gate::LLM_PERMITS` — `tokio::sync::Semaphore`
  * `scheduler_gate::STATE`       — `OnceLock<RwLock<State>>`
  * `local_ai::LOCAL_AI_TEST_MUTEX` — `std::sync::Mutex`
  * `event_bus::testing::BUS_HANDLER_LOCK` — `tokio::sync::Mutex`

The `tokio::sync::Semaphore` waiter in test A registers a Waker tied to
runtime A. When test B drops the held permit from runtime B and that
runtime is in the process of tearing down, the cross-runtime wake does
not fire reliably — A's spawned future never resumes. On the CI runner
this presents as a hung `cargo test` step; locally on faster hosts the
prior assertion (`try_acquire_llm_permit().expect("must start free")`)
panics first and only manifests as a `FAILED` test.

Running `cargo test -p openhuman -- --test-threads=1` is a one-line
mitigation that eliminates the parallelism and brings the step from
"hangs indefinitely" to a deterministic ~90s on the same suite. The
deeper test-isolation refactor (swap the OnceLocks for resettable
cells, shared serial mutex around every `wait_for_capacity` callsite)
is queued as a follow-up — this commit unblocks CI immediately.

Applied to both `.github/workflows/test.yml` (`Test core crate`) and
`.github/workflows/coverage.yml` (`Rust Core Coverage`).
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 12, 2026
…arallel

The previous commit serialized the entire `cargo test -p openhuman`
invocation with `--test-threads=1`, which fixed the lib-test hang but
made the whole job ~6x slower because the 13 integration test binaries
under `tests/*.rs` were also forced to run serially.

The cross-runtime-semaphore-waker hang is a lib-test phenomenon —
integration tests each compile to their own binary with their own
process-level statics, so they don't share the problematic singletons.
Run the lib serially and let the integration tests keep their default
parallel scheduling.

For coverage, `cargo llvm-cov --no-report` accumulates profraw across
multiple invocations; a final `cargo llvm-cov report --lcov` emits the
merged lcov so the diff-coverage gate sees both lib + integration
coverage as before.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/test.yml:
- Around line 93-97: The CI workflow currently runs "cargo test --lib -p
openhuman -- --test-threads=1" and "cargo test --tests -p openhuman" which omit
doctests; update the workflow to exercise doctests by adding the --doc target
(either add a dedicated step running "cargo test -p openhuman --doc" or change
the integration step to "cargo test --tests --doc -p openhuman") so the crate's
doctests are executed in CI.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7aaf08d1-038d-42cc-8f7d-bc830c969783

📥 Commits

Reviewing files that changed from the base of the PR and between 1b2814b and 40cb9ca.

📒 Files selected for processing (2)
  • .github/workflows/coverage.yml
  • .github/workflows/test.yml

Comment on lines +93 to +97
run: cargo test --lib -p openhuman -- --test-threads=1
- name: Test core crate — integration tests (parallel)
# Each `tests/*.rs` integration test is its own binary so doesn't
# share static state with siblings — safe to keep default parallel.
run: cargo test --tests -p openhuman
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Rust doctest-like code fences:"
rg -n --type rust '^\s*///\s*```|^\s*//! ?```'

echo
echo "Workflow doc-test invocations:"
rg -n -- '--doc|--doctest|--doctests' .github/workflows/test.yml .github/workflows/coverage.yml

Repository: tinyhumansai/openhuman

Length of output: 1973


🏁 Script executed:

# Try alternative approach with web search or direct verification
rg -n --type rust '^\s*///\s*```|^\s*//! ?```' 2>/dev/null || echo "Could not search codebase"

Repository: tinyhumansai/openhuman

Length of output: 4772


Add doctest coverage to CI workflow.

Doctests are present throughout the openhuman crate but are not exercised by the current CI workflow. Lines 93 and 97 run only --lib and --tests, respectively, which excludes the --doc test variant.

Suggested fix
       - name: Test core crate — integration tests (parallel)
         # Each `tests/*.rs` integration test is its own binary so doesn't
         # share static state with siblings — safe to keep default parallel.
         run: cargo test --tests -p openhuman
+      - name: Test core crate — doctests
+        run: cargo test --doc -p openhuman
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 93 - 97, The CI workflow currently
runs "cargo test --lib -p openhuman -- --test-threads=1" and "cargo test --tests
-p openhuman" which omit doctests; update the workflow to exercise doctests by
adding the --doc target (either add a dedicated step running "cargo test -p
openhuman --doc" or change the integration step to "cargo test --tests --doc -p
openhuman") so the crate's doctests are executed in CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant