test(triage): inject permit acquisition so evaluator tests stop hanging by senamakel · Pull Request #1524 · tinyhumansai/openhuman

senamakel · 2026-05-12T09:59:00Z

Summary

Triage evaluator tests sometimes stay stuck for >60s (cloud_5xx_falls_through_to_local_fallback, cloud_then_local_failure_returns_deferred, double_429_falls_through_to_local_fallback, fatal_cloud_error_short_circuits_without_local_attempt).
Root cause: run_triage_with_arms calls scheduler_gate::wait_for_capacity() directly on the local-fallback path, coupling the tests to the process-wide 1-slot LLM_PERMITS semaphore and to any Policy left in the gate's global STATE OnceLock by another test. A stale Policy::Paused traps callers in the poll loop with no sampler to flip it back; siblings queued on BUS_HANDLER_LOCK then all appear stuck.
Fix: extract a private generic run_triage_with_arms_inner that takes the permit acquirer as a closure. Production paths keep wait_for_capacity; a new #[cfg(test)] run_triage_with_arms_for_test passes a no-op so the tests no longer touch the shared semaphore or policy.

Problem

The four openhuman::agent::triage::evaluator::tests::* integration tests reach the local-fallback arm, which calls scheduler_gate::wait_for_capacity(). That uses a single-slot semaphore shared across the whole test binary, and reads a Policy out of a OnceLock that survives the #[tokio::test] runtime which initialised it. When a paused policy ends up persisted (sampler task dies with its runtime), every later caller spins forever in tokio::time::sleep(paused_poll_ms). The siblings queue on BUS_HANDLER_LOCK and the whole test group shows up as stuck.

Solution

evaluator.rs: factor the body of run_triage_with_arms into run_triage_with_arms_inner(..., acquire_permit: F) where F: FnOnce() -> Future<Output = Option<LlmPermit>>.
Production callers (run_triage, run_triage_with_arms) pass || scheduler_gate::wait_for_capacity() — behaviour unchanged.
New #[cfg(test)] pub async fn run_triage_with_arms_for_test passes || async { None }. The seven evaluator integration tests now call that.
Tests no longer depend on scheduler_gate globals, so they can't be wedged by a stale Paused policy or by a sibling holding the permit.

Verified locally: cargo test --lib openhuman::agent::triage::evaluator::tests → 14 passed in 1.53s (previously: four tests stuck >60s intermittently).

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
N/A: behaviour-only change in test wiring; production path is unchanged, no new/removed source lines that would shift the diff-coverage gate. Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml.
N/A: no user-visible feature change. Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md reflect this change.
N/A: no new feature IDs. All affected feature IDs from the matrix are listed in the PR description under ## Related.
N/A: tests only refactor an existing internal call site; no new external dependencies. No new external network dependencies introduced (mock backend used per Testing Strategy).
N/A: does not touch release-cut surfaces. Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md).
N/A: no linked issue. Linked issue closed via Closes #NNN in the ## Related section.

Impact

Runtime/platform: none — production paths still call scheduler_gate::wait_for_capacity() exactly as before.
Test reliability: removes a cross-test deadlock surface in the openhuman_core lib test binary.
No performance, security, or migration implications.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: test/triage-evaluator-permit-injection
Commit SHA: 4d55e08

Validation Run

N/A: no app/ changes. pnpm --filter openhuman-app format:check
N/A: no app/ changes. pnpm typecheck
Focused tests: cargo test --manifest-path Cargo.toml --lib openhuman::agent::triage::evaluator::tests → 14 passed in 1.53s
Rust fmt/check (if changed): cargo check --manifest-path Cargo.toml --tests → clean
N/A: no tauri-shell changes. Tauri fmt/check (if changed)

Validation Blocked

command: N/A
error: N/A
impact: N/A

Behavior Changes

Intended behavior change: none in production; test-only refactor.
User-visible effect: none.

Parity Contract

Legacy behavior preserved: yes — run_triage and the public run_triage_with_arms both still call scheduler_gate::wait_for_capacity() on the local-fallback path with identical semantics.
Guard/fallback/dispatch parity checks: run_triage_with_arms_inner is the single source of truth; only the permit acquirer is injected.

Duplicate / Superseded PR Handling

Duplicate PR(s): N/A
Canonical PR: N/A
Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

Refactor
- Centralized the tiered cloud→retry→local fallback flow to simplify logic and reduce duplication.
Tests
- Added a test-only entry point to skip global permit acquisition, preventing test contention and improving reliability.
- Updated tests to use the new entry, preserving existing assertions.
Chores (CI)
- Split coverage into library vs integration runs and merged reports; serialized core library tests (single test thread) to avoid process-wide deadlocks.

…ng on shared gate state `run_triage_with_arms` called `scheduler_gate::wait_for_capacity()` directly on the local-fallback path, coupling the triage tests to the process-wide 1-slot `LLM_PERMITS` semaphore and to whatever `Policy` some earlier test happened to leave in the gate's `STATE` `OnceLock`. When a sibling test's runtime had already been dropped, a stale `Paused` policy could trap subsequent `wait_for_capacity` callers in the poll loop forever (no sampler to flip it back), and every triage test queued on `BUS_HANDLER_LOCK` showed up as "stuck > 60s". Factor the permit acquisition into a closure parameter on a private `run_triage_with_arms_inner`. Production paths keep the real gate; a new `#[cfg(test)] run_triage_with_arms_for_test` passes a no-op acquirer so the tests no longer touch the shared semaphore or policy.

coderabbitai · 2026-05-12T09:59:25Z

📝 Walkthrough

Walkthrough

Refactors the triage evaluator to centralize cloud→retry→local in an internal driver accepting an injected async permit-acquisition callback. Adds a cfg(test) entry point run_triage_with_arms_for_test that injects a no-op permit callback for tests, updates tests to call it, and forces single-threaded core tests in CI workflows.

Changes

Permit Injection and Test Harness

Layer / File(s)	Summary
Core permit injection infrastructure `src/openhuman/agent/triage/evaluator.rs`	Added imports for `LlmPermit` and `Future`. Introduced internal `run_triage_with_arms_inner` driver and made it generic over an async `acquire_permit` callback. Local fallback now uses the injected callback instead of calling `wait_for_capacity` directly.
Production entry point delegation `src/openhuman/agent/triage/evaluator.rs`	`run_triage` delegates to `run_triage_with_arms_inner`, wiring production scheduler gate capacity acquisition via a closure.
Test-only entry point `src/openhuman/agent/triage/evaluator.rs`	Added `#[cfg(test)] pub async fn run_triage_with_arms_for_test` that injects a callback returning `None` to bypass the global LLM gate in tests.
Triage test call site updates `src/openhuman/agent/triage/evaluator_tests.rs`	Updated integration tests to invoke `run_triage_with_arms_for_test` in the seven affected test cases (happy path, retries, fallbacks, and error scenarios).

CI Test Serialization

Layer / File(s)	Summary
CI workflow single-threading `.github/workflows/coverage.yml`, `.github/workflows/test.yml`	Forced single-threaded test execution for core crate runs by adding `--test-threads=1` to library test commands; split integration tests into a separate step and merged coverage via `cargo llvm-cov report`. Added comments documenting serialization rationale due to process-wide singletons and Tokio runtime isolation issues.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

tinyhumansai/openhuman#1367: Both PRs modify src/openhuman/agent/triage/evaluator.rs and the triage arm/permit acquisition flow.
tinyhumansai/openhuman#895: Related changes to CI workflows splitting Rust test jobs and adjusting library test execution.

Poem

🐰 A callback hops through the permit gate,

Tests skip the queue and do not wait,
Production still asks the scheduler's hand,
CI runs single-threaded across the land,
Hooray — the triage flow is neat and straight!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main objective: fixing test hangs in the triage evaluator by injecting permit acquisition through a test-specific function, which is the core change across multiple files.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…revious run

Root cause of the long-standing intermittent hang on the `Rust Core Tests + Quality` and `Rust Core Coverage` jobs: `cargo test -p openhuman` runs `#[tokio::test]` tests in parallel, each on its own current-thread runtime, but several modules share process-wide singletons: * `scheduler_gate::LLM_PERMITS` — `tokio::sync::Semaphore` * `scheduler_gate::STATE` — `OnceLock<RwLock<State>>` * `local_ai::LOCAL_AI_TEST_MUTEX` — `std::sync::Mutex` * `event_bus::testing::BUS_HANDLER_LOCK` — `tokio::sync::Mutex` The `tokio::sync::Semaphore` waiter in test A registers a Waker tied to runtime A. When test B drops the held permit from runtime B and that runtime is in the process of tearing down, the cross-runtime wake does not fire reliably — A's spawned future never resumes. On the CI runner this presents as a hung `cargo test` step; locally on faster hosts the prior assertion (`try_acquire_llm_permit().expect("must start free")`) panics first and only manifests as a `FAILED` test. Running `cargo test -p openhuman -- --test-threads=1` is a one-line mitigation that eliminates the parallelism and brings the step from "hangs indefinitely" to a deterministic ~90s on the same suite. The deeper test-isolation refactor (swap the OnceLocks for resettable cells, shared serial mutex around every `wait_for_capacity` callsite) is queued as a follow-up — this commit unblocks CI immediately. Applied to both `.github/workflows/test.yml` (`Test core crate`) and `.github/workflows/coverage.yml` (`Rust Core Coverage`).

…arallel The previous commit serialized the entire `cargo test -p openhuman` invocation with `--test-threads=1`, which fixed the lib-test hang but made the whole job ~6x slower because the 13 integration test binaries under `tests/*.rs` were also forced to run serially. The cross-runtime-semaphore-waker hang is a lib-test phenomenon — integration tests each compile to their own binary with their own process-level statics, so they don't share the problematic singletons. Run the lib serially and let the integration tests keep their default parallel scheduling. For coverage, `cargo llvm-cov --no-report` accumulates profraw across multiple invocations; a final `cargo llvm-cov report --lcov` emits the merged lcov so the diff-coverage gate sees both lib + integration coverage as before.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/test.yml:
- Around line 93-97: The CI workflow currently runs "cargo test --lib -p
openhuman -- --test-threads=1" and "cargo test --tests -p openhuman" which omit
doctests; update the workflow to exercise doctests by adding the --doc target
(either add a dedicated step running "cargo test -p openhuman --doc" or change
the integration step to "cargo test --tests --doc -p openhuman") so the crate's
doctests are executed in CI.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7aaf08d1-038d-42cc-8f7d-bc830c969783

📥 Commits

Reviewing files that changed from the base of the PR and between 1b2814b and 40cb9ca.

📒 Files selected for processing (2)

.github/workflows/coverage.yml
.github/workflows/test.yml

coderabbitai · 2026-05-12T13:09:08Z

+        run: cargo test --lib -p openhuman -- --test-threads=1
+      - name: Test core crate — integration tests (parallel)
+        # Each `tests/*.rs` integration test is its own binary so doesn't
+        # share static state with siblings — safe to keep default parallel.
+        run: cargo test --tests -p openhuman


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Rust doctest-like code fences:" rg -n --type rust '^\s*///\s*```|^\s*//! ?```' echo echo "Workflow doc-test invocations:" rg -n -- '--doc|--doctest|--doctests' .github/workflows/test.yml .github/workflows/coverage.yml

Repository: tinyhumansai/openhuman

Length of output: 1973

🏁 Script executed:

# Try alternative approach with web search or direct verification rg -n --type rust '^\s*///\s*```|^\s*//! ?```' 2>/dev/null || echo "Could not search codebase"

Repository: tinyhumansai/openhuman

Length of output: 4772

Add doctest coverage to CI workflow.

Doctests are present throughout the openhuman crate but are not exercised by the current CI workflow. Lines 93 and 97 run only --lib and --tests, respectively, which excludes the --doc test variant.

Suggested fix

- name: Test core crate — integration tests (parallel) # Each `tests/*.rs` integration test is its own binary so doesn't # share static state with siblings — safe to keep default parallel. run: cargo test --tests -p openhuman + - name: Test core crate — doctests + run: cargo test --doc -p openhuman

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/test.yml around lines 93 - 97, The CI workflow currently runs "cargo test --lib -p openhuman -- --test-threads=1" and "cargo test --tests -p openhuman" which omit doctests; update the workflow to exercise doctests by adding the --doc target (either add a dedicated step running "cargo test -p openhuman --doc" or change the integration step to "cargo test --tests --doc -p openhuman") so the crate's doctests are executed in CI.

senamakel requested a review from a team May 12, 2026 09:59

coderabbitai Bot previously approved these changes May 12, 2026

View reviewed changes

senamakel added 2 commits May 12, 2026 04:40

ci: re-trigger Rust core jobs after pre-existing gate flake hang on p…

4a16c79

…revious run

senamakel dismissed coderabbitai[bot]’s stale review via 1b2814b May 12, 2026 12:15

coderabbitai Bot previously approved these changes May 12, 2026

View reviewed changes

senamakel dismissed coderabbitai[bot]’s stale review via 40cb9ca May 12, 2026 13:02

coderabbitai Bot requested changes May 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(triage): inject permit acquisition so evaluator tests stop hanging#1524

test(triage): inject permit acquisition so evaluator tests stop hanging#1524
senamakel wants to merge 4 commits into
tinyhumansai:mainfrom
senamakel:test/triage-evaluator-permit-injection

senamakel commented May 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 12, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

senamakel commented May 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

Validation Run

Validation Blocked

Behavior Changes

Parity Contract

Duplicate / Superseded PR Handling

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

senamakel commented May 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 12, 2026 •

edited

Loading