Commit 1b2814b

ci(rust): serialize core lib tests to dodge cross-runtime semaphore hang
Root cause of the long-standing intermittent hang on the `Rust Core Tests + Quality` and `Rust Core Coverage` jobs: `cargo test -p openhuman` runs `#[tokio::test]` tests in parallel, each on its own current-thread runtime, but several modules share process-wide singletons:

* `scheduler_gate::LLM_PERMITS`: `tokio::sync::Semaphore`
* `scheduler_gate::STATE`: `OnceLock<RwLock<State>>`
* `local_ai::LOCAL_AI_TEST_MUTEX`: `std::sync::Mutex`
* `event_bus::testing::BUS_HANDLER_LOCK`: `tokio::sync::Mutex`

The `tokio::sync::Semaphore` waiter in test A registers a Waker tied to runtime A. When test B drops the held permit from runtime B and that runtime is in the process of tearing down, the cross-runtime wake does not fire reliably, so A's spawned future never resumes. On the CI runner this presents as a hung `cargo test` step; locally on faster hosts the prior assertion (`try_acquire_llm_permit().expect("must start free")`) panics first, so the problem only manifests as a `FAILED` test.

Running `cargo test -p openhuman -- --test-threads=1` is a one-line mitigation that eliminates the parallelism and brings the step from "hangs indefinitely" to a deterministic ~90s on the same suite. The deeper test-isolation refactor (swap the OnceLocks for resettable cells, shared serial mutex around every `wait_for_capacity` callsite) is queued as a follow-up; this commit unblocks CI immediately.

Applied to both `.github/workflows/test.yml` (`Test core crate`) and `.github/workflows/coverage.yml` (`Rust Core Coverage`).
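The local `FAILED` symptom described above can be illustrated with a std-only sketch. This is not the project's code: `PERMITS`, `Permit`, `try_acquire_permit`, and `other_test_sees_free_pool` are hypothetical names, and an atomic counter stands in for the real `tokio::sync::Semaphore`. It only shows why a "must start free" assertion is unsafe when the pool is process-wide; it does not reproduce the cross-runtime wake hang.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for a process-wide permit pool such as
// `scheduler_gate::LLM_PERMITS`. One permit, shared by every test in the binary.
static PERMITS: AtomicUsize = AtomicUsize::new(1);

/// RAII guard that returns its permit to the pool when dropped.
pub struct Permit(());

impl Drop for Permit {
    fn drop(&mut self) {
        PERMITS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Non-blocking acquire: returns `None` when the pool is empty.
pub fn try_acquire_permit() -> Option<Permit> {
    PERMITS
        // Decrement only if at least one permit is left.
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
        .ok()
        .map(|_| Permit(()))
}

/// Plays the role of a second test that assumes the pool starts free.
/// Any permit it obtains is released again before returning.
pub fn other_test_sees_free_pool() -> bool {
    try_acquire_permit().is_some()
}
```

While one caller holds the permit, `other_test_sees_free_pool()` returns `false`, which is the situation where an `.expect("must start free")` in a parallel test would panic. With `--test-threads=1` the two callers never overlap.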
1 parent 4a16c79 commit 1b2814b

2 files changed

Lines changed: 12 additions & 2 deletions

.github/workflows/coverage.yml

Lines changed: 4 additions & 1 deletion
@@ -84,7 +84,10 @@ jobs:
       - name: Install cargo-llvm-cov
         uses: taiki-e/install-action@cargo-llvm-cov
       - name: Run cargo llvm-cov for openhuman core
-        run: cargo llvm-cov -p openhuman --lcov --output-path lcov-core.info
+        # See test.yml `Test core crate` for why `--test-threads=1` is
+        # required: process-wide singletons in scheduler_gate / local_ai
+        # do not isolate cleanly across parallel `#[tokio::test]` runtimes.
+        run: cargo llvm-cov -p openhuman --lcov --output-path lcov-core.info -- --test-threads=1
       - name: Upload core lcov
         uses: actions/upload-artifact@v5
         with:

.github/workflows/test.yml

Lines changed: 8 additions & 1 deletion
@@ -83,7 +83,14 @@ jobs:
         uses: mozilla-actions/sccache-action@v0.0.9
 
       - name: Test core crate (openhuman)
-        run: cargo test -p openhuman
+        # Serialize tests: several modules share process-wide singletons
+        # (`scheduler_gate::LLM_PERMITS` semaphore, `scheduler_gate::STATE`,
+        # `LOCAL_AI_TEST_MUTEX`, `BUS_HANDLER_LOCK`) that don't cleanly
+        # isolate when `#[tokio::test]` runtimes run in parallel — on the
+        # CI runner some waiters never get woken when another runtime
+        # drops the permit, wedging the whole binary. Runs in ~90s; the
+        # underlying isolation refactor is tracked separately.
+        run: cargo test -p openhuman -- --test-threads=1
 
   rust-tauri-tests:
     name: Rust Tauri Shell Tests
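
The commit message lists a "shared serial mutex" as part of the follow-up refactor. A minimal std-only sketch of that idea is shown below, assuming a hypothetical `SERIAL` lock and `serial_guard` helper; neither is part of this commit, and the real refactor may well use a `tokio::sync::Mutex` instead. `max_concurrency_under_guard` exists only to show that callers holding the guard never overlap.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};
use std::thread;

// Hypothetical process-wide lock that tests touching shared singletons
// would take before doing anything else.
static SERIAL: Mutex<()> = Mutex::new(());

/// Blocks until no other caller holds the lock. A panicking test poisons a
/// `std::sync::Mutex`; recovering the guard keeps one failure from cascading.
pub fn serial_guard() -> MutexGuard<'static, ()> {
    SERIAL.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Spawns `workers` threads that each take the guard, and reports the highest
/// number of threads observed inside the guarded section at the same time.
pub fn max_concurrency_under_guard(workers: usize) -> usize {
    let active = AtomicUsize::new(0);
    let max_seen = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| {
                let _guard = serial_guard();
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                max_seen.fetch_max(now, Ordering::SeqCst);
                // Give other threads a chance to run while the guard is held.
                thread::yield_now();
                active.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });
    max_seen.load(Ordering::SeqCst)
}
```

A guard like this serializes only the tests that opt in, so unrelated tests could keep running in parallel, whereas `--test-threads=1` serializes the whole binary.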
