
Commit 90ba4e0

BiomeOS Developer and cursoragent committed
S225: PG-62 — health.liveness fast-path + startup reorder
health.liveness returns {"status":"starting"} during initialization and {"status":"alive"} once fully ready (via Arc<AtomicBool> readiness flag). health.readiness likewise transitions "starting" → "ready".

Discovery registration + biomeOS scan moved after listener spawn so the socket accepts connections immediately. Recommended caller timeout: ≥3 seconds (documented in README).

BearDog crypto.sign_contract (PG-60+) tracked in NEXT_STEPS for Phase 60+. Pre-existing clippy lint in auto_config/discoverer.rs fixed (items_after_statements).

+5 tests. 22,843 workspace tests, 0 failures.

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 2dc0faf commit 90ba4e0
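
The readiness flag described in the commit message can be pictured with a small sketch. It assumes a hypothetical `HealthProbe` type (not a name from the repository) that shares an `Arc<AtomicBool>` with the startup code. The real `JsonRpcHandler` may wire the flag differently, and the JSON is assembled by hand here only to keep the example free of dependencies.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the server's health handler.
/// Only the `Arc<AtomicBool>` pattern is taken from the commit message.
struct HealthProbe {
    ready: Arc<AtomicBool>,
    version: Arc<str>,
}

impl HealthProbe {
    fn new(ready: Arc<AtomicBool>, version: &str) -> Self {
        Self { ready, version: Arc::from(version) }
    }

    /// `health.liveness`: a single atomic load, so it can answer during cold start.
    fn liveness(&self) -> String {
        let status = if self.ready.load(Ordering::Acquire) { "alive" } else { "starting" };
        format!(r#"{{"status":"{status}"}}"#)
    }

    /// `health.readiness`: same flag, different wording, plus the version string.
    fn readiness(&self) -> String {
        let status = if self.ready.load(Ordering::Acquire) { "ready" } else { "starting" };
        format!(r#"{{"status":"{status}","version":"{}"}}"#, self.version)
    }
}
```

In this sketch the startup code would keep the other clone of the flag and call `ready.store(true, Ordering::Release)` once discovery registration and the biomeOS scan have finished, which is the moment both probes change their answer.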

23 files changed

Lines changed: 309 additions & 92 deletions

CONTEXT.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ ToadStool is the **Layer 0** hardware substrate that other primals and springs d
3030
- Family: `compute-{family_id}.sock` / `compute-{family_id}-tarpc.sock`
3131
- **Peer primals**: Resolved at runtime via capability IDs and Unix-socket discovery (e.g. `capability.discover`, `resolve_capability_socket_fallback`) — not hardcoded URLs or legacy per-primal env manifests
3232
- **Discovery hierarchy** (primalSpring cross-cutting): Songbird `ipc.resolve` → biomeOS `capability.discover` → UDS filesystem convention → socket registry → TCP probing. toadStool implements tiers 1–4; TCP probing (tier 5) not used for local IPC
33-
- **Tests**: 22,838 (7,896+ lib-only, 0 failures, unlimited parallelism)
33+
- **Tests**: 22,843 (7,896+ lib-only, 0 failures, unlimited parallelism)
3434
- **Unsafe**: 46 blocks (all in hw-safe/GPU/VFIO/display/plugin containment, all SAFETY-documented; reconciled S221); workspace `unsafe_code = "deny"`, 41 crates `forbid` + 5 hw crates with narrow `#[allow(unsafe_code, reason)]`; all lint attrs have `reason =` (S211+S213)
3535
- **async-trait**: DEPRECATED — fully removed and banned in `deny.toml` (S203r); transitive only via axum/config/wiggle
3636
- **deny.toml**: `ring` + `async-trait` + `zstd-sys` bans active (ecoBin v3 compliant)
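
The discovery hierarchy listed in this hunk is an ordered fallback chain: each tier is consulted only when the earlier ones return nothing. A minimal sketch of that idea follows. The `Tier` enum, the `Resolver` alias and `resolve_capability` are hypothetical names chosen for illustration and are not toadStool's API; tier 5 (TCP probing) is left out because the text says it is not used for local IPC.

```rust
use std::path::PathBuf;

/// Discovery tiers 1-4 in priority order, named after the hierarchy above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    SongbirdIpcResolve,
    BiomeOsCapabilityDiscover,
    UdsFilesystemConvention,
    SocketRegistry,
}

/// A resolver maps a capability ID to a socket path, or `None` when its tier has no answer.
type Resolver = fn(&str) -> Option<PathBuf>;

/// Walks the tiers in order and returns the first hit with the tier that produced it.
fn resolve_capability(capability: &str, tiers: &[(Tier, Resolver)]) -> Option<(Tier, PathBuf)> {
    tiers
        .iter()
        .find_map(|(tier, resolve)| resolve(capability).map(|path| (*tier, path)))
}
```

Keeping the tiers in a slice rather than in nested `if let` chains makes the priority order explicit and lets a test swap in stub resolvers.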

DOCUMENTATION.md

Lines changed: 4 additions & 4 deletions
@@ -1,6 +1,6 @@
11
# ToadStool Documentation Hub
22

3-
**Last Updated**: May 2026 — S224
3+
**Last Updated**: May 2026 — S225
44

55
---
66

@@ -30,12 +30,12 @@ These root documents were **fully resolved** and **fossilized** in wateringHole
3030

3131
---
3232

33-
## Current State (S224 — May 2026)
33+
## Current State (S225 — May 2026)
3434

3535
**Post-budding, dependency-sovereign, IPC-first, fully concurrent, capability-based.** barraCuda is a separate primal at `ecoPrimals/barraCuda/`. ToadStool is the hardware infrastructure layer — GPU/NPU/CPU discovery, capability probing, workload orchestration, and shader dispatch.
3636

37-
- **22,838 tests** (7,896+ lib-only), 0 failures, 0 clippy warnings, 0 fmt diffs. Full workspace concurrent test suite.
38-
- **65 JSON-RPC methods** (incl. `compute.execute` direct route S203f). Wire Standard L3 (partial): `cost_estimates`, `operation_dependencies`. IPC compliant (`health.liveness` → `{"status":"alive"}`, `health.readiness` → ready+version, `health.check` full envelope, `capabilities.list`, `identity.get`).
37+
- **22,843 tests** (7,896+ lib-only), 0 failures, 0 clippy warnings, 0 fmt diffs. Full workspace concurrent test suite.
38+
- **65 JSON-RPC methods** (incl. `compute.execute` direct route S203f). Wire Standard L3 (partial): `cost_estimates`, `operation_dependencies`. IPC compliant (`health.liveness` → `{"status":"starting"|"alive"}` with PG-62 fast-path, `health.readiness` → `"starting"|"ready"`+version, `health.check` full envelope, `capabilities.list`, `identity.get`). **Recommended caller timeout: ≥3 seconds** for health probes during startup.
3939
- **Dual-socket IPC**: `compute.sock` (JSON-RPC primary, biomeOS routes here) + `compute-tarpc.sock` (tarpc hot-path). Override: `TOADSTOOL_SOCKET` / `TOADSTOOL_TARPC_SOCKET`. Family: `compute-{fid}.sock` / `compute-{fid}-tarpc.sock`.
4040
- **Pipeline dispatch**: `compute.dispatch.pipeline.submit` + `.status` for ordered multi-stage workloads (DAG, topological sort, result forwarding). Resolves neuralSpring PG-05.
4141
- **Capability-based everywhere**: 0 production hardcoded primal names, 0 production mocks, 0 production unwraps, 0 TODOs/FIXMEs. All primal references use `PRIMAL_NAME` constant or capability identifiers.

NEXT_STEPS.md

Lines changed: 4 additions & 3 deletions
@@ -1,8 +1,8 @@
11
# ToadStool -- Next Steps
22

3-
**Updated**: May 2026 — S224 (PG-55: --bind Flag + Localhost Default)
4-
**Status**: Production-grade | Rust edition **2024** (MSRV 1.85) | **AGPL-3.0-or-later** | **All quality gates green** | tests verified (22,833 workspace, 0 failures) | **~65 JSON-RPC methods** | Wire Standard L3 (partial) | Zero C FFI deps (ecoBin v3.0) | **Zero production panics/expects** | IPC-first | workspace `unsafe_code = "deny"`, **41 crates `forbid`** | **46 unsafe blocks** (all in hw containment, all SAFETY-documented, reconciled S221) | **0 production TODOs** | **rustix 1.x workspace-wide** | **capability-based primal references (no hardcoded names, S221)** | **`async-trait` DEPRECATED** (banned in `deny.toml`) | **`deny.toml` ring + async-trait + zstd-sys bans active** | **BTSP Phase 3 encrypted channel (ChaCha20-Poly1305, S215)** | **BTSP Phase 3 transport switch verified (S218)** | **BTSP handshake bounded + connection-reused** (PG-46 resolved, S214) | **All lint attrs with reason (S211+S213)** | **Auth issuer capability-based (S209)** | **Self-registration with Songbird (S207)** | **Encrypted compute dispatch (Phase 55)** | **Display Phase 2 (petalTongue IPC)** | **BTSP JSON-line relay (Phase 45c)** | **Orchestrator lock-panic-free (S213)**
5-
**Latest**: S224 — PG-55: `--bind host:port` flag on `server`/`daemon` subcommands (follows barraCuda pattern). TCP bind default changed from `0.0.0.0` (all interfaces) to `127.0.0.1` (loopback only). `TOADSTOOL_BIND_ADDRESS` env override preserved. +5 tests. 22,838 tests, 0 failures.
3+
**Updated**: May 2026 — S225 (PG-62: Health Liveness Fast-Path + Startup Reorder)
4+
**Status**: Production-grade | Rust edition **2024** (MSRV 1.85) | **AGPL-3.0-or-later** | **All quality gates green** | tests verified (22,843 workspace, 0 failures) | **~65 JSON-RPC methods** | Wire Standard L3 (partial) | Zero C FFI deps (ecoBin v3.0) | **Zero production panics/expects** | IPC-first | workspace `unsafe_code = "deny"`, **41 crates `forbid`** | **46 unsafe blocks** (all in hw containment, all SAFETY-documented, reconciled S221) | **0 production TODOs** | **rustix 1.x workspace-wide** | **capability-based primal references (no hardcoded names, S221)** | **`async-trait` DEPRECATED** (banned in `deny.toml`) | **`deny.toml` ring + async-trait + zstd-sys bans active** | **BTSP Phase 3 encrypted channel (ChaCha20-Poly1305, S215)** | **BTSP Phase 3 transport switch verified (S218)** | **BTSP handshake bounded + connection-reused** (PG-46 resolved, S214) | **All lint attrs with reason (S211+S213)** | **Auth issuer capability-based (S209)** | **Self-registration with Songbird (S207)** | **Encrypted compute dispatch (Phase 55)** | **Display Phase 2 (petalTongue IPC)** | **BTSP JSON-line relay (Phase 45c)** | **Orchestrator lock-panic-free (S213)** | **Health liveness fast-path (PG-62, S225)**
5+
**Latest**: S225 — PG-62: `health.liveness` fast-path returns `{"status":"starting"}` during initialization, `{"status":"alive"}` once ready. Discovery registration moved after listener spawn. Recommended caller timeout: ≥3s. BearDog `crypto.sign_contract` (PG-60+) tracked for Phase 60+. +5 tests. 22,843 tests, 0 failures.
66

77
---
88

@@ -91,6 +91,7 @@ names directly. Deprecated API definitions retained for backward compatibility o
9191
| Sovereign compiler Phase 4+ | barraCuda team | naga-IR optimizer, register pressure, peepholes |
9292
| barraCuda budding Phases 1-4 | barraCuda team | API audit, SemVer 1.0, Springs rewire |
9393
| ComputeDispatch migration (D-CD) | barraCuda team | 144/280+ done; ~139 remaining; lives in barraCuda crate |
94+
| `crypto.sign_contract` (PG-60+) | BearDog team | Cross-family ionic bond contract signing — expose as JSON-RPC method (proposer, acceptor, capabilities, duration). primalSpring `bonding::ionic_rpc` ready to consume. Phase 60+, no urgency. |
9495

9596
---
9697

README.md

Lines changed: 17 additions & 5 deletions
@@ -42,7 +42,7 @@ Nest = Tower + Storage <- storage
4242
| `cargo fmt --all -- --check` | 0 diffs |
4343
| `cargo clippy --workspace --all-targets -- -D warnings` | 0 warnings |
4444
| `cargo doc --workspace --no-deps` (RUSTDOCFLAGS="-D warnings") | 0 warnings |
45-
| `cargo test --workspace` | **22,838 tests, 0 failures** (7,896+ lib-only), **~222** ignored (hardware-gated); full workspace ~7m |
45+
| `cargo test --workspace` | **22,843 tests, 0 failures** (7,896+ lib-only), **~222** ignored (hardware-gated); full workspace ~7m |
4646
| Doctests | All passing (common, core, server, cli, testing, display) |
4747
| Standalone clone test | Pull to any machine, `cargo test` works (GPU-optional, CPU fallback, device-lost resilient) |
4848
| `unsafe` blocks | **46 actual** (all in hw-safe/GPU/VFIO/display/plugin containment crates); all SAFETY-documented (S204, reconciled S221); workspace `unsafe_code = "deny"`, **41 crates `forbid`** + 5 hw crates with narrow `#[allow(unsafe_code, reason)]`; **all lint attrs have `reason =`** (S211+S213) |
@@ -145,6 +145,16 @@ HDMI Tx V4L2 Rx Serial TransportRouter
145145
- **Storage service integration** -- real JSON-RPC `storage.artifact.store`/`retrieve` with graceful fallback
146146
- **Real-time events**: `compute.status` JSON-RPC polling or biomeOS/coordination service for event streaming
147147

148+
### Health Probe Timeouts (PG-62)
149+
150+
Callers probing `health.liveness` should use a timeout of **≥3 seconds** (recommended: 5s for composition startup). During initialization, `health.liveness` returns `{"status":"starting"}` until the server is fully ready (discovery registered, biomeOS scanned), then transitions to `{"status":"alive"}`. The socket accepts connections immediately upon listener bind — before executor initialization completes — so callers receive a fast response even during cold start. If BTSP handshake is required, add its budget (5s default, overridable via `BTSP_HANDSHAKE_TIMEOUT_SECS`).
151+
152+
| Probe | During init | After ready |
153+
|-------|-------------|-------------|
154+
| `health.liveness` | `{"status":"starting"}` | `{"status":"alive"}` |
155+
| `health.readiness` | `{"status":"starting","version":"..."}` | `{"status":"ready","version":"..."}` |
156+
| `health.check` | Full envelope (always `"alive"`) | Full envelope |
157+
148158
### JSON-RPC Methods (~67 dynamically built; S186)
149159

150160
Surface trimmed to hardware orchestration and IPC boundaries. **Removed from this repo** (S169): `inference.*` / Ollama-style AI (→ intelligence service), **`shader.compile.*`** (→ visualization service), **`science.*`** / **`ecology.*`** / **`discovery.*`** / **`deploy.*`** relays (→ orchestration and peers). **Kept**: **`shader.dispatch`** (dispatch compiled binary to GPU; compile happens in visualization service).
@@ -247,7 +257,7 @@ toadStool/
247257
| Clippy pedantic warnings | 0 (workspace-wide `clippy::pedantic` clean; `#[expect]` evolution S131+) |
248258
| Doc warnings | 0 |
249259
| Build warnings | 0 |
250-
| Workspace tests | **22,833**, 0 failures (7,896+ lib-only) |
260+
| Workspace tests | **22,843**, 0 failures (7,896+ lib-only) |
251261
| Lib-only line coverage | ~83.6% |
252262
| Full workspace test time | ~7m (unlimited parallelism, `cfg!(test)` fast timeouts; GPU crates have NVK resilience wrappers) |
253263
| `unsafe` blocks | **46 actual** (all in hw-safe/GPU/VFIO/display/plugin containment crates); all SAFETY-documented (S204, reconciled S221); workspace `unsafe_code = "deny"`, **41 crates `forbid`** + 5 hw crates with narrow `#[allow(unsafe_code, reason)]` |
@@ -269,12 +279,14 @@ toadStool/
269279
**We are still evolving.** barraCuda (separate primal) owns all math and shaders. ToadStool focuses on hardware discovery, capability probing, and workload orchestration. All 5 spring handoffs absorbed.
270280

271281
### Active / Next
272-
- **Test coverage** -- pushing toward 90% target; 22,833 tests; ~83.6% lib-only line (185K lines instrumented); remaining gap: hardware-dependent paths (VFIO, DRM, V4L2), specialty runtimes
282+
- **Test coverage** -- pushing toward 90% target; 22,843 tests; ~83.6% lib-only line (185K lines instrumented); remaining gap: hardware-dependent paths (VFIO, DRM, V4L2), specialty runtimes
273283
- **DF64 / ComputeDispatch** -- transferred to barraCuda team (S93); toadStool serves hardware capabilities
274284
- **Sovereign compiler Phase 4+** -- register pressure estimation, loop software pipelining (barraCuda)
275285
- **NUCLEUS crypto integration** -- compute payloads encrypted via Tower `crypto.encrypt`/`crypto.decrypt` (S205); **self-registration with Songbird** via `DISCOVERY_SOCKET` + `ipc.register` at startup (S207)
276286

277287
### Recently Completed
288+
- **S225 (May 7, 2026)**: **PG-62 — Health Liveness Fast-Path + Startup Reorder**: `health.liveness` now returns `{"status":"starting"}` during initialization and `{"status":"alive"}` once fully ready (via `Arc<AtomicBool>` readiness flag). `health.readiness` likewise returns `"starting"` → `"ready"`. Discovery registration + biomeOS scan moved **after** listener spawn so the socket accepts connections immediately. Recommended caller timeout: **≥3 seconds** (documented). BearDog cross-family `crypto.sign_contract` (PG-60+) tracked for Phase 60+. +5 tests. 22,843 tests.
289+
- **S224 (May 7, 2026)**: **PG-55 — `--bind` Flag + Localhost Default (Security)** — Added `--bind host:port` CLI flag to `server`/`daemon` subcommands. Default TCP bind changed from `0.0.0.0` to `127.0.0.1`. Priority: CLI `--bind` > `TOADSTOOL_BIND_ADDRESS` env > localhost default. +5 tests. 22,838 tests.
278290
- **S223 (May 6, 2026)**: **Deep Debt — Smart Refactor + Sleep Elimination + Test Speed** — Smart-refactored `btsp/json_line.rs` (905→478 LOC + `relay.rs` 255 + `negotiate.rs` 189; zero files >800 LOC). Eliminated 4 test sleeps: `thread::sleep` → deterministic time manipulation (backdated timestamps, `tokio::time::pause`). Evolved `intrusion.rs` to `tokio::time::Instant` for testable time. Fixed `auto_config` ecosystem discovery blocking 10s on mDNS daemon startup/shutdown and DNS resolution in tests — `cfg!(test)` early-returns skip network I/O (237 tests: 10.02s→0.24s, 41x speedup). Confirmed zero production `unwrap()`. Stale `/tmp/ecoPrimals/discovery/` comments evolved to capability-based language. +12 tests (resource_validator/analysis). 22,833 tests.
279291
- **S222 (May 5, 2026)**: **primalSpring ironGate Audit — Sandbox + Env Expansion + Specs + Discovery** — Gap 2: sandbox `working_dir` override via `trusted_directories` (was hardcoded `/tmp`). Gap 8: `${VAR}`/`$VAR` env expansion in workload TOMLs. `display.composite` + `transport.bridge` specs added (PG-42). Discovery hierarchy documented. `convert_security_context` now parses isolation level. +26 tests. 22,821 tests.
280292
- **S221 (May 4, 2026)**: **Deep Debt — Capability Names + Dep Hygiene + Stub Evolution + Coverage** — All `barraCuda/coralReef` primal names in errors/deprecations evolved to capability-based language. reqwest 0.12→0.13 (ring→aws-lc-rs). `verify_migration_success` dead code fixed. Unsafe count reconciled 49→46. +20 tests (daemon routes 11, zero_config config 9). 22,580 tests.
@@ -355,7 +367,7 @@ See [CHANGELOG.md](CHANGELOG.md) for full session-by-session detail.
355367

356368
| ID | Description | Status |
357369
|----|-------------|--------|
358-
| D-COV | Test coverage → 90% | Active — 22,833 tests; ~83.6% lib-only line (185K instrumented); remaining gap: hardware-dependent paths (VFIO, DRM, V4L2, akida) |
370+
| D-COV | Test coverage → 90% | Active — 22,843 tests; ~83.6% lib-only line (185K instrumented); remaining gap: hardware-dependent paths (VFIO, DRM, V4L2, akida) |
359371
| D-BTSP-PHASE3 | BTSP encrypted post-handshake channel | **RESOLVED** (S215+S218) — ChaCha20-Poly1305 encrypted channel implemented, transport switch verified |
360372

361373
### Resolved (S94b)
@@ -398,7 +410,7 @@ See [DEBT.md](DEBT.md) for full register and evolution paths.
398410

399411
---
400412

401-
**Last Updated**: May 2026 — S224 (PG-55: --bind Flag + Localhost Default). **22,838** workspace tests, 0 failures (7,884+ lib-only). ~83.6% lib-only line coverage (target 90%). **65 JSON-RPC methods** (direct) + semantic registry with **Wire Standard L3** (cost_estimates + operation_dependencies). AGPL-3.0-or-later. Zero C FFI deps (ecoBin v3.0). **46 unsafe blocks** (all in hw-safe/GPU/VFIO/display/plugin containment crates); all SAFETY-documented; workspace `unsafe_code = "deny"`, **41 crates `forbid`** + 5 hw crates with narrow `#[allow(unsafe_code, reason)]`. **Zero production panics/expects**. IPC-first JSON-RPC (dual-socket: `compute.sock` + `compute-tarpc.sock`). Rust 1.85+ (edition 2024, MSRV). **BTSP Phase 3 encrypted channel** (ChaCha20-Poly1305, S215; transport switch verified S218). **PG-46 resolved** (socket reuse, S214). **Self-registration** with Songbird via `DISCOVERY_SOCKET` (S207). **Capability-based discovery compliant** per `CAPABILITY_BASED_DISCOVERY_STANDARD.md` v1.2.
413+
**Last Updated**: May 2026 — S225 (PG-62: Health Liveness Fast-Path + Startup Reorder). **22,843** workspace tests, 0 failures (7,884+ lib-only). ~83.6% lib-only line coverage (target 90%). **65 JSON-RPC methods** (direct) + semantic registry with **Wire Standard L3** (cost_estimates + operation_dependencies). AGPL-3.0-or-later. Zero C FFI deps (ecoBin v3.0). **46 unsafe blocks** (all in hw-safe/GPU/VFIO/display/plugin containment crates); all SAFETY-documented; workspace `unsafe_code = "deny"`, **41 crates `forbid`** + 5 hw crates with narrow `#[allow(unsafe_code, reason)]`. **Zero production panics/expects**. IPC-first JSON-RPC (dual-socket: `compute.sock` + `compute-tarpc.sock`). Rust 1.85+ (edition 2024, MSRV). **BTSP Phase 3 encrypted channel** (ChaCha20-Poly1305, S215; transport switch verified S218). **PG-46 resolved** (socket reuse, S214). **Self-registration** with Songbird via `DISCOVERY_SOCKET` (S207). **Capability-based discovery compliant** per `CAPABILITY_BASED_DISCOVERY_STANDARD.md` v1.2.
402414

403415
---
404416
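
The timeout guidance in the new "Health Probe Timeouts (PG-62)" section can be turned into a small caller-side sketch. It assumes newline-delimited JSON-RPC framing over the Unix socket and uses hypothetical helper names (`probe_budget`, `probe_liveness`); neither is taken from the repository. The 5-second BTSP default and the `BTSP_HANDSHAKE_TIMEOUT_SECS` variable come from the README text, while the fallback for unparsable override values is an assumption of this example.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::time::Duration;

/// Default BTSP handshake budget named in the README text.
const BTSP_DEFAULT_SECS: u64 = 5;

/// Base timeout plus the BTSP handshake budget when a handshake is required.
/// `btsp_override` models the raw value of `BTSP_HANDSHAKE_TIMEOUT_SECS`;
/// unparsable values fall back to the default (an assumption of this sketch).
fn probe_budget(base: Duration, btsp_required: bool, btsp_override: Option<&str>) -> Duration {
    if !btsp_required {
        return base;
    }
    let secs = btsp_override
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .unwrap_or(BTSP_DEFAULT_SECS);
    base + Duration::from_secs(secs)
}

/// Sends one `health.liveness` request and returns the raw reply line.
/// The timeout applies to each read and write call, not to the exchange as a whole.
fn probe_liveness(socket: &Path, timeout: Duration) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;
    stream.write_all(b"{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"health.liveness\"}\n")?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

A caller following the README would pass `probe_budget(Duration::from_secs(3), false, None)` for a plain probe, and treat a reply containing `"starting"` as "reachable but not yet ready" rather than as a failure.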

crates/auto_config/src/ecosystem/discoverer.rs

Lines changed: 2 additions & 2 deletions
@@ -293,13 +293,13 @@ impl EcosystemDiscoverer {
     /// Delegates to `toadstool_common::primal_integration::try_discover_via_mdns`
     /// which probes `_toadstool._tcp.local.` and filters by capability TXT records.
     pub(crate) fn discover_mdns_services() -> HashMap<String, ServiceInfo> {
+        use toadstool_common::primal_integration::try_discover_via_mdns;
+
         if cfg!(test) {
             debug!("Skipping mDNS probing in test mode");
             return HashMap::new();
         }
 
-        use toadstool_common::primal_integration::try_discover_via_mdns;
-
         let mut services = HashMap::new();
 
         let capability_keys = ["discovery", "crypto", "storage", "compute", "orchestration"];
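
The hunk above moves a `use` item ahead of the early return to satisfy `clippy::items_after_statements`, a pedantic lint that flags items declared after a statement in the same block, since items are visible throughout the block regardless of where they appear. The standalone sketch below shows the same shape with standard-library types only; the `discover` function and its port numbers are made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical discovery helper mirroring the structure of the fixed function.
fn discover(skip: bool) -> HashMap<String, u16> {
    // Item declarations (`use`, `fn`, `const`) come first, before any statement,
    // which keeps `clippy::items_after_statements` quiet.
    use std::collections::hash_map::Entry;

    if skip {
        // Early return, analogous to the `cfg!(test)` guard in the diff.
        return HashMap::new();
    }

    let mut services = HashMap::new();
    for (name, port) in [("compute", 7000u16), ("storage", 7001), ("compute", 7002)] {
        // Keep the first port seen for each capability name.
        if let Entry::Vacant(slot) = services.entry(name.to_string()) {
            slot.insert(port);
        }
    }
    services
}
```

Behavior is unchanged by the reordering: a function-local `use` is in scope for the whole block either way, so the fix is purely about readability.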

crates/server/benches/jsonrpc_throughput.rs

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,7 @@
 //! JSON-RPC handler throughput benchmarks (parse → dispatch → serialize).
 
 use std::sync::Arc;
+use std::sync::atomic::AtomicBool;
 
 use criterion::black_box;
 use criterion::{Criterion, criterion_group, criterion_main};
@@ -17,6 +18,7 @@ fn jsonrpc_handler() -> JsonRpcHandler {
         )),
         Arc::<str>::from("bench-1.0.0"),
         None,
+        Arc::new(AtomicBool::new(true)),
     )
 }
 

crates/server/src/pure_jsonrpc/connection/tests.rs

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,7 @@
 //! Tests for pure JSON-RPC connection handling (`process_request`, TCP, Unix).
 
 use std::sync::Arc;
+use std::sync::atomic::AtomicBool;
 
 use tokio::io::{AsyncReadExt, AsyncWriteExt};
 use tokio::net::{TcpListener, TcpStream, UnixStream};
@@ -16,7 +17,7 @@ fn test_handler() -> JsonRpcHandler {
     let executor = Arc::new(WorkloadExecutorDispatch::Standalone(
         StandaloneExecutor::new(),
     ));
-    JsonRpcHandler::new(executor, "test-conn-1.0.0".to_string(), None)
+    JsonRpcHandler::new(executor, "test-conn-1.0.0".to_string(), None, Arc::new(AtomicBool::new(true)))
 }
 
 #[tokio::test]
