Commit 842b5ff

westgate authored and committed
S207: self-registration via DISCOVERY_SOCKET + ipc.register at startup
register_with_discovery() sends ipc.register to Songbird with compute.dispatch + compute.capabilities + unix:// endpoint. Honors DISCOVERY_SOCKET (highest precedence). DaemonServer path now also self-registers. find_by_capability evolved to ipc.find_capability via discovery path. Old function deprecated. 7,842 lib tests, 0 failures.

Made-with: Cursor
1 parent d07ff18 commit 842b5ff

11 files changed

Lines changed: 295 additions & 117 deletions

CONTEXT.md

Lines changed: 2 additions & 1 deletion
@@ -29,12 +29,13 @@ ToadStool is the **Layer 0** hardware substrate that other primals and springs d
 - Override: `TOADSTOOL_SOCKET` / `TOADSTOOL_TARPC_SOCKET` env vars
 - Family: `compute-{family_id}.sock` / `compute-{family_id}-tarpc.sock`
 - **Peer primals**: Resolved at runtime via capability IDs and Unix-socket discovery (e.g. `capability.discover`, `resolve_capability_socket_fallback`) — not hardcoded URLs or legacy per-primal env manifests.
-- **Tests**: 20,000+ (7,841 lib-only S206, 0 failures, unlimited parallelism)
+- **Tests**: 20,000+ (7,842 lib-only S207, 0 failures, unlimited parallelism)
 - **Unsafe**: 49 blocks (all in hw-safe/GPU/VFIO/display/plugin containment, all SAFETY-documented); workspace `unsafe_code = "deny"`, 41 crates `forbid` + 5 hw crates with narrow `#[allow(unsafe_code, reason)]`; all ~40 production `#[allow]` have `reason =` (S206)
 - **async-trait**: DEPRECATED — fully removed and banned in `deny.toml` (S203r); transitive only via axum/config/wiggle
 - **deny.toml**: `ring` + `async-trait` + `zstd-sys` bans active (ecoBin v3 compliant)
 - **Display Phase 2**: `display.present`, `display.subscribe_input`, `display.poll_events` (petalTongue IPC)
 - **Encrypted compute dispatch** (Phase 55): Tower `crypto.encrypt`/`crypto.decrypt` for payloads; `DISCOVERY_SOCKET` highest-precedence capability resolution
+- **Self-registration** (S207): `ipc.register` to Songbird via `DISCOVERY_SOCKET` at startup — dynamic NUCLEUS membership without restart
 - **BTSP**: 13/13 converged — JSON-line relay + NDJSON post-handshake (primalSpring Phase 45c)
 - **Dep hygiene**: `test-mocks` off by default (S206); all workspace deps unified
 - **Monitoring**: Real host queries via `toadstool_sysmon` + `rustix::fs::statvfs`
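
The `DISCOVERY_SOCKET` precedence mentioned in the bullets above can be pictured with a small resolver. This is a minimal sketch, not the project's `resolve_capability_socket_fallback`: the function name `resolve_discovery_socket`, the `EnvSnapshot` alias, the fallback order after `DISCOVERY_SOCKET`, and the `/tmp` last resort are assumptions made for illustration. Only the first step (`DISCOVERY_SOCKET` wins) is stated by the source; the legacy variable names and the default path are borrowed from the code this commit removes from `connection.rs`.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-in for an environment snapshot. The real code uses
/// `SocketPathEnv::from_env()`, whose fields are not shown in this commit.
pub type EnvSnapshot = HashMap<String, String>;

/// Resolve the discovery socket path from an environment snapshot.
///
/// Assumed order (only step 1 is confirmed by the commit):
/// 1. `DISCOVERY_SOCKET`
/// 2. `BIOMEOS_COORDINATION_SOCKET`, then `COORDINATION_SOCKET` (legacy)
/// 3. `$XDG_RUNTIME_DIR/biomeos/coordination.sock`
pub fn resolve_discovery_socket(env: &EnvSnapshot) -> PathBuf {
    const OVERRIDES: [&str; 3] = [
        "DISCOVERY_SOCKET",
        "BIOMEOS_COORDINATION_SOCKET",
        "COORDINATION_SOCKET",
    ];

    // The first non-empty override wins.
    for key in OVERRIDES {
        if let Some(value) = env.get(key).filter(|v| !v.is_empty()) {
            return PathBuf::from(value);
        }
    }

    // Fall back to the conventional path; "/tmp" is an assumed last resort.
    let runtime_dir = env
        .get("XDG_RUNTIME_DIR")
        .map(String::as_str)
        .unwrap_or("/tmp");
    PathBuf::from(format!("{runtime_dir}/biomeos/coordination.sock"))
}
```

Passing the environment in as a map, rather than reading `std::env` inside the function, keeps the lookup deterministic under parallel tests.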

DEBT.md

Lines changed: 8 additions & 1 deletion
@@ -1,13 +1,20 @@
 # Active Technical Debt Register

-**Date**: April 2026 — S206
+**Date**: April 2026 — S207
 **Philosophy**: Math is universal, precision is silicon. Workarounds are
 short-term solutions that increase debt. We aim to solve deep debt over
 iterations, evolving toward vendor-agnostic, capability-based solutions—
 with production stubs surfacing typed configuration errors and capability
 guidance, and auth policy driven by explicit environment configuration
 where applicable.

+**S207 (Self-Registration via DISCOVERY_SOCKET)**: Resolved **D-SELF-REGISTRATION**
+(`register_with_coordination()` evolved to `register_with_discovery()` — sends
+`ipc.register` to Songbird via `DISCOVERY_SOCKET` with `compute.dispatch` +
+`compute.capabilities` + `unix://` endpoint. DaemonServer also self-registers.
+`find_by_capability` evolved to use `ipc.find_capability` via discovery path.
+Old functions deprecated with migration path). 7,842 lib tests, 0 failures.
+
 **S206 (Lint Evolution + Dep Hygiene + Feature Cleanup)**: Resolved **D-LINT-FULL**
 (all ~40 bare `#[allow(...)]` in production evolved to `#[allow(..., reason = "...")]`
 17 `unsafe_code` modules, ~23 clippy/deprecated/async-fn-in-trait allows), **D-DEP-UNIFIED**

NEXT_STEPS.md

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
 # ToadStool -- Next Steps

-**Updated**: April 2026 — S206 (Lint Evolution + Dep Hygiene + Feature Cleanup)
-**Status**: Production-grade | Rust edition **2024** (MSRV 1.85) | **AGPL-3.0-or-later** | **All quality gates green** | **7,841 lib-only** tests verified (20,000+ workspace, 0 failures) | **~65 JSON-RPC methods** | Wire Standard L3 (partial) | Zero C FFI deps (ecoBin v3.0) | Zero production unwraps | IPC-first | workspace `unsafe_code = "deny"`, **41 crates `forbid`** | **49 unsafe blocks** (all in hw containment, all SAFETY-documented) | **0 production TODOs** | **rustix 1.x workspace-wide** | **capability-based primal references (no hardcoded names)** | **`async-trait` DEPRECATED** (banned in `deny.toml`) | **`deny.toml` ring + async-trait + zstd-sys bans active** | **env centralized via config structs** | **real Linux sandbox (rustix)** | **real resource metrics (cgroup v2/proc)** | **plugin loading (libloading)** | **binary tarpc framing (MessagePack)** | **BTSP JSON-line relay (Phase 45c)** | **Display Phase 2 (petalTongue IPC)** | **Encrypted compute dispatch (Phase 55)** | **All lint attrs with reason (S206)** | **test-mocks off by default (S206)**
-**Latest**: S206Lint Evolution + Dep Hygiene + Feature Cleanup: All ~40 production bare `#[allow(...)]` evolved to `#[allow(..., reason)]` (17 `unsafe_code`, ~23 clippy/deprecated). `humantime-serde`, `rand`, `tokio-util`, `temp-env` unified to workspace in 20+ Cargo.toml files. GPU `spirv`/`jit`/`testing` + testing `integration-tests`/`benchmarks`/`wiremock` stale features and deps removed. `test-mocks` removed from core default features. **7,841 lib-only** tests, 0 failures, clippy clean, fmt clean.
+**Updated**: April 2026 — S207 (Self-Registration via DISCOVERY_SOCKET)
+**Status**: Production-grade | Rust edition **2024** (MSRV 1.85) | **AGPL-3.0-or-later** | **All quality gates green** | **7,842 lib-only** tests verified (20,000+ workspace, 0 failures) | **~65 JSON-RPC methods** | Wire Standard L3 (partial) | Zero C FFI deps (ecoBin v3.0) | Zero production unwraps | IPC-first | workspace `unsafe_code = "deny"`, **41 crates `forbid`** | **49 unsafe blocks** (all in hw containment, all SAFETY-documented) | **0 production TODOs** | **rustix 1.x workspace-wide** | **capability-based primal references (no hardcoded names)** | **`async-trait` DEPRECATED** (banned in `deny.toml`) | **`deny.toml` ring + async-trait + zstd-sys bans active** | **env centralized via config structs** | **real Linux sandbox (rustix)** | **real resource metrics (cgroup v2/proc)** | **plugin loading (libloading)** | **binary tarpc framing (MessagePack)** | **BTSP JSON-line relay (Phase 45c)** | **Display Phase 2 (petalTongue IPC)** | **Encrypted compute dispatch (Phase 55)** | **All lint attrs with reason (S206)** | **test-mocks off by default (S206)** | **Self-registration with Songbird (S207)**
+**Latest**: S207Self-Registration via DISCOVERY_SOCKET: `register_with_coordination()` evolved to `register_with_discovery()` — sends `ipc.register` to Songbird via `DISCOVERY_SOCKET` (highest-precedence, set by `composition_nucleus.sh`). Capabilities: `compute.dispatch` + `compute.capabilities`. Endpoint: `unix:///…/compute.sock` (actual listen path via `resolve_toadstool_socket`). DaemonServer startup now also self-registers. `find_by_capability` evolved to use `ipc.find_capability` via DISCOVERY_SOCKET. Old function deprecated with migration path. **7,842 lib-only** tests, 0 failures, clippy clean, fmt clean.

 ---

README.md

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ Nest = Tower + Storage <- storage
 | `cargo fmt --all -- --check` | 0 diffs |
 | `cargo clippy --workspace --all-targets -- -D warnings` | 0 warnings |
 | `cargo doc --workspace --no-deps` (RUSTDOCFLAGS="-D warnings") | 0 warnings |
-| `cargo test --workspace` | **20,000+ tests, 0 failures** (7,841 lib-only verified S205), **~93** ignored (hardware-gated); full workspace ~3m30s |
+| `cargo test --workspace` | **20,000+ tests, 0 failures** (7,842 lib-only verified S207), **~93** ignored (hardware-gated); full workspace ~3m30s |
 | Doctests | All passing (common, core, server, cli, testing, display) |
 | Standalone clone test | Pull to any machine, `cargo test` works (GPU-optional, CPU fallback, device-lost resilient) |
 | `unsafe` blocks | **49 actual** (all in hw-safe/GPU/VFIO/display/plugin containment crates); all SAFETY-documented (S204); workspace `unsafe_code = "deny"`, **41 crates `forbid`** + 5 hw crates with narrow `#[allow(unsafe_code, reason)]`; **all ~40 production `#[allow]` have `reason =`** (S206) |
@@ -272,7 +272,7 @@ toadStool/
 - **Test coverage** -- pushing toward 90% target; 22,000+ tests; ~83.6% lib-only line (185K lines instrumented); remaining gap: hardware-dependent paths, specialty runtimes
 - **DF64 / ComputeDispatch** -- transferred to barraCuda team (S93); toadStool serves hardware capabilities
 - **Sovereign compiler Phase 4+** -- register pressure estimation, loop software pipelining (barraCuda)
-- **NUCLEUS crypto integration** -- compute payloads encrypted via Tower `crypto.encrypt`/`crypto.decrypt` (S205); next: primal self-registration with Songbird (`ipc.register`)
+- **NUCLEUS crypto integration** -- compute payloads encrypted via Tower `crypto.encrypt`/`crypto.decrypt` (S205); **self-registration with Songbird** via `DISCOVERY_SOCKET` + `ipc.register` at startup (S207)

 ### Recently Completed
 - **S206 (Apr 28, 2026)**: **Lint Evolution + Dep Hygiene + Feature Cleanup** — All ~40 production bare `#[allow(...)]` evolved to `#[allow(..., reason)]` (17 `unsafe_code`, ~23 clippy/deprecated). `humantime-serde`, `rand`, `tokio-util`, `temp-env` unified to `{ workspace = true }` in 20+ Cargo.toml files. GPU `spirv`/`jit`/`testing` + testing `integration-tests`/`benchmarks`/`wiremock` stale features/deps removed. `test-mocks` removed from core default features (production builds no longer compile mock backends). 7,841 lib tests, 0 failures, clippy clean.

crates/cli/src/daemon/server.rs

Lines changed: 8 additions & 2 deletions
@@ -66,8 +66,14 @@ impl DaemonServer {
         let workload_manager = WorkloadManager::new(config.max_concurrent_workloads).await?;
         info!("✅ Workload manager initialized");

-        // Phase 4: Resource monitor via system metrics
-        // Phase 5: Health reporting via coordination service integration
+        // Self-register with Songbird via DISCOVERY_SOCKET (fire-and-forget)
+        match toadstool::ipc_helpers::register_with_discovery().await {
+            Ok(()) => info!("✅ Self-registered with discovery service"),
+            Err(e) => {
+                warn!("Could not self-register with discovery service: {e}");
+                warn!("Operating in standalone mode (no discovery)");
+            }
+        }

         info!("✅ ToadStool daemon server initialized");
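
The startup hunk above treats registration as best effort: a failure is logged and the daemon keeps running. A minimal synchronous sketch of that shape follows. The `register_or_standalone` helper and the `DiscoveryMode` enum are hypothetical names that do not appear in the codebase, and the closure stands in for the async `register_with_discovery()` call.

```rust
/// How the daemon ended up running after the registration attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum DiscoveryMode {
    Registered,
    Standalone,
}

/// Run a registration attempt and never let its failure abort startup.
///
/// `register` is a hypothetical synchronous stand-in for the async
/// `register_with_discovery()` call shown in the diff.
pub fn register_or_standalone<F>(register: F) -> DiscoveryMode
where
    F: FnOnce() -> Result<(), String>,
{
    match register() {
        Ok(()) => {
            println!("self-registered with discovery service");
            DiscoveryMode::Registered
        }
        Err(e) => {
            // Log and continue: discovery is optional for the daemon.
            eprintln!("could not self-register with discovery service: {e}");
            eprintln!("operating in standalone mode (no discovery)");
            DiscoveryMode::Standalone
        }
    }
}
```

Note that the diff awaits the call inline, so "fire-and-forget" there means the error is not propagated to the caller, not that the request runs in the background.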

crates/core/toadstool/src/ipc/mod.rs

Lines changed: 6 additions & 1 deletion
@@ -39,7 +39,12 @@ pub use server::IpcServer;

 // Re-export legacy helpers for backward compatibility
 // These will gradually migrate to use the new platform layer
+#[expect(
+    deprecated,
+    reason = "re-export kept for callers migrating to register_with_discovery"
+)]
+pub use crate::ipc_helpers::register_with_coordination;
 pub use crate::ipc_helpers::{
     find_by_capability, get_default_coordination_socket, get_semantic_name, is_semantic_method,
-    list_semantic_methods, register_with_coordination, resolve_method_name,
+    list_semantic_methods, register_with_discovery, resolve_method_name,
 };
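
The re-export hunk above keeps the old name importable while steering callers toward the new one. The pattern can be sketched in isolation as below. The function bodies and return types are placeholders chosen for illustration (the real functions are async and return `ToadStoolResult<()>`), and `#[expect(..., reason = ...)]` requires Rust 1.81 or newer.

```rust
mod connection {
    /// New entry point (a synchronous placeholder for the async original).
    pub fn register_with_discovery() -> Result<&'static str, String> {
        Ok("ipc.register")
    }

    /// Old name kept as a thin wrapper so existing callers still compile,
    /// while the compiler points them at the replacement.
    #[deprecated(note = "use register_with_discovery")]
    pub fn register_with_coordination() -> Result<&'static str, String> {
        register_with_discovery()
    }
}

// Re-exporting a deprecated item triggers the `deprecated` lint, so the
// re-export records that the warning is expected and why.
#[expect(
    deprecated,
    reason = "re-export kept for callers that have not migrated yet"
)]
pub use connection::register_with_coordination;
pub use connection::register_with_discovery;
```

Compared with `#[allow]`, `#[expect]` also warns once the lint stops firing, which is a reminder to delete the re-export after the deprecated function is finally removed.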

crates/core/toadstool/src/ipc_helpers/connection.rs

Lines changed: 52 additions & 33 deletions
@@ -12,6 +12,9 @@ use tracing::{debug, info};
 use crate::{ToadStoolError, ToadStoolResult};
 use toadstool_common::constants::PRIMAL_NAME;
 use toadstool_common::constants::timeouts;
+use toadstool_common::primal_sockets::{
+    SocketPathEnv, resolve_capability_socket_fallback, resolve_toadstool_socket,
+};

 use super::framing;

@@ -35,54 +38,49 @@ fn get_runtime_dir() -> String {
     })
 }

-fn resolve_coordination_socket() -> String {
-    std::env::var("BIOMEOS_COORDINATION_SOCKET")
-        .or_else(|_| std::env::var("COORDINATION_SOCKET"))
-        .unwrap_or_else(|_| get_default_coordination_socket())
-}
-
 /// Default coordination-capability socket path.
 ///
 /// biomeOS convention: `$XDG_RUNTIME_DIR/biomeos/coordination.sock`
 pub fn get_default_coordination_socket() -> String {
     format!("{}/biomeos/coordination.sock", get_runtime_dir())
 }

-/// Register ToadStool with coordination/discovery service
+/// Self-register with Songbird via `DISCOVERY_SOCKET` (preferred) or coordination fallback.
+///
+/// Sends `ipc.register` so Songbird can resolve `toadstool` by capability
+/// without the composition launcher doing it on our behalf. Fire-and-forget
+/// at the call site — if this fails the primal continues in standalone mode.
 ///
 /// # Errors
 ///
-/// Returns error if the coordination service is unreachable, JSON-RPC framing
+/// Returns error if the discovery service is unreachable, JSON-RPC framing
 /// fails, or registration is rejected.
-pub async fn register_with_coordination() -> ToadStoolResult<()> {
-    let socket_path = resolve_coordination_socket();
+pub async fn register_with_discovery() -> ToadStoolResult<()> {
+    let env = SocketPathEnv::from_env();
+    let discovery_path = resolve_capability_socket_fallback("discovery", &env);
+    let socket_path = discovery_path.to_string_lossy().to_string();

-    info!("Registering with coordination service at {}", socket_path);
+    info!("Self-registering with discovery service at {}", socket_path);

-    let mut stream = timeout(IPC_TIMEOUT, UnixStream::connect(&socket_path))
+    let mut stream = timeout(IPC_TIMEOUT, UnixStream::connect(discovery_path.as_path()))
         .await
-        .map_err(|_| ToadStoolError::integration("Timeout connecting to coordination service"))?
+        .map_err(|_| ToadStoolError::integration("Timeout connecting to discovery service"))?
         .map_err(|e| {
             ToadStoolError::integration(format!(
-                "Failed to connect to coordination service at {socket_path}: {e}"
+                "Failed to connect to discovery service at {socket_path}: {e}"
             ))
         })?;

-    let socket_endpoint = std::env::var("TOADSTOOL_SOCKET").unwrap_or_else(|_| {
-        let runtime_dir = get_runtime_dir();
-        format!("{runtime_dir}/biomeos/{PRIMAL_NAME}.sock")
-    });
+    let own_socket = resolve_toadstool_socket(&env);
+    let endpoint = format!("unix://{}", own_socket.display());

     let request = json!({
         "jsonrpc": toadstool_common::constants::jsonrpc::VERSION,
-        "method": "capability.register",
+        "method": "ipc.register",
         "params": {
-            "primal_name": PRIMAL_NAME,
-            "capabilities": [
-                "compute", "workload", "orchestration", "ai_local",
-                "gpu", "wasm", "container", "shader.dispatch"
-            ],
-            "endpoint": socket_endpoint
+            "primal_id": PRIMAL_NAME,
+            "capabilities": ["compute.dispatch", "compute.capabilities"],
+            "endpoint": endpoint
         },
         "id": 1
     });
@@ -92,37 +90,58 @@ pub async fn register_with_coordination() -> ToadStoolResult<()> {

     if let Some(error) = response.get("error") {
         return Err(ToadStoolError::integration(format!(
-            "Coordination service registration failed: {error}"
+            "Discovery service registration failed: {error}"
         )));
     }

-    info!("Successfully registered with coordination service");
+    info!("Self-registered with discovery service ({})", endpoint);
     debug!("Registration response: {:?}", response);

     Ok(())
 }

-/// Find primals by capability via coordination service
+/// Register ToadStool with coordination/discovery service.
+///
+/// Delegates to [`register_with_discovery`], which uses `DISCOVERY_SOCKET`
+/// (highest precedence, set by `composition_nucleus.sh` → Songbird) with
+/// full fallback through `resolve_capability_socket_fallback("discovery", …)`.
+///
+/// # Errors
+///
+/// Returns error if the coordination service is unreachable, JSON-RPC framing
+/// fails, or registration is rejected.
+#[deprecated(note = "use register_with_discovery — aligns with DISCOVERY_SOCKET + ipc.register")]
+pub async fn register_with_coordination() -> ToadStoolResult<()> {
+    register_with_discovery().await
+}
+
+/// Find primals by capability via discovery/coordination service
+///
+/// Uses `DISCOVERY_SOCKET` (highest precedence) with full fallback chain.
 ///
 /// # Errors
 ///
 /// Returns error if the coordination service is unreachable, the response is
 /// invalid, or the query fails.
 pub async fn find_by_capability(capability: &str) -> ToadStoolResult<Vec<String>> {
-    let socket_path = resolve_coordination_socket();
+    let env = SocketPathEnv::from_env();
+    let discovery_path = resolve_capability_socket_fallback("discovery", &env);
+    let socket_path = discovery_path.to_string_lossy().to_string();

     debug!("Finding primals with capability: {}", capability);

-    let mut stream = timeout(IPC_TIMEOUT, UnixStream::connect(&socket_path))
+    let mut stream = timeout(IPC_TIMEOUT, UnixStream::connect(discovery_path.as_path()))
         .await
-        .map_err(|_| ToadStoolError::integration("Timeout connecting to coordination service"))?
+        .map_err(|_| ToadStoolError::integration("Timeout connecting to discovery service"))?
         .map_err(|e| {
-            ToadStoolError::integration(format!("Failed to connect to coordination service: {e}"))
+            ToadStoolError::integration(format!(
+                "Failed to connect to discovery service at {socket_path}: {e}"
+            ))
         })?;

     let request = json!({
         "jsonrpc": toadstool_common::constants::jsonrpc::VERSION,
-        "method": "capability.find",
+        "method": "ipc.find_capability",
         "params": {
             "capability": capability
         },
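
To make the wire shape of the new `ipc.register` call concrete, the sketch below builds the same request with the standard library only. It is an illustration under stated assumptions, not the project's code: the real implementation uses `serde_json::json!` and a shared framing helper, the `"2.0"` version string is assumed to match `jsonrpc::VERSION`, and the helper names `json_escape`, `unix_endpoint`, and `build_register_request` are hypothetical.

```rust
use std::path::Path;

/// Escape the characters JSON requires inside a string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Format the advertised endpoint the way the diff does: `unix://` + path.
/// An absolute path therefore yields three slashes (`unix:///...`).
pub fn unix_endpoint(socket: &Path) -> String {
    format!("unix://{}", socket.display())
}

/// Build an `ipc.register` JSON-RPC request as one line of text.
/// The field names mirror the `params` object in the diff.
pub fn build_register_request(
    primal_id: &str,
    capabilities: &[&str],
    endpoint: &str,
    id: u64,
) -> String {
    let caps = capabilities
        .iter()
        .map(|c| format!("\"{}\"", json_escape(c)))
        .collect::<Vec<_>>()
        .join(",");
    format!(
        concat!(
            "{{\"jsonrpc\":\"2.0\",\"method\":\"ipc.register\",",
            "\"params\":{{\"primal_id\":\"{}\",\"capabilities\":[{}],\"endpoint\":\"{}\"}},",
            "\"id\":{}}}"
        ),
        json_escape(primal_id),
        caps,
        json_escape(endpoint),
        id
    )
}
```

Calling `build_register_request("toadstool", &["compute.dispatch", "compute.capabilities"], &unix_endpoint(Path::new("/run/user/1000/biomeos/compute.sock")), 1)` produces a single JSON line suitable for a newline-delimited socket protocol. The primal name and socket path in that call are made-up example values; the real path comes from `resolve_toadstool_socket`.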

crates/core/toadstool/src/ipc_helpers/mod.rs

Lines changed: 6 additions & 1 deletion
@@ -23,8 +23,13 @@ use crate::semantic_methods::SemanticMethodRegistry;
 use std::sync::OnceLock;
 use tracing::debug;

+#[expect(
+    deprecated,
+    reason = "re-export kept for callers migrating to register_with_discovery"
+)]
+pub use connection::register_with_coordination;
 pub use connection::{
-    find_by_capability, get_default_coordination_socket, register_with_coordination,
+    find_by_capability, get_default_coordination_socket, register_with_discovery,
 };

 /// Global semantic method registry (initialized once)
