Commit 377ca58

Author: BiomeOS Developer
S172: deep debt evolution plan + doc cleanup
6-phase deep debt execution:
- Phase 1: Evolved 6 distributed stubs, feature-gated tarpc, typed CUDA errors
- Phase 2: CapabilityDomain enum (7 variants), sysfs discovery, port evolution
- Phase 3: LockedMemory RAII, typed ioctl wrappers, BYOB health loop wired
- Phase 4: Smart-refactored 3 large files into submodules
- Phase 5: memmap2 migration, eliminated 4 unsafe blocks in safe_mmap.rs
- Phase 6: +55 tests across hw_learn handlers (0%→80%+) and transport (2%→comprehensive)

Doc updates: README, CHANGELOG, DEBT.md, NEXT_STEPS, DOCUMENTATION → S172.
21,500+ tests, 0 failures. ~22 irreducible unsafe ops.

Made-with: Cursor
1 parent 640b89a commit 377ca58

67 files changed

Lines changed: 2528 additions & 1056 deletions

CHANGELOG.md

Lines changed: 37 additions & 1 deletion
@@ -5,7 +5,43 @@ All notable changes to ToadStool will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased] - April 1, 2026 (Sessions 43-171)
+## [Unreleased] - April 2, 2026 (Sessions 43-172)
+
+### Session S172 (Apr 2, 2026) — Deep Debt Evolution Plan (6 Phases)
+
+#### Phase 1: Production stubs → real implementations
+- Evolved 6 distributed/ stubs: `validate_delegation_proof` (crypto_lock), `CachedResult` with TTL, `CloudCostTracker`/`CloudPerformanceTracker`, `update_node_health`, `UniversalJobProcessor::new()`
+- Feature-gated `TRpcTransport::send_message` behind `tarpc-transport` feature
+- Evolved CUDA "not implemented" to typed `ToadStoolError::runtime` with alternatives
+
+#### Phase 2: Hardcoding → capability-based
+- Created `CapabilityDomain` enum (7 variants: Security, Coordination, Storage, Compute, Routing, Intelligence, Monitoring) with `from_label()` for legacy primal name resolution
+- Replaced ~30 hardcoded primal name sites across `capability_helpers.rs`, `paths.rs`, `ecosystem/types.rs`
+- Routed hardcoded sysfs paths (`/dev/dri/card0`, PCI BDF, `/etc/hostname`) through `toadstool_sysmon` discovery
+- Created `toadstool_sysmon::system::hostname()` module
+- Migrated legacy fallback ports to `resolve_env_port()` helper
+
+#### Phase 3: Unsafe evolution
+- Created `LockedMemory` RAII type in hw-safe (AlignedAlloc + mlock/munlock, 5 tests)
+- Replaced generic ioctl dispatch with typed helper functions in nvpmu/vfio.rs
+- Wired BYOB `monitor_deployment_health` into background `tokio::spawn` task
+- Evolved embedded placeholder macros with clearer `embedded-placeholder-impls` vs `embedded-hw` feature gating
+
+#### Phase 4: Smart refactoring (3 large files → submodules)
+- `cli/daemon/jsonrpc_server.rs` → extracted route handlers into `routes.rs`
+- `core/toadstool/runtime.rs` → extracted engine management into `runtime/engine_registry.rs`
+- `core/toadstool/byob/byob_impl/mod.rs` → extracted deployment lifecycle into `deployment_lifecycle.rs`
+
+#### Phase 5: memmap2 migration
+- Replaced hand-rolled `rustix::mm::mmap`/`munmap` in `hw-safe/safe_mmap.rs` with `memmap2::MmapRaw`
+- Eliminated 4 unsafe blocks: mmap syscall (×2 paths), manual munmap Drop, unsafe Send/Sync impls
+- Only 1 irreducible unsafe remains in safe_mmap.rs (`VolatileMmio::new`)
+- Removed `map_with_flags`/`map_file` (unused externally); `MmapFailed` source → `std::io::Error`
+
+#### Phase 6: Coverage expansion
+- Added tests for 5 hw_learn handler files (apply, observe_distill, share_recipe, status, telemetry)
+- Added 18 tests to `handler/transport.rs` (LoopbackTransport, happy path streaming, error paths)
+- Fixed `ServiceType::from_capability("routing")` regression in ecosystem types test
 
 ### Session S171 (Apr 1, 2026) — Ember Absorption + Unsafe Evolution + Deep Debt
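
The Phase 2 entry in this changelog hunk names a `CapabilityDomain` enum with seven variants and a `from_label()` resolver for legacy primal names. The following is a minimal sketch of what such a type could look like, not the crate's actual code. The `Option` return type, the `as_str` helper, and the specific legacy-name mappings (`beardog`, `songbird`, `toadstool`) are assumptions made for illustration.

```rust
/// Hypothetical reconstruction of the seven-variant enum named in the changelog.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CapabilityDomain {
    Security,
    Coordination,
    Storage,
    Compute,
    Routing,
    Intelligence,
    Monitoring,
}

impl CapabilityDomain {
    /// Resolve a capability label or a legacy primal name to a domain.
    /// The legacy-name arms are illustrative guesses, not the real mapping.
    pub fn from_label(label: &str) -> Option<Self> {
        match label.trim().to_ascii_lowercase().as_str() {
            "security" | "beardog" => Some(Self::Security),
            "coordination" => Some(Self::Coordination),
            "storage" => Some(Self::Storage),
            "compute" | "toadstool" => Some(Self::Compute),
            "routing" | "songbird" => Some(Self::Routing),
            "intelligence" => Some(Self::Intelligence),
            "monitoring" => Some(Self::Monitoring),
            _ => None,
        }
    }

    /// Canonical lowercase label, usable as a lookup key in place of a primal name.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Security => "security",
            Self::Coordination => "coordination",
            Self::Storage => "storage",
            Self::Compute => "compute",
            Self::Routing => "routing",
            Self::Intelligence => "intelligence",
            Self::Monitoring => "monitoring",
        }
    }
}
```

Call sites that previously compared against a hardcoded primal name could instead call `CapabilityDomain::from_label(name)` once at the boundary and match on the enum afterwards, so an unknown name surfaces as `None` rather than a silent string mismatch.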

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -214,6 +214,7 @@ lto = true
 
 # Production code quality: warnings as errors
 [workspace.lints.rust]
+unsafe_code = "deny"
 
 # Pedantic + Nursery clippy enabled for production-grade code quality
 [workspace.lints.clippy]
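
A workspace-level `unsafe_code = "deny"` only reaches member crates that opt in with `[lints] workspace = true` in their own manifest. Unlike `forbid`, `deny` can still be overridden locally. That is presumably how the remaining irreducible unsafe operations stay compilable. The sketch below imitates that arrangement inside a single file with a module-level `#[deny(unsafe_code)]` and a narrowly scoped `#[allow(unsafe_code)]`. The `registers` module and `VolatileReg` type are hypothetical and are not taken from the repository.

```rust
#[deny(unsafe_code)]
pub mod registers {
    /// Hypothetical safe wrapper: it borrows a real `u32`, so the pointer is always valid.
    pub struct VolatileReg<'a> {
        cell: &'a mut u32,
    }

    impl<'a> VolatileReg<'a> {
        pub fn new(cell: &'a mut u32) -> Self {
            Self { cell }
        }

        // The lint is relaxed for this one function only.
        #[allow(unsafe_code)]
        pub fn read(&self) -> u32 {
            let p: *const u32 = &*self.cell;
            // SAFETY: `p` comes from a live, aligned borrow held by `self`.
            unsafe { std::ptr::read_volatile(p) }
        }

        #[allow(unsafe_code)]
        pub fn write(&mut self, value: u32) {
            let p: *mut u32 = &mut *self.cell;
            // SAFETY: `p` comes from a live, aligned, exclusive borrow held by `self`.
            unsafe { std::ptr::write_volatile(p, value) }
        }
    }
}
```

Removing either `#[allow(unsafe_code)]` attribute turns the corresponding `unsafe` block into a compile error, which is the property the lint addition relies on: every unsafe site has to be explicitly acknowledged.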

DEBT.md

Lines changed: 38 additions & 12 deletions
@@ -1,6 +1,6 @@
 # Active Technical Debt Register
 
-**Date**: April 1, 2026 — S171
+**Date**: April 2, 2026 — S172
 **Philosophy**: Math is universal, precision is silicon. Workarounds are
 short-term solutions that increase debt. We aim to solve deep debt over
 iterations, evolving toward vendor-agnostic, capability-based solutions.
@@ -19,18 +19,44 @@ MOS 6502 / Z80 emulator trait impls return `EmbeddedEmulatorPlaceholder` errors.
 Evolve when cycle-accurate CPU cores are implemented.
 Files: `embedded/emulator_impls.rs`, `embedded/emulators.rs`.
 
-### D-IOCTL-TYPED
-**Crate**: `akida-driver`, `nvpmu` | VFIO ioctl dispatch uses generic `DmaIoctl<OP, T>`.
-Evolve to typed per-operation dispatch modules for stronger compile-time safety.
+## S172 Resolved Debt
 
-### D-LOCKED-MEMORY
-**Crate**: `nvpmu` | DMA buffers use separate alloc + mlock/munlock.
-Compose `AlignedAlloc` + `mlock`/`munlock` into a single `LockedMemory` RAII type.
+### D-IOCTL-TYPED — RESOLVED S172
+Replaced generic ioctl dispatch with typed helper functions in `nvpmu/src/vfio.rs` (`vfio_get_api_version`, `vfio_group_get_status`, `vfio_device_get_bar0_info`). Stronger compile-time safety.
 
-### D-BYOB-HEALTH-LOOP
-**Crate**: `core/toadstool` | `byob/byob_impl/mod.rs`
-`monitor_deployment_health` and `perform_health_check` are complete implementations
-but not yet wired into a production background loop. Phase 2+ integration.
+### D-LOCKED-MEMORY — RESOLVED S172
+Created `LockedMemory` RAII type in `hw-safe` composing `AlignedAlloc` + `rustix::mm::mlock`/`munlock`. Includes `Send`/`Sync`, `Drop`-based `munlock`, page-aligned convenience constructor. 5 tests.
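
The `D-LOCKED-MEMORY` entry describes an RAII type that allocates, locks, and later unlocks and frees in one place. Here is a sketch of that shape under stated assumptions. The real type uses `AlignedAlloc` and `rustix::mm::mlock`/`munlock`, while this version substitutes `std::alloc` and a hypothetical `PageLocker` trait so it runs with the standard library alone. `LockedBuffer` and `CountingLocker` are illustrative names, not the crate's API.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::io;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical seam standing in for the mlock/munlock syscalls.
pub trait PageLocker {
    fn lock(&self, addr: *const u8, len: usize) -> io::Result<()>;
    fn unlock(&self, addr: *const u8, len: usize);
}

/// Owns an aligned, zeroed allocation that stays locked for its whole lifetime.
pub struct LockedBuffer<L: PageLocker> {
    ptr: NonNull<u8>,
    layout: Layout,
    locker: L,
}

impl<L: PageLocker> LockedBuffer<L> {
    pub fn new(len: usize, align: usize, locker: L) -> io::Result<Self> {
        if len == 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "zero-length buffer"));
        }
        let layout = Layout::from_size_align(len, align)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?;
        // SAFETY: `layout` has a non-zero size, checked above.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw)
            .ok_or_else(|| io::Error::new(io::ErrorKind::OutOfMemory, "allocation failed"))?;
        if let Err(e) = locker.lock(ptr.as_ptr(), len) {
            // SAFETY: `ptr` came from `alloc_zeroed` with this exact `layout`.
            unsafe { dealloc(ptr.as_ptr(), layout) };
            return Err(e);
        }
        Ok(Self { ptr, layout, locker })
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: the allocation is live, zero-initialized, and `layout.size()` bytes long.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl<L: PageLocker> Drop for LockedBuffer<L> {
    fn drop(&mut self) {
        // Unlock first, then free: the reverse of construction order.
        self.locker.unlock(self.ptr.as_ptr(), self.layout.size());
        // SAFETY: same pointer and layout that `alloc_zeroed` returned.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) };
    }
}

/// Test double that only counts outstanding locks.
#[derive(Clone, Default)]
pub struct CountingLocker(pub Arc<AtomicUsize>);

impl PageLocker for CountingLocker {
    fn lock(&self, _addr: *const u8, _len: usize) -> io::Result<()> {
        self.0.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }
    fn unlock(&self, _addr: *const u8, _len: usize) {
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}
```

The point of composing the two steps is that no caller can forget the unlock or run it after the free: a failed lock releases the allocation immediately, and `Drop` handles every other exit path.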
+
+### D-BYOB-HEALTH-LOOP — RESOLVED S172
+Wired `monitor_deployment_health` into a background `tokio::spawn` task. Added `health_handles: Arc<RwLock<HashMap<Uuid, JoinHandle<()>>>>` to `ByobComputeExecutor`. `deploy_biome` spawns health monitor; `stop_deployment` aborts it.
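
The handle-tracking pattern in the `D-BYOB-HEALTH-LOOP` entry can be sketched without an async runtime. The version below deliberately swaps `tokio::spawn` and `JoinHandle::abort` for `std::thread` plus a stop flag, and uses a `u64` in place of `Uuid`, so it is an analogue of the design and not the commit's implementation. `HealthRegistry`, `Monitor`, and the `checks` counter are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// One background monitor: a stop flag plus the thread running the loop.
struct Monitor {
    stop: Arc<AtomicBool>,
    handle: JoinHandle<()>,
}

impl Monitor {
    fn shutdown(self) {
        self.stop.store(true, Ordering::Release);
        let _ = self.handle.join();
    }
}

/// Hypothetical stand-in for the `health_handles` map on the executor.
#[derive(Clone, Default)]
pub struct HealthRegistry {
    handles: Arc<RwLock<HashMap<u64, Monitor>>>,
    /// Counts simulated health checks across all monitors.
    pub checks: Arc<AtomicUsize>,
}

impl HealthRegistry {
    /// Analogue of `deploy_biome`: start a monitor and remember its handle.
    pub fn deploy(&self, id: u64) {
        let stop = Arc::new(AtomicBool::new(false));
        let (flag, checks) = (Arc::clone(&stop), Arc::clone(&self.checks));
        let handle = thread::spawn(move || loop {
            checks.fetch_add(1, Ordering::Relaxed); // stand-in for a real health check
            if flag.load(Ordering::Acquire) {
                break;
            }
            thread::sleep(Duration::from_millis(5));
        });
        let previous = self
            .handles
            .write()
            .expect("lock poisoned")
            .insert(id, Monitor { stop, handle });
        // Redeploying the same id must not leak the older monitor.
        if let Some(old) = previous {
            old.shutdown();
        }
    }

    /// Analogue of `stop_deployment`: remove the handle, then stop the monitor.
    pub fn stop(&self, id: u64) -> bool {
        // The write guard is dropped at the end of this statement,
        // so the lock is not held while joining the thread.
        let removed = self.handles.write().expect("lock poisoned").remove(&id);
        match removed {
            Some(monitor) => {
                monitor.shutdown();
                true
            }
            None => false,
        }
    }

    pub fn active(&self) -> usize {
        self.handles.read().expect("lock poisoned").len()
    }
}
```

Removing the entry from the map before stopping the task keeps the lock hold time short, and the same ordering applies when the stop step is a Tokio `abort()` call instead of a flag and join.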
+
+### Deep debt evolution (S172 Plan)
+- **D-IOCTL-TYPED-S172**: Replaced generic ioctl dispatch in `nvpmu/vfio.rs` with typed helper functions for stronger compile-time safety.
+- **D-LOCKED-MEMORY-S172**: Created `LockedMemory` RAII type in `hw-safe` composing `AlignedAlloc` + `mlock`/`munlock`.
+- **D-BYOB-HEALTH-S172**: Wired `monitor_deployment_health` into background `tokio::spawn` task with `JoinHandle` tracking.
+- **D-EMBEDDED-EVOLVE-S172**: Evolved embedded placeholder macros with clearer feature gating (`embedded-placeholder-impls` vs `embedded-hw`).
+
+### Production stubs evolved
+- **D-STUBS-DISTRIBUTED-S172**: Evolved 6 production stubs in `distributed/` to real implementations: `validate_delegation_proof` (crypto_lock), `CachedResult` with TTL (crypto_lock cache), `CloudCostTracker`/`CloudPerformanceTracker` (cloud scheduling), `update_node_health` (songbird registry), `UniversalJobProcessor` with `new()` constructor.
+- **D-TARPC-GATE-S172**: Gated `TRpcTransport::send_message` stub behind `#[cfg(feature = "tarpc-transport")]` feature flag.
+- **D-CUDA-ERRORS-S172**: Evolved CUDA "not implemented" runtime error to typed `ToadStoolError::runtime` with operation name and alternative suggestions.
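
Of the stubs listed above, `CachedResult` with TTL is the easiest to illustrate in isolation. The sketch below shows one plausible shape for a value that expires after a fixed duration. The field names and the `new_at`/`get_at` methods (which take an explicit `Instant` so expiry can be checked deterministically) are assumptions, not the signatures used in `distributed/`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical TTL-bounded cache entry.
#[derive(Debug, Clone)]
pub struct CachedResult<T> {
    value: T,
    stored_at: Instant,
    ttl: Duration,
}

impl<T> CachedResult<T> {
    /// Store `value` now, valid for `ttl`.
    pub fn new(value: T, ttl: Duration) -> Self {
        Self::new_at(value, ttl, Instant::now())
    }

    /// Same as `new`, but with an explicit storage time.
    pub fn new_at(value: T, ttl: Duration, now: Instant) -> Self {
        Self { value, stored_at: now, ttl }
    }

    /// An entry is fresh while strictly less than `ttl` has elapsed.
    pub fn is_fresh_at(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.stored_at) < self.ttl
    }

    /// Return the value only if it has not expired at `now`.
    pub fn get_at(&self, now: Instant) -> Option<&T> {
        self.is_fresh_at(now).then_some(&self.value)
    }

    /// Return the value only if it has not expired yet.
    pub fn get(&self) -> Option<&T> {
        self.get_at(Instant::now())
    }
}
```

Using `saturating_duration_since` means a clock reading earlier than `stored_at` counts as zero elapsed time instead of panicking, and a zero TTL yields an entry that is never served.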
+
+### Hardcoding elimination
+- **D-CAPABILITY-DOMAIN-S172**: Created `CapabilityDomain` enum with 7 variants (Security, Coordination, Storage, Compute, Routing, Intelligence, Monitoring). `from_label()` resolves legacy primal names. Replaced ~30 hardcoded primal name sites across `capability_helpers.rs`, `paths.rs`, `ecosystem/types.rs`.
+- **D-SYSFS-DISCOVERY-S172**: Routed hardcoded `/dev/dri/card0` and PCI BDF paths through `toadstool_sysmon::gpu::discover_gpus()` and `GpuDevice::card_path()`. Hostname resolution via new `toadstool_sysmon::system::hostname()`.
+- **D-FALLBACK-PORTS-S172**: Migrated legacy fallback port constants to `resolve_env_port()` helper in `primal_discovery_complete`.
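
The `resolve_env_port()` helper named in `D-FALLBACK-PORTS-S172` is not shown in this diff, so its signature is unknown. A minimal sketch of the idea, assuming it takes an environment variable name and a fallback port, could look like the following. The pure `resolve_port` function is a hypothetical split added here so the parsing rule can be exercised without touching the process environment.

```rust
/// Parse an optional raw value into a port, falling back when it is
/// missing, malformed, out of range for `u16`, or zero.
pub fn resolve_port(raw: Option<&str>, fallback: u16) -> u16 {
    raw.and_then(|s| s.trim().parse::<u16>().ok())
        .filter(|&port| port != 0)
        .unwrap_or(fallback)
}

/// Assumed shape of the helper: read `var` from the environment,
/// otherwise use the legacy fallback port.
pub fn resolve_env_port(var: &str, fallback: u16) -> u16 {
    resolve_port(std::env::var(var).ok().as_deref(), fallback)
}
```

With this shape the fallback constant lives at the call site as data, and an operator can override it per deployment by exporting the variable. Treating `0` as invalid is a design choice of this sketch, not something the source states.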
+
+### Unsafe reduction
+- **D-MEMMAP2-S172**: Replaced hand-rolled `rustix::mm::mmap`/`munmap` in `hw-safe/safe_mmap.rs` with `memmap2::MmapRaw`. Eliminated 4 unsafe blocks (mmap syscall ×2, manual munmap Drop, unsafe Send/Sync impls). Only 1 irreducible unsafe remains (`VolatileMmio::new`).
+
+### Smart refactoring (files >600L → coherent submodules)
+- **D-REFACTOR-JSONRPC-S172**: `cli/daemon/jsonrpc_server.rs` → extracted route handlers into `routes.rs`.
+- **D-REFACTOR-RUNTIME-S172**: `core/toadstool/runtime.rs` → extracted engine management into `runtime/engine_registry.rs`.
+- **D-REFACTOR-BYOB-S172**: `core/toadstool/byob/byob_impl/mod.rs` → extracted deployment lifecycle into `deployment_lifecycle.rs`.
+
+### Coverage expansion
+- **D-COV-HWLEARN-S172**: Added tests for 5 hw_learn handler files (apply, observe_distill, share_recipe, status, telemetry) — all from 0% to 80%+.
+- **D-COV-TRANSPORT-S172**: Added 18 tests to `handler/transport.rs` — from 2% to comprehensive coverage.
 
 ## S171 Resolved Debt
 
@@ -409,7 +435,7 @@ dependencies, works on every GPU, ships with the crate, testable in CI without h
 |----|-------------|----------|-------|
 | D-NPU | ~~NpuDispatch trait~~ | **RESOLVED S94** | `toadstool-core::npu_dispatch` — generic `NpuDispatch` trait + `AkidaNpuDispatch` adapter |
 | D-RING | ~~ring C FFI in dev-deps~~ | **RESOLVED S97** | `reqwest` removed from integration-tests; `zstd` → `ruzstd` (pure Rust) |
-| D-COV | Test coverage → 90% | Medium | **~84-85% line coverage** (187K lines, llvm-cov). **21,514+ tests passing**. Target 90%. **S161**: +10 large file refactors, stubs evolved, unsafe reduced; expanded coverage for `byob_impl`, `agent_backend`, `auto_init`. Remaining gaps e.g. `science_domains.rs`. Push ongoing. |
+| D-COV | Test coverage → 90% | Medium | **~84-85% line coverage** (187K lines, llvm-cov). **21,500+ tests passing**. Target 90%. **S172**: +55 tests (hw_learn handlers 0%→80%+, transport handler 2%→comprehensive). **S161**: +10 large file refactors, stubs evolved, unsafe reduced. Remaining gaps: hardware-dependent paths. Push ongoing. |
 | D-DOCS | ~~Fill missing_docs warnings~~ | **RESOLVED S159** | All 694+ missing doc warnings filled across 58 crates. `clippy --workspace -D warnings` passes. |
 | D-SOV | ~~Sovereignty: primal-name → capability~~ | **RESOLVED S94b** | All production callers migrated to `get_socket_path_for_capability()`. Deprecated definitions retained for fallback only. |
 | D-WC | ~~Wildcard re-exports remaining~~ | **RESOLVED S132** | 4 high-traffic crates narrowed to explicit exports (constants, distributed, ipc, universal_adapter). Remaining wildcards justified (15+ items all used, or private submodule re-exports). |

DOCUMENTATION.md

Lines changed: 5 additions & 4 deletions
@@ -1,6 +1,6 @@
 # ToadStool Documentation Hub
 
-**Last Updated**: April 1, 2026 — S171
+**Last Updated**: April 2, 2026 — S172
 
 ---
 
@@ -30,14 +30,15 @@ These root documents were **fully resolved** and **fossilized** in wateringHole
 
 ---
 
-## Current State (S171 — April 1, 2026)
+## Current State (S172 — April 2, 2026)
 
 **Post-budding, dependency-sovereign, IPC-first, fully concurrent.** barraCuda is a separate primal at `ecoPrimals/barraCuda/`. ToadStool is the hardware infrastructure layer — GPU/NPU/CPU discovery, capability probing, workload orchestration, and shader dispatch.
 
-- **21,700+ tests**, 0 failures, 0 clippy warnings. Full workspace concurrent test suite.
+- **21,500+ tests**, 0 failures, 0 clippy warnings. Full workspace concurrent test suite.
 - **~65 JSON-RPC methods**. IPC compliant (`health.liveness/readiness/check`, `capabilities.list`, socket at `$XDG_RUNTIME_DIR/biomeos/toadstool.sock`).
 - **glowPlug/ember subsystem** — toadStool-native hardware lifecycle (absorbed from coralReef). `toadstool-glowplug`, `toadstool-ember`, `toadstool-hw-safe` crates.
-- **~26 irreducible unsafe ops** — all in `hw-safe` + drivers, with `// SAFETY:` comments. 23 crates forbid, 20 deny `unsafe_code`.
+- **~22 irreducible unsafe ops** — all in `hw-safe` + drivers, with `// SAFETY:` comments. 23 crates forbid, 20 deny `unsafe_code`.
+- **memmap2 migration** — replaced hand-rolled mmap/munmap in hw-safe with memmap2, eliminating 4 unsafe blocks.
 - **ecoBin v3.0** — Zero C FFI deps. Crypto delegated to BearDog. HTTP delegated to Songbird.
 - **Capability-based discovery** — Primals discover each other by capability, not name. Self-knowledge principle.
 - **Fully concurrent tests** — All tests run with `--test-threads=8`. Zero `#[serial]`. Zero fixed sleeps in non-chaos tests.

NEXT_STEPS.md

Lines changed: 4 additions & 4 deletions
@@ -1,8 +1,8 @@
 # ToadStool/BarraCuda -- Next Steps
 
-**Updated**: April 1, 2026 -- S171 Ember Absorption + Unsafe Evolution + Deep Debt
-**Status**: Production-grade | Rust edition **2024** (MSRV 1.85) | **AGPL-3.0-only** | **All quality gates green** | 21,700+ tests (0 failures) | **~65 JSON-RPC methods** | Zero C FFI deps (ecoBin v3.0) | Zero production unwraps | IPC-first | **43/43 crates with `unsafe_code` lint policy** (23 forbid + 20 deny) | All production files < 400L | **glowPlug/ember** absorbed from coralReef — toadStool-native hardware lifecycle | **~26 irreducible unsafe** ops in `hw-safe` + drivers | **IPC compliant** (health.liveness/readiness/check, capabilities.list, XDG socket)
-**Latest**: S171 -- Created `toadstool-hw-safe` (unsafe containment zone), `toadstool-glowplug`, `toadstool-ember` crates. Rewrote `GpuFirmwareProxy` → `GpuFirmwareAccess` (direct BAR0 reads). Evolved `glowplug_client.rs` to toadStool-native sysfs service. Migrated mmap/alloc to hw-safe. All ~400 distributed missing_docs resolved. Hardcoding evolved (bind address, gate ID, configurator).
+**Updated**: April 2, 2026 -- S172 Deep Debt Evolution
+**Status**: Production-grade | Rust edition **2024** (MSRV 1.85) | **AGPL-3.0-only** | **All quality gates green** | 21,500+ tests (0 failures) | **~65 JSON-RPC methods** | Zero C FFI deps (ecoBin v3.0) | Zero production unwraps | IPC-first | **43/43 crates with `unsafe_code` lint policy** (23 forbid + 20 deny) | All production files < 400L | **glowPlug/ember** absorbed from coralReef — toadStool-native hardware lifecycle | **~22 irreducible unsafe** ops in `hw-safe` + drivers | **IPC compliant** (health.liveness/readiness/check, capabilities.list, XDG socket)
+**Latest**: S172 -- Deep debt evolution: CapabilityDomain enum, LockedMemory RAII, memmap2 migration (4 unsafe blocks eliminated), typed ioctl wrappers, BYOB health loop, 3 large file refactors, +55 hw_learn/transport tests, sysfs discovery helpers.
 
 ---
 
@@ -33,7 +33,7 @@ syntax fixed in 3 server files. Test suite fully unblocked.
 
 ### P1: Test Coverage → 90% (D-COV) — Ongoing (S164)
 
-**~80% line coverage** (lib-only, 185K lines instrumented). **21,700+ tests** (0 failures). Target 90%.
+**~80% line coverage** (lib-only, 185K lines instrumented). **21,500+ tests** (0 failures). Target 90%. S172 added 55+ tests across hw_learn handlers and transport handler.
 
 **S164** expanded coverage with **+94 new tests** across 7 low-coverage files:
 - `resource_validator.rs` 20% → ~75% (+19 tests)