-- **Single proto file**: All services defined in `proto/slurm.proto`. Controller is `SlurmController` (port 6817), agent is `SlurmAgent` (port 6818), accounting is `SlurmAccounting` (port 6819).
+- **Proto files**: `proto/slurm.proto` defines the public API: `SlurmController` (port 6817), `SlurmAgent` (port 6818), `SlurmAccounting` (port 6819). `proto/raft_internal.proto` is separate because Raft consensus is internal controller-to-controller plumbing, not part of the Slurm-compatible API surface that FFI and REST depend on.
 - **State**: Always-on Raft consensus (openraft) in `spurctld/src/raft.rs`. Even single-node deployments run a 1-member Raft cluster. The Raft log is the sole durable store; snapshots are JSON-serialized `ClusterSnapshot` blobs. Recovery happens via Raft log replay + snapshot restore.
 - **Scheduler**: Backfill scheduler in `spur-sched`. Runs every N seconds, assigns pending jobs to idle/mixed nodes.
 - **Job dispatch**: Controller dispatches `LaunchJobRequest` to ALL allocated nodes (not just the first). Each node gets `peer_nodes` list and `task_offset`.
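
The job dispatch bullet above can be illustrated with a minimal sketch. It assumes that `task_offset` is the number of tasks placed on earlier nodes in the allocation and that `peer_nodes` lists every allocated node, including the receiver; neither detail is confirmed by the diff. `NodeLaunch` and `plan_launches` are hypothetical names, not the real `LaunchJobRequest` type.

```rust
/// Hypothetical stand-in for the per-node fields of `LaunchJobRequest`.
#[derive(Debug, Clone, PartialEq)]
pub struct NodeLaunch {
    pub node: String,
    pub peer_nodes: Vec<String>,
    pub task_offset: u32,
}

/// Build one launch entry for every allocated node, not just the first.
/// Input is a list of (node name, tasks on that node) pairs.
/// Assumption: `task_offset` is the running total of tasks on earlier nodes.
pub fn plan_launches(allocated: &[(&str, u32)]) -> Vec<NodeLaunch> {
    // Every node receives the same full peer list (assumed to include itself).
    let peers: Vec<String> = allocated.iter().map(|(n, _)| n.to_string()).collect();
    let mut offset = 0u32;
    let mut out = Vec::with_capacity(allocated.len());
    for (node, tasks) in allocated {
        out.push(NodeLaunch {
            node: node.to_string(),
            peer_nodes: peers.clone(),
            task_offset: offset,
        });
        offset += tasks;
    }
    out
}
```

With an allocation of 4, 2, and 2 tasks across three nodes, this sketch would assign offsets 0, 4, and 6, which lets each node number its local tasks without consulting the others.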
@@ -62,7 +63,8 @@ crates/

 1. Create a new module in `crates/spur-cli/src/` (see `net.rs` as an example)
 2. Add `mod yourcommand;` to `crates/spur-cli/src/main.rs`
-3. Add dispatch in the `match args[1].as_str()` block
+3. Add symlink dispatch in the `match bin_name` block (for backward-compat invocation via argv[0])
+4. Add native dispatch in the `match args[1].as_str()` block (for `spur <command>` invocation)
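
The two dispatch steps above can be sketched as follows. This is an illustration only: the `Command` enum, the `resolve_command` function, and the command names (`sbatch`, `squeue`, `submit`, `queue`) are hypothetical and are not taken from `crates/spur-cli/src/main.rs`.

```rust
use std::path::Path;

/// Hypothetical command set used only for this sketch.
#[derive(Debug, PartialEq)]
pub enum Command {
    Submit,
    Queue,
    Unknown,
}

/// Resolve a command first from argv[0] (symlink dispatch), then from
/// args[1] (native `spur <command>` dispatch).
pub fn resolve_command(args: &[String]) -> Command {
    // Take the basename of argv[0], so `/usr/bin/sbatch` becomes `sbatch`.
    let bin_name = args
        .first()
        .and_then(|a| Path::new(a).file_name())
        .and_then(|n| n.to_str())
        .unwrap_or("");

    // Symlink dispatch: the binary was invoked under a legacy name.
    match bin_name {
        "sbatch" => return Command::Submit,
        "squeue" => return Command::Queue,
        _ => {}
    }

    // Native dispatch: the subcommand is the first real argument.
    match args.get(1).map(String::as_str) {
        Some("submit") => Command::Submit,
        Some("queue") => Command::Queue,
        _ => Command::Unknown,
    }
}
```

Checking argv[0] first means a symlinked legacy name wins even when the first argument happens to look like a subcommand, which is why a new command needs an entry in both blocks.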

 ### Adding a new config section
@@ -85,14 +87,14 @@ crates/
 - Async runtime: tokio (full features).
 - Logging: `tracing` crate with `tracing-subscriber`.
 - Proto conversion: Each gRPC handler converts between proto types and core types. Conversion helpers live in the same file as the server (e.g., `server.rs` has `proto_to_job_spec`, `job_to_proto`, etc.).
-- Node state machine: `Idle → Mixed → Allocated` based on resource usage; `Down`/`Drain`/`Error` are admin states that override.
-- Job state machine: `Pending → Running → Completing → Completed/Failed/Cancelled`. See `spur-core/src/job.rs`.
+- Node state machine: `Idle`/`Mixed`/`Allocated` based on resource usage; `Down`/`Drain`/`Draining`/`Error`/`Unknown`/`Suspended` are admin/system states that override.
+- Job state machine: `Pending → Running → Completing → Completed/Failed/Cancelled/Timeout/NodeFail/Preempted`. Jobs can also be `Suspended`. See `spur-core/src/job.rs`.
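
The updated state machine bullets can be read as the following sketch. The enum variants come from the text above, but the transition rules are assumptions: the sketch supposes that a pending job may be cancelled directly and that `Suspended` is entered from and left to `Running`. The function names are hypothetical; the authoritative definitions are in `spur-core/src/job.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    Pending, Running, Suspended, Completing,
    Completed, Failed, Cancelled, Timeout, NodeFail, Preempted,
}

impl JobState {
    /// Terminal states are the ones listed after `Completing`.
    pub fn is_terminal(self) -> bool {
        matches!(
            self,
            JobState::Completed | JobState::Failed | JobState::Cancelled
                | JobState::Timeout | JobState::NodeFail | JobState::Preempted
        )
    }

    /// Allowed transitions under this sketch's assumptions.
    pub fn can_transition(self, next: JobState) -> bool {
        use JobState::*;
        match (self, next) {
            (Pending, Running) => true,
            (Pending, Cancelled) => true, // assumption: cancel before start
            (Running, Completing) => true,
            (Running, Suspended) | (Suspended, Running) => true, // assumption
            (Completing, n) => n.is_terminal(),
            _ => false,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeState {
    Idle, Mixed, Allocated,
    Down, Drain, Draining, Error, Unknown, Suspended,
}

/// Usage-derived state: nothing used is Idle, partly used is Mixed,
/// fully used is Allocated. The unit (CPUs here) is an assumption.
pub fn usage_state(used_cpus: u32, total_cpus: u32) -> NodeState {
    if used_cpus == 0 {
        NodeState::Idle
    } else if used_cpus < total_cpus {
        NodeState::Mixed
    } else {
        NodeState::Allocated
    }
}

/// An admin/system state, when present, overrides the usage-derived one.
pub fn effective_node_state(override_state: Option<NodeState>, used: u32, total: u32) -> NodeState {
    override_state.unwrap_or_else(|| usage_state(used, total))
}
```

Modelling the node override as an `Option` keeps the usage-derived state computable at all times, so clearing an admin state returns the node to whatever its resource usage implies.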