Commit 5359cf3

MalteJ and claude committed
Update CLAUDE.md to reflect current distributed architecture
Add documentation for mvirt-api (Raft control plane), mvirt-node (reconciliation agent), mvirt-ebpf (eBPF networking), mvirt-ui (web dashboard), and NixOS build system. Consolidate gRPC services table and streamline logging examples.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent a9be47a commit 5359cf3

8 files changed

Lines changed: 404 additions & 179 deletions

CLAUDE.md

Lines changed: 118 additions & 111 deletions
@@ -6,16 +6,47 @@ See [README.md](README.md) for build commands and project overview.
 
 ```
 mvirt/
-├── mvirt-cli/ # CLI client + TUI
-├── mvirt-vmm/ # Daemon (VM Manager)
-├── mvirt-log/ # Centralized logging service
-├── mvirt-zfs/ # ZFS storage management
-├── mvirt-one/ # MicroVM Init System for isolated Pods
-│   ├── src/ # Rust init process (PID 1)
-│   ├── proto/ # one gRPC API
+├── mvirt-api/ # Raft-based distributed control plane
+│   ├── src/
+│   │   ├── grpc/ # gRPC NodeService server
+│   │   ├── rest/ # REST API (handlers per resource)
+│   │   ├── store/ # Raft storage, event sourcing
+│   │   ├── state.rs # State machine + command handling
+│   │   └── scheduler.rs
+│   └── proto/ # node.proto
+├── mvirt-node/ # Node agent (reconciles desired state from API)
+│   ├── src/
+│   │   ├── clients/ # gRPC clients for local services (vmm, ebpf, zfs)
+│   │   └── reconciler/ # Per-resource reconcilers (vm, nic, volume, network, route, security_group, template)
+├── mvirt-vmm/ # Local hypervisor daemon (VM + Pod management)
+│   ├── src/
+│   │   ├── grpc.rs # VmService implementation
+│   │   ├── hypervisor.rs # cloud-hypervisor process management
+│   │   ├── pod_service.rs # PodService implementation
+│   │   └── store.rs # SQLite state
+│   └── proto/ # mvirt.proto (VmService + PodService)
+├── mvirt-ebpf/ # eBPF-based networking (replaces mvirt-net)
+│   ├── src/
+│   │   ├── ebpf_loader.rs # eBPF program loading
+│   │   ├── proto_handler.rs # IPv4/IPv6/DHCP/ICMP handling
+│   │   ├── tap.rs # TAP device management
+│   │   └── nat.rs # NAT + conntrack
+│   └── programs/ # eBPF kernel-space programs
+├── mvirt-zfs/ # ZFS storage daemon
+│   └── proto/ # zfs.proto
+├── mvirt-log/ # Centralized audit logging service
+│   └── proto/ # log.proto
+├── mvirt-one/ # MicroVM Init System (PID 1 for Pods)
+│   ├── src/
+│   ├── proto/ # one.proto
 │   └── initramfs/ # rootfs skeleton
-├── proto/ # gRPC API definition
-└── images/ # Kernel and disk images (not in git)
+├── mvirt-cli/ # CLI client + TUI (ratatui)
+├── mvirt-ui/ # Web UI (React + Vite + Tailwind)
+├── nix/ # NixOS modules, packages, images
+│   ├── modules/ # mvirt.nix (service definitions)
+│   ├── packages/ # Build derivations
+│   └── images/ # hypervisor.nix, node.nix
+└── proto/ # Legacy shared proto (per-crate protos preferred)
 ```
 
 ## Code Quality
@@ -29,143 +60,119 @@ No warnings allowed.
 
 ## Architecture
 
-### gRPC API (proto/mvirt.proto)
-- `CreateVm`, `GetVm`, `ListVms`, `DeleteVm` - CRUD
-- `StartVm`, `StopVm`, `KillVm` - Lifecycle
-- `Console` - Bidirectional streaming for serial console
+### Distributed Control Plane (mvirt-api)
 
-### SQLite Schema (store.rs)
-- `vms` table: VM definitions (id, name, state, config_json, timestamps)
-- `vm_runtime` table: Runtime info for running VMs (pid, sockets)
+Raft-based consensus for multi-node cluster management.
+
+- **Raft** via `mraft` for leader election and log replication
+- **REST API** on port 8080 for external clients and UI
+- **gRPC NodeService** on port 50056 for node agents
+- **Raft** inter-node communication on port 6001
+- **Event-sourced state machine** (`state.rs`) processes all commands
+- **Scheduler** assigns resources to nodes
+
+REST handlers are split per resource: `controlplane.rs`, `nodes.rs`, `vms.rs`, `networks.rs`, `nics.rs`.
+
+### Node Agent (mvirt-node)
+
+Runs on each hypervisor node, connects to mvirt-api.
+
+- Registers with the API, sends heartbeats
+- Watches spec stream (`WatchSpecs`) for desired state
+- Reconciles locally via per-resource reconcilers (VM, NIC, volume, network, route, security group, template)
+- Reports status back via `UpdateResourceStatus`
+
+### Local Hypervisor (mvirt-vmm)
+
+gRPC services: **VmService** + **PodService** on port 50051.
 
-### Hypervisor (hypervisor.rs)
 - Spawns cloud-hypervisor processes
 - Creates TAP devices and attaches to bridge
 - Generates cloud-init ISO
 - Background watcher monitors child processes
 - `recover_vms()` on daemon startup cleans up stale VMs
+- Pod lifecycle via vsock communication with mvirt-one
+
+### eBPF Networking (mvirt-ebpf)
 
-### TUI (tui.rs)
-- Non-blocking async with channels
-- Background worker handles gRPC calls
-- Auto-refresh every 2 seconds
+Replaces legacy mvirt-net. In-kernel packet processing via eBPF.
+
+- IPv4/IPv6 routing, DHCP, ICMP handling
+- NAT and connection tracking
+- TAP device management
+- Security groups
+
+### ZFS Storage (mvirt-zfs)
+
+gRPC **ZfsService** for volume management.
+
+- Volume CRUD and resize
+- Snapshot management
+- Template import with progress tracking (HTTP/file)
+- Clone from templates
 
 ### Event Logging (mvirt-log)
 
-Centralized audit log for all mvirt components. Logs are stored in SQLite with many-to-many object relations.
+Centralized audit log. SQLite with many-to-many object relations.
 
 **Schema:**
 ```sql
 logs (id INTEGER PRIMARY KEY, timestamp_ns, message, level, component)
-log_objects (log_id, object_id) -- junction table for many-to-many
-```
-
-**Proto API** (`mvirt-log/proto/log.proto`):
-```protobuf
-service LogService {
-  rpc Log(LogRequest) returns (LogResponse); // append log
-  rpc Query(QueryRequest) returns (stream LogEntry); // query logs
-}
-
-message LogEntry {
-  string id = 1;
-  int64 timestamp_ns = 2;
-  string message = 3;
-  LogLevel level = 4; // INFO, WARN, ERROR, DEBUG, AUDIT
-  string component = 5; // "vmm", "zfs", "cli"
-  repeated string related_object_ids = 6; // ["vm-123", "vol-456"]
-}
-```
-
-**Dependency** (in component's Cargo.toml):
-```toml
-mvirt-log = { path = "../mvirt-log" }
+log_objects (log_id, object_id) -- junction table
 ```
 
-**Logging from components:**
+**AuditLogger Usage** (recommended):
 ```rust
-use mvirt_log::{LogServiceClient, LogEntry, LogLevel, LogRequest};
-
-// Connect to mvirt-log
-let mut client = LogServiceClient::connect("http://[::1]:50052").await?;
-
-// Log an event with related objects
-client.log(LogRequest {
-    entry: Some(LogEntry {
-        message: "VM started".into(),
-        level: LogLevel::Info as i32,
-        component: "vmm".into(),
-        related_object_ids: vec![vm_id.clone()],
-        ..Default::default() // id and timestamp auto-generated
-    }),
-}).await?;
-
-// Log event related to multiple objects (e.g., disk attached to VM)
-client.log(LogRequest {
-    entry: Some(LogEntry {
-        message: "Volume attached".into(),
-        level: LogLevel::Audit as i32,
-        component: "zfs".into(),
-        related_object_ids: vec![vm_id, volume_id], // indexed under both
-        ..Default::default()
-    }),
-}).await?;
+use mvirt_log::{AuditLogger, LogLevel, create_audit_logger};
+
+let audit = create_audit_logger("http://[::1]:50052", "vmm");
+audit.log(LogLevel::Audit, "VM created", vec![vm_id]).await;
 ```
 
-**Querying logs:**
-```rust
-use mvirt_log::{LogServiceClient, QueryRequest};
+### MicroVM Init (mvirt-one)
 
-let mut stream = client.query(QueryRequest {
-    object_id: Some("vm-123".into()),
-    limit: 100,
-    ..Default::default()
-}).await?.into_inner();
+PID 1 init process running inside MicroVMs for container pods. Provides OneService gRPC for container lifecycle, log streaming, exec sessions.
 
-while let Some(entry) = stream.message().await? {
-    println!("{}: {}", entry.timestamp_ns, entry.message);
-}
-```
+### TUI (mvirt-cli)
 
-### Log-Level Guidelines
+ratatui-based terminal UI with non-blocking async and background gRPC workers.
 
-| Level | Verwendung | Beispiele |
-|-------|------------|-----------|
-| **AUDIT** | Alle State-ändernden Operationen (CRUD + Lifecycle) | VM/Volume/NIC created/deleted/started/stopped/killed |
-| **INFO** | Informative Events ohne State-Änderung | Service gestartet, Connection hergestellt |
-| **WARN** | Degraded Operations, Retries | Connection retry, Fallback aktiviert |
-| **ERROR** | Fehlgeschlagene Operationen | VM start failed, Import failed |
-| **DEBUG** | Entwickler-Diagnostik | Detaillierte Trace-Infos |
+### Web UI (mvirt-ui)
 
-**AuditLogger Usage** (empfohlen statt direktem Client):
-```rust
-use mvirt_log::{AuditLogger, LogLevel, create_audit_logger};
+React + Vite + Tailwind dashboard. Communicates with mvirt-api REST API on port 8080.
 
-// In main.rs: Create shared audit logger
-let audit = create_audit_logger("http://[::1]:50052", "vmm");
+## gRPC Services
 
-// Automatisch: Dual-Logging (lokal via tracing + remote via mvirt-log)
-audit.log(LogLevel::Audit, "VM created", vec![vm_id]).await;
-```
+| Proto | Service | Port | Key RPCs |
+|-------|---------|------|----------|
+| `mvirt-api/proto/node.proto` | NodeService | 50056 | Register, Heartbeat, Deregister, WatchSpecs, UpdateResourceStatus |
+| `mvirt-vmm/proto/mvirt.proto` | VmService, PodService | 50051 | CreateVm, StartVm, StopVm, Console, PodLogs, PodExec |
+| `mvirt-one/proto/one.proto` | OneService | (vsock) | CreatePod, StartPod, StopPod, Logs, Exec |
+| `mvirt-zfs/proto/zfs.proto` | ZfsService | 50053 | CreateVolume, ImportTemplate, CreateSnapshot, CloneFromTemplate |
+| `mvirt-log/proto/log.proto` | LogService | 50052 | Log, Query |
 
-**Component-specific wrappers** (in `mvirt-zfs`, `mvirt-net`):
-```rust
-// ZfsAuditLogger wraps AuditLogger with domain-specific methods
-use crate::audit::{create_audit_logger, ZfsAuditLogger};
+## Log-Level Guidelines
 
-let audit = create_audit_logger(&args.log_endpoint);
-audit.volume_created(&volume_id, &volume_name, size).await;
-```
+| Level | Usage | Examples |
+|-------|-------|---------|
+| **AUDIT** | State-changing operations (CRUD + Lifecycle) | VM/Volume/NIC created/deleted/started/stopped |
+| **INFO** | Informational events without state change | Service started, connection established |
+| **WARN** | Degraded operations, retries | Connection retry, fallback activated |
+| **ERROR** | Failed operations | VM start failed, import failed |
+| **DEBUG** | Developer diagnostics | Detailed trace info |
 
 ## Key Decisions
 
-- **SQLite** for persistence (not in-memory)
+- **Raft consensus** for distributed cluster state (mraft)
+- **Event sourcing** in mvirt-api state machine
+- **Reconciliation loop** pattern (desired state vs actual state)
+- **SQLite** for local persistence
 - **cloud-hypervisor** instead of QEMU
-- **TAP + bridge** networking
-- **Async channels** in TUI
-- **Console escape**: Ctrl+a t
+- **eBPF** for in-kernel networking (replaces bridge-based mvirt-net)
 - **musl** for static linking
 - **Direct kernel boot** for MicroVMs
+- **NixOS** for reproducible builds and deployment (flake.nix + crane)
+- **Console escape**: Ctrl+a t
 
 ## Common Issues
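
Note on the CLAUDE.md changes: the "Reconciliation loop" pattern listed under Key Decisions (desired state vs actual state) is what the mvirt-node reconcilers are described as doing. The following is a minimal sketch of the comparison step only, not the mvirt-node implementation. It assumes resources can be modelled as an ID mapped to an opaque spec string; `Action` and `plan` are hypothetical names introduced for illustration.

```rust
use std::collections::BTreeMap;

/// One step a node agent would take to converge on the desired state.
/// (Hypothetical type, not taken from mvirt-node.)
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Create(String),
    Update(String),
    Delete(String),
}

/// Compare desired and actual resources (ID -> spec) and return the actions
/// needed to make `actual` match `desired`.
pub fn plan(
    desired: &BTreeMap<String, String>,
    actual: &BTreeMap<String, String>,
) -> Vec<Action> {
    let mut actions = Vec::new();

    // Anything desired but missing is created; anything that differs is updated.
    for (id, spec) in desired {
        match actual.get(id) {
            None => actions.push(Action::Create(id.clone())),
            Some(current) if current != spec => actions.push(Action::Update(id.clone())),
            Some(_) => {} // already converged
        }
    }

    // Anything present locally but no longer desired is deleted.
    for id in actual.keys() {
        if !desired.contains_key(id) {
            actions.push(Action::Delete(id.clone()));
        }
    }

    actions
}
```

Because `plan` is a pure function of the two maps, running it again after the actions have been applied yields an empty list, which is the property a reconciliation loop relies on.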

mvirt-api/src/grpc/server.rs

Lines changed: 57 additions & 2 deletions
@@ -80,14 +80,23 @@ impl NodeServiceImpl {
         *rev
     }
 
-    /// Send a spec event to a specific node.
+    /// Send a spec event to a specific node (by ID or name).
     pub async fn send_spec_to_node(&self, node_id: &str, event: SpecEvent) -> Result<(), Status> {
         let senders = self.spec_senders.read().await;
+        // Try direct ID lookup first
         if let Some(sender) = senders.get(node_id) {
             sender.send(Ok(event)).await.map_err(|_| {
                 Status::internal(format!("Failed to send spec to node {}", node_id))
             })?;
+            return Ok(());
         }
+        // Try resolving name to ID
+        if let Ok(Some(node)) = self.store.get_node_by_name(node_id).await
+            && let Some(sender) = senders.get(&node.id) {
+            sender.send(Ok(event)).await.map_err(|_| {
+                Status::internal(format!("Failed to send spec to node {}", node_id))
+            })?;
+        }
         Ok(())
     }
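
Note on this hunk: `send_spec_to_node` now tries the argument as a node ID and then as a node name, and still returns `Ok(())` when neither lookup finds a sender. The lookup order can be sketched in isolation as below. This is a simplified illustration, not the code from the commit: `resolve` is a hypothetical helper, and a plain `names` map (node name to node ID) stands in for the `get_node_by_name` store call.

```rust
use std::collections::HashMap;

/// Resolve a node reference to its sender, trying it as an ID first and then
/// as a name. `names` maps node name -> node ID and is a hypothetical stand-in
/// for the store lookup in the diff.
pub fn resolve<'a, T>(
    senders: &'a HashMap<String, T>,
    names: &HashMap<String, String>,
    node_ref: &str,
) -> Option<&'a T> {
    senders
        .get(node_ref) // direct ID lookup
        .or_else(|| names.get(node_ref).and_then(|id| senders.get(id))) // name -> ID -> sender
}
```

Returning `Option` makes the "no such node" case visible to the caller, which can then choose between ignoring it, logging it, or returning an error status.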

@@ -124,7 +133,7 @@ impl NodeServiceImpl {
         use super::proto::{
             NetworkSpec as ProtoNetworkSpec, NicSpec as ProtoNicSpec, ResourceMeta, SpecEvent,
             SpecEventType, VmDesiredState as ProtoVmDesiredState, VmSpec as ProtoVmSpec,
-            spec_event,
+            VolumeSpec as ProtoVolumeSpec, spec_event,
         };
 
         let revision = self.next_revision().await;
@@ -276,6 +285,52 @@ impl NodeServiceImpl {
                 self.broadcast_spec(spec_event).await;
             }
 
+            // Volume created - send to specific node
+            Event::VolumeCreated(volume) => {
+                let spec_event = SpecEvent {
+                    revision,
+                    r#type: SpecEventType::Create as i32,
+                    spec: Some(spec_event::Spec::Volume(ProtoVolumeSpec {
+                        meta: Some(ResourceMeta {
+                            id: volume.id.clone(),
+                            name: volume.name.clone(),
+                            project_id: volume.project_id.clone(),
+                            node_id: Some(volume.node_id.clone()),
+                            ..Default::default()
+                        }),
+                        size_gb: volume.size_bytes / (1024 * 1024 * 1024),
+                        template_id: volume.template_id.clone(),
+                        attached_vm_id: None,
+                    })),
+                };
+
+                tracing::info!(
+                    "Sending volume spec {} to node {}",
+                    volume.id,
+                    volume.node_id
+                );
+                self.send_spec_to_node(&volume.node_id, spec_event).await?;
+            }
+
+            // Volume deleted - send to specific node
+            Event::VolumeDeleted { id, node_id } => {
+                let spec_event = SpecEvent {
+                    revision,
+                    r#type: SpecEventType::Delete as i32,
+                    spec: Some(spec_event::Spec::Volume(ProtoVolumeSpec {
+                        meta: Some(ResourceMeta {
+                            id: id.clone(),
+                            node_id: Some(node_id.clone()),
+                            ..Default::default()
+                        }),
+                        ..Default::default()
+                    })),
+                };
+
+                tracing::info!("Sending volume delete {} to node {}", id, node_id);
+                self.send_spec_to_node(&node_id, spec_event).await?;
+            }
+
             // Other events don't require spec distribution
             _ => {}
         }
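
Note on the `VolumeCreated` arm: `size_gb` is computed with integer division, which rounds down, so a size that is not a whole multiple of 1 GiB would be sent to the node as a smaller value. Whether such sizes can occur depends on validation that is not visible in this diff. The sketch below contrasts the two roundings; it assumes `size_bytes` is a `u64` (the field type is not shown here), and `gib_floor` and `gib_ceil` are hypothetical helper names.

```rust
/// Number of bytes in one GiB.
pub const GIB: u64 = 1024 * 1024 * 1024;

/// Round down, which is what plain integer division does.
pub fn gib_floor(size_bytes: u64) -> u64 {
    size_bytes / GIB
}

/// Round up, so the result is never smaller than the requested size.
pub fn gib_ceil(size_bytes: u64) -> u64 {
    size_bytes.div_ceil(GIB)
}
```

For sizes that are exact multiples of 1 GiB the two functions agree, so the choice only matters if byte-granular sizes are accepted upstream.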

mvirt-api/src/main.rs

Lines changed: 5 additions & 0 deletions
@@ -164,6 +164,11 @@ async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
     info!("Waiting for leader election...");
     if let Some(leader) = node.wait_for_leader(Duration::from_secs(10)).await {
         info!("Leader elected: node {}", leader);
+        // Set is_leader flag so state machine emits events via broadcast channel
+        if leader == node_id {
+            let flag = node.is_leader_flag();
+            flag.store(true, std::sync::atomic::Ordering::SeqCst);
+        }
     } else {
         warn!("No leader elected within timeout");
     }
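
Note on this hunk: the flag is set once, after the initial election, and only to `true`. The sketch below shows the same shared-flag idea with a single update function that also clears the flag when another node wins. It assumes `is_leader_flag()` returns a shared `Arc<AtomicBool>`, which the diff does not show; `Node` and `on_leader_elected` are hypothetical stand-ins and not the mraft API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the Raft node handle; not the mraft API.
pub struct Node {
    id: u64,
    is_leader: Arc<AtomicBool>,
}

impl Node {
    pub fn new(id: u64) -> Self {
        Node {
            id,
            is_leader: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Hand out a shared handle to the flag, as `is_leader_flag()` is assumed to do.
    pub fn is_leader_flag(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.is_leader)
    }

    /// Record an election result: set the flag when this node wins,
    /// clear it when another node does.
    pub fn on_leader_elected(&self, leader: u64) {
        self.is_leader.store(leader == self.id, Ordering::SeqCst);
    }
}
```

Any component holding a clone of the flag, such as a state machine deciding whether to emit events, observes the update through `load(Ordering::SeqCst)` without extra locking.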
