Commit 61091b0

refactor(contract)!: hypervisor-agnostic wire — GuestChannel, opaque ready payload, mechanism-only disk roles, no is_instance
Review feedback: the contract must stay generic regardless of the hypervisor; Firecracker-vs-QEMU is the only distinction the supervisor should see.

- CreateVmRequest.program_mode (program/instance vocabulary on the wire) becomes `optional GuestChannel guest_channel { ready_port }`, a pure mechanism request: expose a host⇄guest channel and treat the guest's ready signal on ready_port as part of boot. Firecracker instances are naturally 'FC without a channel'.
- VmInfo.control_socket_path → guest_channel_path (empty when the VM has no channel, which covers QEMU). VmInfo.runtime_version → guest_ready_payload: the raw bytes the guest sent, passed through opaquely. The supervisor no longer parses the Aleph runtime's msgpack handshake; the agent does (runtime_config_from_ready_payload). MicroVM keeps the raw payload and takes the ready port as a parameter.
- DiskRole collapses to ROOTFS/EXTRA: which disk is the root device is mechanism; code/runtime/data are workload roles the client maps onto guest devices via disk order (which it already did).
- VmInfo.is_instance removed (field 18 reserved): agent vocabulary, derived from the registry; the guest channel's presence is the registry-miss fallback for the /about list labels.
- The Aleph runtime's channel conventions (control port 52, guest API 53) move to aleph.vm.utils.runtime_channel, agent-side, and replace the hardcoded CONNECT 52 strings.
- packaging: deb package ships grpcio (matches pyproject).

Re-validated live through the split: CreateVm over gRPC reports ready_payload=b'\x81\xa7version\xa52.0.0' (raw guest msgpack), agent parses it, config push + run_code('/') → HTTP 200, DeleteVm clean.

Suite: 742 passed, same 9 environment-only failures as base.
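To make the opaque payload concrete, here is a minimal sketch of how the bytes quoted in the validation note decode. It is not the agent's actual parser: `decode_ready_payload` is a hypothetical name, and the real helper (`runtime_config_from_ready_payload`) presumably relies on the msgpack library. This version handles only the two msgpack types that appear in that payload (fixmap and fixstr), so it runs with the standard library alone.

```python
from __future__ import annotations


def decode_ready_payload(payload: bytes) -> dict[str, str]:
    """Decode a msgpack fixmap whose keys and values are all fixstr.

    Illustrative subset only; any other msgpack type raises ValueError.
    """
    if not payload:
        # An empty payload means the guest has not signaled (or has no channel).
        return {}
    if payload[0] & 0xF0 != 0x80:
        raise ValueError("expected a msgpack fixmap")
    count = payload[0] & 0x0F  # number of key/value pairs
    offset = 1

    def read_fixstr() -> str:
        nonlocal offset
        head = payload[offset]
        if head & 0xE0 != 0xA0:
            raise ValueError("expected a msgpack fixstr")
        length = head & 0x1F
        start = offset + 1
        offset = start + length
        return payload[start:offset].decode("utf-8")

    result: dict[str, str] = {}
    for _ in range(count):
        key = read_fixstr()
        result[key] = read_fixstr()
    return result


if __name__ == "__main__":
    print(decode_ready_payload(b"\x81\xa7version\xa52.0.0"))  # {'version': '2.0.0'}
```

The supervisor never needs this logic: it forwards the bytes untouched, and only the client that knows the guest's protocol interprets them.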
1 parent 58c17e8 commit 61091b0

28 files changed

Lines changed: 455 additions & 352 deletions

docs/plans/2026-06-11-grpc-process-split-design.md

Lines changed: 25 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -97,17 +97,31 @@ supervisor-side:
9797
| guest API process (`{vsock}_53`) | agent |
9898
| idle expiry / update-watch / teardown decision | agent (already migrated) |
9999

100-
Contract additions (regenerated bindings committed):
101-
102-
- `CreateVmRequest.program_mode: bool` — supervisor uses the program boot
103-
flow (vsock init handshake, no cloud-init); carries no Aleph vocabulary.
104-
- `VmInfo.control_socket_path: string` — host UDS path of the VM's vsock;
105-
empty for backends without one.
106-
- `VmInfo.runtime_version: string` — the version string the guest init
107-
reported during the handshake (the agent needs it to format the config
108-
push); empty until init signaled / for non-program VMs.
109-
- `VmInfo.ipv4_gateway` / `ipv6_gateway` — host-side tap addresses, needed
110-
by the agent to fill the guest network config it pushes.
100+
Contract additions (regenerated bindings committed; reworked 2026-06-11 to
101+
stay hypervisor- and workload-agnostic — review feedback):
102+
103+
- `CreateVmRequest.guest_channel: GuestChannel { ready_port }` — optional
104+
host⇄guest control channel (Firecracker vsock today; QEMU could implement
105+
it with virtio-vsock). When present, the supervisor exposes the channel
106+
and waits for the guest's ready signal on `ready_port` as part of boot.
107+
What is spoken over the channel is the client's business. (Replaced the
108+
earlier `program_mode: bool`, which leaked the program/instance
109+
distinction onto the wire.)
110+
- `VmInfo.guest_channel_path: string` — host UDS endpoint of the channel;
111+
empty when the VM has none (QEMU instances).
112+
- `VmInfo.guest_ready_payload: bytes` — the raw bytes the guest sent with
113+
its ready signal, passed through opaquely; the agent parses the Aleph
114+
runtime's msgpack version handshake out of it. (Replaced
115+
`runtime_version`, which required the supervisor to parse the payload.)
116+
- `VmInfo.ipv4_gateway` / `ipv6_gateway` — host-side tap addresses (bare,
117+
no prefix), the guest's default routes for the agent's config push.
118+
- `DiskRole` collapsed to ROOTFS/EXTRA: workload roles (code/runtime/data)
119+
are client vocabulary, mapped onto guest devices via disk order.
120+
- `VmInfo.is_instance` removed (field 18 reserved): the instance/program
121+
distinction is derived agent-side from the registry, with the guest
122+
channel's presence as the registry-miss fallback for labeling.
123+
- The Aleph runtime's channel conventions (control port 52, guest API port
124+
53) live in `aleph.vm.utils.runtime_channel`, agent-side.
111125

112126
Supervisor side: `pool.create_vm_from_spec` accepts
113127
`backend=FIRECRACKER, program_mode=True` specs and builds a message-free

packaging/Makefile

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ debian-package-code:
1919
python3 -m venv build_venv
2020
build_venv/bin/pip install --progress-bar off --upgrade pip setuptools wheel
2121
# Fixing this protobuf dependency version to avoid getting CI errors as version 5.29.0 have this compilation issue
22-
build_venv/bin/pip install --no-cache-dir --progress-bar off --target ./aleph-vm/opt/aleph-vm/ 'aleph-message~=1.0.1' 'eth-account==0.10' 'sentry-sdk==1.31.0' 'qmp==1.1.0' 'aleph-superfluid~=0.2.1' 'sqlalchemy[asyncio]>=2.0' 'aiosqlite==0.19.0' 'alembic==1.13.1' 'aiohttp_cors==0.7.0' 'pydantic-settings==2.6.1' 'pyroute2==0.7.12' 'python-cpuid==0.1.0' 'solathon==1.0.2' 'protobuf==5.29.6' 'redis>=4.2.0' 'legacy-cgi>=1.0'
22+
build_venv/bin/pip install --no-cache-dir --progress-bar off --target ./aleph-vm/opt/aleph-vm/ 'aleph-message~=1.0.1' 'eth-account==0.10' 'sentry-sdk==1.31.0' 'qmp==1.1.0' 'aleph-superfluid~=0.2.1' 'sqlalchemy[asyncio]>=2.0' 'aiosqlite==0.19.0' 'alembic==1.13.1' 'aiohttp_cors==0.7.0' 'pydantic-settings==2.6.1' 'pyroute2==0.7.12' 'python-cpuid==0.1.0' 'solathon==1.0.2' 'protobuf==5.29.6' 'grpcio>=1.70,<1.71' 'redis>=4.2.0' 'legacy-cgi>=1.0'
2323
build_venv/bin/python3 -m compileall ./aleph-vm/opt/aleph-vm/
2424

2525
debian-package-resources: firecracker-bins vmlinux target/bin/sevctl

proto/supervisor.proto

Lines changed: 29 additions & 22 deletions
@@ -141,11 +141,17 @@ message CreateVmRequest {
141141
optional uint32 numa_node = 11; // requested placement (0-indexed). Unset = auto. See VmInfo.numa_node for the effective placement.
142142
bool persistent = 12; // supervisor wraps in systemd if true
143143
repeated string ssh_authorized_keys = 13; // guest cloud-init SSH keys (agent-provided)
144-
// Program boot flow: the supervisor enables the VMM control socket (vsock),
145-
// waits for the guest init's ready signal as part of boot, and reports the
146-
// socket in VmInfo.control_socket_path. It carries no Aleph vocabulary: the
147-
// payloads the agent exchanges over that socket are opaque to the supervisor.
148-
bool program_mode = 14;
144+
// Optional host⇄guest control channel (Firecracker vsock today; a QEMU
145+
// backend may implement it with virtio-vsock). When present, the supervisor
146+
// exposes the channel and waits for the guest's ready signal on
147+
// `ready_port` as part of boot: the VM reports RUNNING only after the
148+
// signal. What is spoken over the channel is the client's business, opaque
149+
// to the supervisor.
150+
optional GuestChannel guest_channel = 14;
151+
}
152+
153+
message GuestChannel {
154+
uint32 ready_port = 1; // the guest connects here to signal readiness
149155
}
150156

151157
message DiskConfig {
@@ -162,13 +168,14 @@ message DiskConfig {
162168
FORMAT_SQUASHFS = 3;
163169
}
164170

171+
// Mechanism-only roles: the supervisor needs to know which disk is the
172+
// root device; everything else is attached in spec order (guest device
173+
// names are deterministic from that order, which is how the client maps
174+
// its workload semantics — code, data, caches — onto devices).
165175
enum DiskRole {
166176
DISK_ROLE_UNSPECIFIED = 0;
167177
DISK_ROLE_ROOTFS = 1;
168-
DISK_ROLE_CODE = 2;
169-
DISK_ROLE_RUNTIME = 3;
170-
DISK_ROLE_DATA = 4;
171-
DISK_ROLE_EXTRA = 5;
178+
DISK_ROLE_EXTRA = 2;
172179
}
173180
}
174181

@@ -223,23 +230,23 @@ message VmInfo {
223230
uint64 stopping_at_ns = 16;
224231
uint64 stopped_at_ns = 17;
225232

226-
// True for instances (full VMs), false for programs/microvms. Independent of
227-
// the hypervisor backend: an instance may run under Firecracker or QEMU, so
228-
// `backend` alone cannot recover this. Mirrors VmExecution.is_instance.
229-
bool is_instance = 18;
233+
// Was `is_instance`: the instance/program distinction is client vocabulary,
234+
// derived client-side from its own records (or from the guest channel's
235+
// presence as a last resort).
236+
reserved 18;
230237

231238
ConfidentialMode confidential_mode = 19; // precise TEE mode; NONE for non-confidential VMs
232239
repeated GpuDevice gpus = 20; // exact PCI devices attached to this VM (mirrors HostInfo.gpus)
233240

234-
// Host UDS path of the VM's control socket (Firecracker vsock). The agent
235-
// dials it for guest-level protocols (program config push, code execution)
236-
// and binds `<path>_<port>` listeners for guest-initiated connections.
237-
// Empty for backends without one (program_mode VMs only today).
238-
string control_socket_path = 21;
239-
// Version string the guest init reported during the boot handshake; empty
240-
// until the init signaled (or for VMs without the handshake). The agent
241-
// formats its guest payloads according to this version.
242-
string runtime_version = 22;
241+
// Host UDS endpoint of the guest control channel (see
242+
// CreateVmRequest.guest_channel). The client dials it for guest-level
243+
// protocols and binds `<path>_<port>` listeners for guest-initiated
244+
// connections. Empty when the VM was created without a channel.
245+
string guest_channel_path = 21;
246+
// Raw bytes the guest sent with its ready signal, passed through opaquely;
247+
// empty until the signal arrived (or for VMs without a channel). The client
248+
// interprets them (the Aleph runtime sends its version handshake here).
249+
bytes guest_ready_payload = 22;
243250
// Host-side tap addresses (no prefix). The agent passes them to the guest
244251
// as default routes in its network configuration push.
245252
string ipv4_gateway = 23;
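The DiskRole comment above says the client maps its workload semantics onto devices through disk order. A rough sketch of that client-side mapping follows, under stated assumptions: `plan_program_disks` is a hypothetical helper, the role names are plain strings rather than the generated enum, and the root device is assumed to appear as `/dev/vda` (the source only names `vdb`, `vdc`, … for the disks that follow it).

```python
from __future__ import annotations

from string import ascii_lowercase


def plan_program_disks(
    runtime: str, code: str | None, data: list[str]
) -> list[tuple[str, str, str]]:
    """Return (wire_role, host_path, guest_device) triples in spec order.

    The runtime image becomes the single ROOTFS disk; the code and data
    volumes become EXTRA disks, and their position decides the device name.
    """
    ordered: list[tuple[str, str]] = [("ROOTFS", runtime)]
    if code is not None:
        ordered.append(("EXTRA", code))
    ordered.extend(("EXTRA", path) for path in data)
    if len(ordered) > len(ascii_lowercase):
        raise ValueError("too many disks for single-letter device names")
    return [
        (role, path, f"/dev/vd{ascii_lowercase[index]}")
        for index, (role, path) in enumerate(ordered)
    ]


if __name__ == "__main__":
    for row in plan_program_disks("runtime.squashfs", "code.squashfs", ["data.ext4"]):
        print(row)
```

Because the wire only carries ROOTFS and EXTRA, the client has to remember which position held the code volume; the supervisor attaches disks in order and never learns what they mean.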

src/aleph/vm/controllers/firecracker/spec_program.py

Lines changed: 24 additions & 31 deletions
@@ -1,15 +1,16 @@
1-
"""Message-free Firecracker program controller, driven by a CreateVmSpec.
2-
3-
The supervisor side of the program (microvm) split: boots a Firecracker VM
4-
from resolved on-disk paths only — no Aleph message, no download, no guest
5-
configuration. The Aleph-runtime protocols (config push, code execution,
6-
guest API) are the agent's business, spoken over the vsock control socket
7-
this VM exposes (reported via VmInfo.control_socket_path).
8-
9-
Drive order is part of the contract with the agent: root device (the
10-
RUNTIME-role disk), then the CODE-role disk if present, then EXTRA disks in
11-
spec order. The agent derives guest device names (vdb, vdc, …) from that
12-
order when it builds its configuration push.
1+
"""Message-free Firecracker controller for guest-channel VMs, driven by a
2+
CreateVmSpec.
3+
4+
Boots a Firecracker VM from resolved on-disk paths only — no Aleph message,
5+
no download, no guest configuration. The guest-level protocols (the Aleph
6+
config push, code execution, guest API) are the client's business, spoken
7+
over the vsock channel this VM exposes (reported via
8+
VmInfo.guest_channel_path); the guest's ready signal on the channel is part
9+
of boot.
10+
11+
Drive order is part of the contract with the client: the ROOTFS-role disk is
12+
the root device, then the EXTRA disks in spec order. The client derives guest
13+
device names (vdb, vdc, …) from that order.
1314
"""
1415

1516
from __future__ import annotations
@@ -41,32 +42,23 @@
4142

4243
@dataclass
4344
class SpecProgramResources:
44-
"""Resolved paths for a spec-driven program boot. No download happens
45-
here: the agent prepared every file and the spec carries the paths."""
45+
"""Resolved paths for a spec-driven boot. No download happens here: the
46+
client prepared every file and the spec carries the paths."""
4647

4748
kernel_image_path: Path
48-
rootfs_path: Path # the RUNTIME-role disk: a program's root filesystem
49-
code_disk: DiskSpec | None
49+
rootfs_path: Path
5050
extra_disks: list[DiskSpec] = field(default_factory=list)
5151

5252
@classmethod
5353
def from_spec(cls, spec: CreateVmSpec) -> SpecProgramResources:
5454
kernel_path = spec.kernel_path
5555
if not str(kernel_path) or str(kernel_path) == ".":
56-
raise InvalidBackendError("A program spec requires a kernel_path")
57-
58-
runtime_disks = [disk for disk in spec.disks if disk.role is DiskRole.RUNTIME]
59-
if len(runtime_disks) != 1:
60-
raise InvalidBackendError(f"A program spec requires exactly one RUNTIME disk, got {len(runtime_disks)}")
61-
62-
code_disks = [disk for disk in spec.disks if disk.role is DiskRole.CODE]
63-
if len(code_disks) > 1:
64-
raise InvalidBackendError(f"A program spec carries at most one CODE disk, got {len(code_disks)}")
56+
raise InvalidBackendError("A Firecracker spec requires a kernel_path")
6557

58+
rootfs = spec.require_rootfs()
6659
return cls(
6760
kernel_image_path=kernel_path,
68-
rootfs_path=runtime_disks[0].path,
69-
code_disk=code_disks[0] if code_disks else None,
61+
rootfs_path=rootfs.path,
7062
extra_disks=[disk for disk in spec.disks if disk.role is DiskRole.EXTRA],
7163
)
7264
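The hunk above calls `spec.require_rootfs()`, whose body is not part of this diff. The sketch below shows one plausible shape for it, assuming it enforces exactly one ROOTFS disk the way the removed RUNTIME check did. The dataclasses are simplified stand-ins for the real `CreateVmSpec` and `DiskSpec`, and `ValueError` replaces `InvalidBackendError` to keep the example self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path


class DiskRole(Enum):
    ROOTFS = "rootfs"
    EXTRA = "extra"


@dataclass
class DiskSpec:
    path: Path
    role: DiskRole


@dataclass
class CreateVmSpec:
    disks: list[DiskSpec] = field(default_factory=list)

    def require_rootfs(self) -> DiskSpec:
        """Return the single ROOTFS disk, or raise if there is not exactly one.

        Assumed behaviour, modelled on the RUNTIME-disk check this commit removes.
        """
        roots = [disk for disk in self.disks if disk.role is DiskRole.ROOTFS]
        if len(roots) != 1:
            raise ValueError(f"A spec requires exactly one ROOTFS disk, got {len(roots)}")
        return roots[0]


if __name__ == "__main__":
    spec = CreateVmSpec(
        disks=[
            DiskSpec(Path("/var/lib/vm/runtime.squashfs"), DiskRole.ROOTFS),
            DiskSpec(Path("/var/lib/vm/code.squashfs"), DiskRole.EXTRA),
        ]
    )
    print(spec.require_rootfs().path)
```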

@@ -75,7 +67,7 @@ def to_dict(self):
7567

7668

7769
class SpecFirecrackerProgram(AlephFirecrackerExecutable[None]):
78-
"""Spec-driven program microvm: VMM boot + init handshake only."""
70+
"""Spec-driven guest-channel microvm: VMM boot + ready handshake only."""
7971

8072
resources: SpecProgramResources # type: ignore[assignment]
8173
is_instance = False
@@ -106,7 +98,7 @@ async def setup(self) -> None:
10698
logger.debug("Setup started for spec program VM=%s", self.vm_id)
10799
await setfacl()
108100

109-
extra_disks = ([self.resources.code_disk] if self.resources.code_disk else []) + self.resources.extra_disks
101+
extra_disks = self.resources.extra_disks
110102
self._firecracker_config = FirecrackerConfig(
111103
boot_source=BootSource(
112104
kernel_image_path=Path(self.fvm.enable_kernel(self.resources.kernel_image_path)),
@@ -134,8 +126,9 @@ async def setup(self) -> None:
134126
)
135127

136128
async def wait_for_init(self) -> None:
137-
"""The init-ready handshake is part of boot for program-mode VMs."""
138-
await self.fvm.wait_for_init()
129+
"""The guest's ready handshake is part of boot for channel VMs."""
130+
ready_port = self.spec.guest_channel.ready_port if self.spec.guest_channel else 52
131+
await self.fvm.wait_for_init(ready_port=ready_port)
139132

140133
async def start_guest_api(self):
141134
"""Agent-owned across the boundary: the agent binds `<vsock>_53` itself."""

src/aleph/vm/controllers/qemu/instance.py

Lines changed: 1 addition & 1 deletion
@@ -151,7 +151,7 @@ def from_spec(cls, spec: "CreateVmSpec", namespace: str) -> "AlephQemuResources"
151151
resources.volumes = [
152152
HostVolume(mount=d.mount, path_on_host=d.path, read_only=d.readonly, size_mib=None)
153153
for d in spec.disks
154-
if d.role in {DiskRole.EXTRA, DiskRole.DATA}
154+
if d.role is DiskRole.EXTRA
155155
]
156156
resources.gpus = [HostGPU(pci_host=g.pci_host, supports_x_vga=g.supports_x_vga) for g in spec.gpus]
157157
return resources

src/aleph/vm/hypervisors/firecracker/microvm.py

Lines changed: 16 additions & 4 deletions
@@ -87,6 +87,8 @@ class MicroVM:
8787
drives: list[Drive]
8888
init_timeout: float
8989
runtime_config: RuntimeConfiguration | None
90+
# Raw bytes from the guest's ready signal; b"" until it arrives.
91+
init_payload: bytes = b""
9092
mounted_rootfs: Path | None = None
9193
_unix_socket: Server | None = None
9294
enable_log: bool
@@ -142,6 +144,7 @@ def __init__(
142144
self.drives = []
143145
self.init_timeout = init_timeout
144146
self.runtime_config = None
147+
self.init_payload = b""
145148
self.enable_log = enable_log
146149

147150
def to_dict(self) -> dict:
@@ -393,13 +396,20 @@ def enable_drive(self, drive_path: Path, read_only: bool = True) -> Drive:
393396
self.drives.append(drive)
394397
return drive
395398

396-
async def wait_for_init(self) -> None:
397-
"""Wait for a connection from the init in the VM"""
399+
async def wait_for_init(self, ready_port: int = 52) -> None:
400+
"""Wait for a connection from the init in the VM.
401+
402+
The raw bytes the guest sends with its ready signal are kept in
403+
``self.init_payload`` for pass-through consumers (the supervisor
404+
contract reports them opaquely); the legacy RuntimeConfiguration parse
405+
is preserved for the in-process program path.
406+
"""
398407
logger.debug("Waiting for init...")
399408
queue: asyncio.Queue[RuntimeConfiguration] = asyncio.Queue()
400409

401410
async def unix_client_connected(reader: asyncio.StreamReader, _writer: asyncio.StreamWriter):
402411
data = await reader.read(1_000_000)
412+
self.init_payload = data or b""
403413
if data:
404414
config_dict: dict[str, Any] = msgpack.loads(data)
405415
runtime_config = RuntimeConfiguration(version=config_dict["version"])
@@ -410,9 +420,11 @@ async def unix_client_connected(reader: asyncio.StreamReader, _writer: asyncio.S
410420
logger.debug("Runtime version: %s", runtime_config)
411421
await queue.put(runtime_config)
412422

413-
self._unix_socket = await asyncio.start_unix_server(unix_client_connected, path=f"{self.vsock_path}_52")
423+
self._unix_socket = await asyncio.start_unix_server(
424+
unix_client_connected, path=f"{self.vsock_path}_{ready_port}"
425+
)
414426
if self.use_jailer:
415-
system(f"chown jailman:jailman {self.vsock_path}_52")
427+
system(f"chown jailman:jailman {self.vsock_path}_{ready_port}")
416428
try:
417429
self.runtime_config = await asyncio.wait_for(queue.get(), timeout=self.init_timeout)
418430
logger.debug("...signal from init received")
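The ready handshake in this hunk can be reproduced outside Firecracker with plain Unix sockets. The sketch below is an illustration, not the MicroVM code: `wait_for_ready` and `demo` are hypothetical names, the "guest" is simulated by a local client, and the msgpack parse and jailer `chown` from the real method are left out. It assumes a platform with Unix domain sockets.

```python
import asyncio
import tempfile
from pathlib import Path


async def wait_for_ready(channel_path: str, ready_port: int, timeout: float = 5.0) -> bytes:
    """Listen on `<channel_path>_<ready_port>` and return the first payload, unparsed."""
    queue: asyncio.Queue = asyncio.Queue()

    async def on_connect(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        data = await reader.read(1_000_000)
        await queue.put(data or b"")  # keep the raw bytes; interpretation is the client's job
        writer.close()

    server = await asyncio.start_unix_server(on_connect, path=f"{channel_path}_{ready_port}")
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    finally:
        server.close()


async def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        channel_path = str(Path(tmp) / "v.sock")
        waiter = asyncio.create_task(wait_for_ready(channel_path, ready_port=52))
        # Stand-in for the guest: retry until the listener exists, then send the signal.
        for _ in range(200):
            try:
                _reader, writer = await asyncio.open_unix_connection(f"{channel_path}_52")
                break
            except (FileNotFoundError, ConnectionRefusedError):
                await asyncio.sleep(0.01)
        else:
            raise RuntimeError("listener never appeared")
        writer.write(b"\x81\xa7version\xa52.0.0")
        await writer.drain()
        writer.close()
        await writer.wait_closed()
        return await waiter


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Parameterizing the port is what lets the supervisor stay ignorant of the Aleph convention: the listener path is derived from whatever `ready_port` the client asked for.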

src/aleph/vm/orchestrator/views/__init__.py

Lines changed: 10 additions & 8 deletions
@@ -202,16 +202,18 @@ async def debug_haproxy(request: web.Request) -> web.Response:
202202

203203

204204
def _vm_type_name(record: AgentVmRecord | None, info: VmInfo) -> str:
205-
"""vm_type label: from the agent's message when known, otherwise from the
206-
supervisor's instance flag (spec-created / reattached VMs without a registry
207-
record) — the same VmExecution.is_instance the old views fell back to.
208-
209-
The instance flag is used rather than the backend because the backend alone
210-
cannot recover instance-ness: an instance running under Firecracker reports
211-
Backend.FIRECRACKER yet is still an instance."""
205+
"""vm_type label: from the agent's message when known; otherwise a
206+
best-effort guess from the guest channel (registry-miss fallback for
207+
spec-created / reattached VMs).
208+
209+
The instance/program distinction is agent vocabulary the wire no longer
210+
carries. Every VM the agent runs as a microvm has a guest channel and
211+
instances have none, so the channel's presence recovers the label for VMs
212+
we lost the record of. (Backend alone cannot: an instance running under
213+
Firecracker reports Backend.FIRECRACKER yet is still an instance.)"""
212214
if record is not None:
213215
return VmType.from_message_content(record.message).name
214-
return VmType.instance.name if info.is_instance else VmType.microvm.name
216+
return VmType.microvm.name if info.guest_channel_path else VmType.instance.name
215217

216218

217219
def _datetime_from_ns(ns: int) -> datetime | None:
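The registry-miss fallback in `_vm_type_name` reduces to a small decision, sketched here with simplified stand-ins: `VmInfo` is a one-field dataclass rather than the generated message, and `recorded_type` replaces the `AgentVmRecord` lookup. Both are assumptions made to keep the example runnable on its own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VmInfo:
    # Empty when the VM was created without a guest channel.
    guest_channel_path: str = ""


def vm_type_name(recorded_type: str | None, info: VmInfo) -> str:
    """Prefer the agent's own record; otherwise guess from the channel's presence."""
    if recorded_type is not None:
        return recorded_type
    # Best-effort fallback: microvms have a guest channel, instances have none.
    return "microvm" if info.guest_channel_path else "instance"


if __name__ == "__main__":
    print(vm_type_name(None, VmInfo("/run/vm/v.sock")))  # microvm
    print(vm_type_name(None, VmInfo()))                  # instance
```

The guess is only as good as the invariant behind it: it holds while every microvm the agent creates requests a channel and no instance does.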
