Message-free supervisor: process split — agent ⇄ supervisor over gRPC, programs included by odesenfans · Pull Request #977 · aleph-im/aleph-vm

odesenfans · 2026-06-11T08:44:17Z

Experiment: drive the supervisor/agent split up to the gRPC boundary in one pass. Rebased on dev after the #976 squash-merge — the diff is exactly the 10 experiment commits.

What & why

Phase 1 of the backport design, accelerated: the agent and the supervisor run as two separate processes, with every live VM operation — on-demand Firecracker programs included — crossing proto/supervisor.proto over a Unix-socket gRPC link. The in-process mode remains the default; split mode is the strangler toggle.

Changes

gRPC transport (supervisor/proto_convert.py, grpc_server.py, grpc_client.py): SupervisorService wraps any Supervisor behind the proto; GrpcSupervisor implements the ABC as a client. Errors cross as a serialized ErrorDetail in the aleph-supervisor-error-bin trailer + mapped status codes; the client rebuilds the exact SupervisorError class (UNIMPLEMENTED → NotImplementedSupervisorError). ReinstallVmRequest.wipe_volumes becomes optional bool so unset takes the ABC's True default.
Daemon (python -m aleph.vm.supervisor): owns the VmPool (controllers, networking, systemd, reattach), serves on ALEPH_VM_SUPERVISOR_GRPC_SOCKET. SIGTERM stops the server only — VMs keep running and reattach on the next start (Message-free supervisor: reboot-recovery from on-disk configs #957 path).
Contract additions for programs: CreateVmRequest.program_mode plus VmInfo.control_socket_path / runtime_version / ipv4_gateway / ipv6_gateway. Infra-only vocabulary: the payloads spoken over the control socket stay opaque to the supervisor.
Program path crosses the seam (the "hardest entanglement"):
- agent: build_program_create_vm_spec downloads code/runtime/data/volumes (AlephProgramResources) and emits a path-only spec (RUNTIME/CODE/EXTRA disk roles; drive order is the contract guest device names derive from);
- supervisor: SpecFirecrackerProgram boots from resolved paths, vsock on, init handshake awaited as part of boot;
- agent: ProgramGuestClient owns the guest API process (<vsock>_53), the runtime config push and run_code (CONNECT 52); run_code_on_request/on_event rewritten off pool/VmExecution, serialized per VM (one create, one config push under concurrency — race-tested).
Split mode: ALEPH_VM_SUPERVISOR_GRPC_SOCKET set ⇒ the agent builds no pool and wires GrpcSupervisor; endpoints whose capability has not crossed yet return 501 via require_vm_pool (backups, restore, confidential, migration, network recreate, HAProxy regenerate, GPU reservation, persistent programs). Agent shutdown no longer stops VMs in this mode.
Hardening found by the live run: bounded guest-API startup wait (the inherited loop hung forever on child death), explicit fork context (Python 3.14 changed the Linux default), server-side traceback logging for ERROR_CODE_INTERNAL aborts.

Out of scope (501 / skipped in split mode, consistent with the series)

Persistent programs, backups/restore, confidential, migration, domains/HAProxy, GPU reservation, the advisory admission gate. Known nuance: no in-flight-run tracking across the boundary yet — a reap during a long run_code kills that request (mitigated by the expiry cancel/schedule bracketing). See the design doc §6.

Test plan

Live e2e through the split (local, no jailer): diagnostic-program message loaded from the Aleph network → agent-side downloads → CreateVm over UDS gRPC boots the microvm in the daemon (init handshake, runtime 2.0.0) → agent config push + run_code('/') over vsock → HTTP 200 → DeleteVm clean.
GrpcSupervisor conformance + class-exact error round-trips + streaming over a real in-test UDS channel; DTO⇄proto round-trips; spec-program boot config; _ensure_program_vm behaviors incl. the concurrency race; split-mode wiring.
Full tests/ suite: 741 passed, 9 pre-existing environment-only failures (identical set at the base).
mypy error set identical to base; ruff format, isort, proto-clean gates green.

Protocol review follow-through (second pass)

A full review of the contract, implemented as a commit series (each commit one concern):

Vocabulary scrubbed off the wire

DiskConfig.mount dropped (dead on both backends — Firecracker mounts travel over the guest channel, QemuVM never read the field).
BACKEND_QEMU_SEV dropped: the VMM and the TEE are orthogonal; confidential = backend: QEMU + TeeConfig presence. The spec path now rejects TEE specs loudly instead of booting unprotected.
Enum-typed wire fields (HealthStatus, Protocol, TeeBackend); is_instance field name reserved too.
VmInfo IPs folded into IpAssignment {address, network_cidr, gateway} × 2 families.
hostname moved into CreateVmRequest (base32-of-item-hash naming is computed agent-side now).
GuestChannel.ready_timeout_secs: boot-time policy crosses the wire instead of living in supervisor settings.
Same-host invariant (paths by reference over a shared fs) stated in the proto header.

Capabilities

GetVmSpec; idempotent CreateVm on vm_id (same spec → current info, conflict → ALREADY_EXISTS); honest RebootVm (ephemeral spec VMs are stopped and re-created from the held spec).
StopVm/StartVm — stop without releasing the definition (persistent VMs; ephemeral answer UNIMPLEMENTED, their cycle is DeleteVm+CreateVm).
WatchEvents server stream + a split-mode agent watcher that drops per-VM agent state when a VM leaves RUNNING (the cross-process replacement for the in-process reap hooks), reconnecting on failure.
Deadlines on every unary client RPC (30s query / 300s lifecycle — gRPC's default is infinite); live log chunks stamped at capture (no more 1970-epoch lines).

Verified (second pass): full suite 769 passed + 3 xfailed, same 9 env-only failures; mypy error set identical to base; live e2e re-ran clean on the final contract (ready_timeout_secs=20 visible on the wire, HTTP 200, clean delete). Deferred items (resource accounting, google.rpc.Status envelope, capabilities RPC, migration reshape, opaque cloud-init user-data) are recorded in the design doc §7.

Design & status: docs/plans/2026-06-11-grpc-process-split-design.md

🤖 Generated with Claude Code

…ev-accelerate)

…ersion The wire layer for Phase 1: SupervisorService wraps any Supervisor behind proto/supervisor.proto over a UDS; GrpcSupervisor implements the Supervisor ABC as a gRPC client. SupervisorError crosses as a serialized ErrorDetail in the aleph-supervisor-error-bin trailer plus a mapped status code, and the client rebuilds the exact exception class (UNIMPLEMENTED → NotImplemented). ReinstallVmRequest.wipe_volumes becomes `optional bool` so an unset field takes the ABC's True default instead of proto3's false. Tested: DTO/proto round-trips, conformance + behavior over a real UDS channel, class-exact error round-trips, streaming, reinstall default.

…cket python -m aleph.vm.supervisor owns the VmPool (controllers, networking, systemd, reattach) and serves the gRPC contract on ALEPH_VM_SUPERVISOR_GRPC_SOCKET (default EXECUTION_ROOT/supervisor.sock). SIGTERM stops the server only — VMs keep running and are reattached on the next start, the #957 recovery path. Reattach failure is non-fatal so the contract stays reachable. New setting SUPERVISOR_GRPC_SOCKET: when set, the agent will talk to this socket instead of the in-process supervisor (wired in a follow-up).

…ateways on VmInfo CreateVmRequest.program_mode asks the supervisor for the program boot flow (vsock enabled, guest-init ready signal awaited as part of boot). VmInfo gains the infra facts the agent needs to own the guest-level protocols across the process split: control_socket_path (the vsock UDS the agent dials / binds listeners beside), runtime_version (the init handshake's version string), and the tap gateway addresses for the network config the agent pushes. No Aleph vocabulary crosses: payloads on the control socket stay opaque to the supervisor.

…VmSpec SpecFirecrackerProgram boots a program microvm from resolved paths only: kernel + RUNTIME-role root disk + optional CODE disk + EXTRA disks, vsock enabled, guest-init handshake awaited as part of boot. The guest API and config push are explicitly NOT here — they are agent protocols spoken over the control socket. Drive order (rootfs, code, extras) is the contract the agent derives guest device names from. pool.create_vm_from_spec routes backend=FIRECRACKER + program_mode specs to the new boot path (ephemeral only; persistent programs stay legacy). Admission for spec-built VMs remains the agent's responsibility, as documented on check_admission.

… boundary The 'hardest entanglement' crosses the seam. run_code_on_request / run_code_on_event no longer touch pool/VmExecution for ephemeral programs: - build_program_create_vm_spec downloads code/runtime/data/volumes agent-side (AlephProgramResources) and emits a path-only spec (backend=FIRECRACKER, program_mode, RUNTIME/CODE/EXTRA disk roles). - ProgramGuestClient owns the guest protocols the controller used to run: the guest API process on <vsock>_53, the runtime configuration push (CONNECT 52 — code bytes, entrypoint, variables, network from VmInfo's gateways/runtime_version), and run_code. Guest device names derive from the spec's disk order. - _ensure_program_vm is the get-or-create: a running VM this agent process did not configure is recreated (the runtime takes one config push per boot, so unknown == unusable). Failures map to the same HTTP responses as before via both error vocabularies (download-phase controller exceptions + SupervisorError). - Idle expiry / update-watch reapers also drop the agent's guest state (guest API process, configured mark) via the on_reaped hooks. Persistent programs keep the legacy pool path (spec-path support deferred); instances were already migrated. Suite: 705 passed, 9 pre-existing environment-only failures (identical at base).

When ALEPH_VM_SUPERVISOR_GRPC_SOCKET is set, the agent constructs no VmPool: app['supervisor'] is a GrpcSupervisor on that socket and the daemon (python -m aleph.vm.supervisor) owns the VMs. Agent shutdown no longer stops VMs in this mode — that is the point of separating the lifecycles. All migrated surface works across the boundary: allocations, instance lifecycle, operator stop/erase/reboot/reinstall/logs, port forwards, the about/executions views, and the on-demand program path. Endpoints whose capability has not crossed yet degrade to 501 via require_vm_pool (backups, restore, confidential, migration, network recreate, HAProxy regenerate, GPU reservation, persistent programs) instead of crashing on a missing pool; about-pages fall back to pool-less reporting (no GPUs, raw free disk). Smoke-tested: supervisor daemon in one process, agent webapp with pool=None in another, /v2/about/executions/list served through the real UDS channel.

Two simultaneous requests to a not-yet-created program raced to create-and-configure it; the runtime accepts one configuration push per boot, so the loser would wedge the VM. _ensure_program_vm now runs under a per-VM creation lock owned by the ProgramGuestClient; followers re-check inside the lock and take the fast path. Pinned by a gather()-based race test. Also records the first-pass status and known gaps in the design doc (in-flight run vs reap across the boundary, deferred capabilities).

… server-side Found by a live end-to-end run (real diagnostic program in a Firecracker microvm, agent and supervisor as separate processes over the UDS contract): - The guest API wait loop (inherited from the old controller) spun forever when the child process died at startup. Bounded to 10s, with the child's death surfaced as VmSetupError. The child is now forked explicitly: Python 3.14 changed the Linux default away from fork, and the guest API relies on inheriting the parent's loaded settings. - A stale guest-API socket from a previous run is unlinked before binding. - The gRPC server now logs the chained traceback when it aborts an RPC with ERROR_CODE_INTERNAL — those are 'not a vocabulary case' bugs and were previously invisible on the daemon side. The live run (Aleph network message -> agent-side downloads -> CreateVm over gRPC -> init handshake -> agent config push + run_code over vsock -> HTTP 200 -> DeleteVm) completes cleanly.

…ready payload, mechanism-only disk roles, no is_instance Review feedback: the contract must stay generic regardless of the hypervisor; Firecracker-vs-QEMU is the only distinction the supervisor should see. - CreateVmRequest.program_mode (program/instance vocabulary on the wire) becomes `optional GuestChannel guest_channel { ready_port }`: a pure mechanism request — expose a host⇄guest channel and treat the guest's ready signal on ready_port as part of boot. Firecracker instances are naturally 'FC without a channel'. - VmInfo.control_socket_path → guest_channel_path (empty when the VM has no channel, which covers QEMU). VmInfo.runtime_version → guest_ready_payload: the raw bytes the guest sent, passed through opaquely — the supervisor no longer parses the Aleph runtime's msgpack handshake; the agent does (runtime_config_from_ready_payload). MicroVM keeps the raw payload and takes the ready port as a parameter. - DiskRole collapses to ROOTFS/EXTRA: which disk is the root device is mechanism; code/runtime/data are workload roles the client maps onto guest devices via disk order (which it already did). - VmInfo.is_instance removed (field 18 reserved): agent vocabulary, derived from the registry; the guest channel's presence is the registry-miss fallback for the /about list labels. - The Aleph runtime's channel conventions (control port 52, guest API 53) move to aleph.vm.utils.runtime_channel, agent-side, and replace the hardcoded CONNECT 52 strings. - packaging: deb package ships grpcio (matches pyproject). Re-validated live through the split: CreateVm over gRPC reports ready_payload=b'\x81\xa7version\xa52.0.0' (raw guest msgpack), agent parses it, config push + run_code('/') → HTTP 200, DeleteVm clean. Suite: 742 passed, same 9 environment-only failures as base.

codecov · 2026-06-11T10:09:41Z

Codecov Report

❌ Patch coverage is 73.34025% with 1028 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.01%. Comparing base (70cc3ea) to head (be9425f).

Files with missing lines	Patch %	Lines
tests/integration/conftest.py	24.23%	244 Missing and 3 partials ⚠️
tests/integration/test_vm_management.py	16.66%	95 Missing ⚠️
src/aleph/vm/orchestrator/run.py	53.37%	56 Missing and 20 partials ⚠️
src/aleph/vm/orchestrator/vm/program_client.py	48.22%	72 Missing and 1 partial ⚠️
tests/integration/test_resource_release.py	20.89%	53 Missing ⚠️
src/aleph/vm/supervisor/daemon.py	0.00%	51 Missing ⚠️
tests/integration/test_backup_restore.py	18.64%	48 Missing ⚠️
tests/integration/test_vm_creation.py	21.66%	47 Missing ⚠️
src/aleph/vm/pool.py	28.30%	37 Missing and 1 partial ⚠️
src/aleph/vm/supervisor/inprocess.py	89.45%	23 Missing and 14 partials ⚠️
... and 25 more

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #977      +/-   ##
==========================================
- Coverage   78.92%   78.01%   -0.91%     
==========================================
  Files         161      188      +27     
  Lines       18469    22107    +3638     
  Branches     1328     1562     +234     
==========================================
+ Hits        14576    17246    +2670     
- Misses       3517     4427     +910     
- Partials      376      434      +58

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Drop DiskConfig.mount: guest mount points are client vocabulary no hypervisor mechanism consumes (Firecracker mounts travel over the guest channel; QemuVM never read the controller-config field). - Drop BACKEND_QEMU_SEV: the VMM and the TEE are orthogonal. A confidential VM is backend QEMU + a TeeConfig; the spec path now rejects TEE specs explicitly (TeeUnavailableError) instead of encoding them as a backend variant it would silently ignore. - Enum-ify the stringly-typed fields: HealthStatus, Protocol (port forwarding), TeeBackend (TeeConfig + Measurement). - Reserve the "is_instance" field name alongside its number. - State the same-host invariant (paths by reference over a shared filesystem) in the proto header.

…ages The six flat strings (ipv4/ipv6 address, *_network, *_gateway) with mixed bare-IP/CIDR shapes become two IpAssignment messages, one per address family: {address (bare IP), network_cidr, gateway (bare IP)}. Old field numbers and names are reserved.

…equest The supervisor was deriving the hostname from the vm_id with Aleph's base32-of-item-hash convention. The agent now computes it and sends it; the supervisor applies it verbatim (empty falls back to a mechanical vm_id truncation).

… the client's How long a guest legitimately takes to signal ready is a fact about the image the client chose; it was bound by the supervisor-side settings.INIT_TIMEOUT. The channel now carries the bound (0 = supervisor default); the agent sends its runtime policy.

Live StreamLogs lines carried timestamp_ns=0 as a sentinel, which clients rendered as the 1970 epoch. The contract says timestamp_ns is server-side capture time; now it is.

- GetVmSpec(vm_id) returns the spec a live VM was created from, letting the agent rebuild its records after a restart (UNIMPLEMENTED for message-built legacy executions). - CreateVm is idempotent on vm_id: an identical spec for a live VM returns its current info (safe retry after UNAVAILABLE or a missed deadline); a different spec — or a collision with a message-built execution — fails ALREADY_EXISTS instead of silently returning the wrong VM. - RebootVm actually reboots ephemeral spec-created VMs: stop, then re-create from the held spec, instead of returning a stopped husk the client had to know to re-create.

DeleteVm was the only way down: there was no way to stop a persistent VM without erasing it, or to bring it back. StopVm stops the VM but keeps it listed (STOPPED, forget-on-stop defused the same way reinstall does); StartVm restarts it through the existing restart_persistent_vm path (tap/nftables/port-mapping re-application) and is a no-op read on a running VM. Ephemeral VMs answer UNIMPLEMENTED — their cycle is DeleteVm + CreateVm.

In split mode the agent had no way to learn a VM went down without polling: the in-process reap hooks only fire for agent-initiated teardowns. WatchEvents streams every transition the supervisor performs (create / stop / start / reboot's down-then-up / reinstall / delete), no replay (snapshot with ListVms, then watch). Agent side: a split-mode background task consumes the stream and drops per-VM agent state (guest API process, configured mark, idle-teardown timers) when a VM leaves RUNNING — the cross-process replacement for the reap hooks. Reconnects on stream failure; degrades gracefully if the supervisor answers UNIMPLEMENTED. Spontaneous guest-death detection (a crashed VMM with no supervisor operation) is not detected anywhere today; noted in the proto as a future extension.

gRPC's default deadline is infinite: a wedged supervisor daemon would hang the agent's HTTP handlers forever on a UDS that never answers. Every unary call now carries one — 30s for queries and quick host-side mutations, 300s for operations that boot or drain VMs — via a shared _unary helper that also centralizes the error-trailer translation. Streams (logs, events, backup download) are long-lived by design and carry none.

mypy parity with the base branch restored (the stubs reject valid async (Application) -> None handlers when app is precisely typed).

foxpatch-aleph

Well-designed PR implementing the gRPC-based process split between agent and supervisor. Clean ABC decomposition, robust closed-error vocabulary serialized in gRPC trailers, proper deadline discipline, and good backward compatibility via the Strangler Fig pattern. Minor comments on layering and a pre-existing timeout-handling gap.

src/aleph/vm/supervisor/grpc_client.py (line 49):

src/aleph/vm/orchestrator/run.py (line 662):

CI type-checks tests/ too; the mixed-value dict inferred as dict[str, object] rejected attribute access on the mocks.

foxpatch-aleph

Well-designed PR implementing the agent/supervisor process split via gRPC. The boundary contract is cleanly defined, error translation is thorough, program path migration correctly splits concerns between agent and supervisor, and test coverage is excellent including round-trip DTO conversions, class-exact error round-trip parametrization over a real UDS channel, race-condition tests for concurrent cold starts, and split-mode wiring. A few minor observations about process lifecycle timing and silent returns in deferred paths, but none blocking.

src/aleph/vm/orchestrator/vm/program_client.py (line 257): asyncio.sleep(0) after process.terminate() yields to the event loop but doesn't wait for the signal to be delivered. Consider removing the sleep(0) entirely since it's misleading here — the subsequent sleep(2) is the actual wait. Not a bug, just misleading.

src/aleph/vm/supervisor/grpc_client.py (line 115): The if metadata: check works because trailing_metadata() returns an empty tuple () when absent. A brief comment noting this assumption would help future readers.

src/aleph/vm/orchestrator/tasks.py (line 252): _handle_domains_aggregate silently returns when pool is None (split mode). Consider adding a debug log so it's clear why domain updates aren't happening.

src/aleph/vm/orchestrator/run.py (line 704): The legacy event path _run_code_on_event_legacy no longer calls expiry.cancel(vm_id) before serving, while the original run_code_on_event did. This is a minor behavioral regression for persistent programs under the legacy path — expiry isn't reset before serving.

src/aleph/vm/supervisor/types.py (line 112): Backend.QEMU_SEV enum value is removed. Since Backend ceased being a StrEnum with this PR, code comparing backend == "qemu_sev" would silently fail. Verified the test suite passes, so no such code survives — but worth a mention in the PR description or commit message as a breaking change.

Wire the six BackupOps methods to the existing qemu backup machinery (controllers/qemu/backup.py); the gRPC plumbing for them already existed. Backups cover the rootfs disk only, matching what restore can put back. - start_backup: async job per VM, idempotent against a running job and against a non-expired archive (24h TTL, as the operator endpoint did); optional guest fs-freeze via quiesce_guest, best-effort. - Supervisor-issued backup ids use microsecond timestamps so a retry after a failed run gets a fresh id; create_backup_archive accepts the timestamp so the tar stem matches the id issued at start. - get_backup_status/list_backups: completed archives are read from disk (the source of truth); only in-flight and failed runs live in memory. - download_backup: 1 MiB chunks with offsets. - restore_backup: extract the rootfs member (member-streamed, no extractall), verify, stop the VM with the forget-on-stop defused, swap the rootfs atomically, restart; emits down-then-up events. Persistent QEMU VMs only.

The reap task checked its stop_event identity but then deleted by hash, so when reboot_vm (or a delete+create sequence) recreated the VM under the same vm_id while the old stop was still being reaped, the task removed the NEW execution from the pool: the freshly rebooted VM vanished from GetVm/ListVms while its hypervisor kept running. Found by the supervisor integration suite (reboot test); forget by identity instead.

An opt-in (AVM_ITEST=1) suite that drives a real supervisor daemon over its Unix-socket gRPC contract, agent-free: specs are built inline from local artifacts, never from Aleph messages. - Harness: session-scoped daemon with private CACHE_ROOT/EXECUTION_ROOT; per-test gRPC clients; self-gating skips (KVM, root, kernel/runtime/cloud image paths via AVM_ITEST_* env vars). As root, a systemd drop-in (in /run, removed on teardown) points aleph-vm-controller@ at this source tree so persistent VMs run the code under test. Venv runs get the distro dbus/nftables modules through a symlink shim instead of all of dist-packages on PYTHONPATH. - Creation: Firecracker boots to the vsock ready signal; QEMU instances get an IP and answer with an SSH banner from the host; idempotent re-create and ALREADY_EXISTS conflicts; get_vm/list_vms/get_vm_spec agreement. - Management: journald logs via get_logs/stream_logs, ephemeral reboot by recreation, watch_events transitions, port-forward round trip against the live nftables ruleset, QEMU stop/start/reboot with reachability after each transition. - Resource release: hypervisor processes, execution-root files, TAP interfaces, firewall rules and controller units all gone after delete; repeated create/delete cycles under one vm_id accumulate nothing. - Backups: full cycle on a live instance - backup, mutate guest state over SSH, download + verify the archive stream, restore and observe the mutation undone, delete; Firecracker rejection asserted over the wire.

…th, delete leftovers Three product gaps found by the integration suite on a real host: - QEMU exits at startup when EXECUTION_ROOT makes its control socket paths exceed sun_path's 108 bytes ('UNIX socket path is too long'), and the failure was invisible (see below). settings.check() now rejects roots that cannot fit <exec>/<64-hex>-monitor.socket. - wait_for_controller_ready returned as soon as the unit sampled 'active', but a Restart=on-failure unit whose process dies right away (the exact symptom of the socket-path failure) is 'active' between crashes, so create_vm reported RUNNING for a VM that never existed. The wait now re-checks 2s later and fails loudly on a flapping unit. - delete_vm left the controller config and the cloud-init seed behind (caught by the resource-release test). Delete now removes them; stop_vm keeps them, reattach and StartVm read them.

The root-mode session root moved to /var/lib/avm-<rand>/e: the previous paths (both the pytest /tmp root and /var/lib/aleph-vm-itest-<rand>/exec) made the qemu control socket paths exceed 108 bytes, so qemu died at startup on every QEMU test while the flapping unit still sampled as active. A guard asserts the socket paths fit before starting the daemon.

foxpatch-aleph

Large, well-structured PR splitting the agent and supervisor into separate processes over gRPC. The implementation is thorough with proper error translation, idempotent operations, concurrent access handling, and comprehensive test coverage (769 tests). Protocol vocabulary has been carefully cleaned up. No correctness or security issues identified.

src/aleph/vm/orchestrator/vm/program_client.py (line 226): The await asyncio.sleep(0) after process.terminate() is likely insufficient to let the process exit. Consider await asyncio.sleep(0.5) as the first wait, or use process.join(timeout=2) if the forked process supports it.

src/aleph/vm/conf.py (line 477): The sun_path check says "UNIX socket paths are limited to 108 bytes" (total including NUL) but asserts longest_socket_path <= 107 (path only). This is correct, but the error message could be clearer: "108 bytes total including NUL terminator; longest user path is 107".

src/aleph/vm/models.py (line 740): The 2-second sleep before confirming 'active' is correct for crash-loop detection but adds delay for every VM start. Consider whether a shorter interval (e.g., 0.5-1s) would suffice - systemd state transitions are near-instantaneous for a crash-loop but the 2s window is conservative.

VmExecution.stop() queued the systemd stop job and immediately deleted the TAP interface while qemu was still performing the guest's graceful ACPI shutdown. qemu's tap file descriptor goes bad (TUNSETVNETHDRSZ ioctl fails), it aborts without flushing the qcow2, and the next boot kernel-panics on the corrupted image. stop() now waits for the controller unit to reach inactive/failed before teardown, bounded a little past systemd's TimeoutStopSec. Two related gaps on the same paths: - restart_persistent_vm returned right after RestartUnit, so StartVm and RestoreBackup could report BOOTING (or RUNNING on a unit that was about to crash-loop). It now awaits wait_for_controller_ready and stamps started_at. - qemu never unlinks its UNIX control sockets; delete_vm now removes the -monitor/-qmp/-qga .socket files with the rest of the VM's on-disk artifacts. Found by the supervisor integration suite (stop/start cycle, restore cycle and delete resource-release tests).

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

…panics The Ubuntu 26.04 kernel attempts a kexec-handover (KHO) scratch reservation at every boot. On a 768 MiB guest it reserves ~550 MiB, leaving ~72 MiB; PID 1 then OOMs and the kernel panics with "System is deadlocked on memory" before the network comes up. Whether the reservation succeeds depends on the KASLR-randomized kernel placement, so 768 MiB guests booted or panicked at random (the 240s SSH banner timeouts in test_backup_restore_cycle and test_qemu_stop_start_reboot_cycle). At 2048 MiB the reservation always fits (~650 MiB) and the guest keeps 1.2 GiB available: verified with 4 boots at each size, all deterministic.

foxpatch-aleph

This PR implements a major architectural change — splitting the monolith into agent and supervisor processes connected via gRPC over a Unix domain socket. The design is clean, the error vocabulary is well-defined, and the strangler pattern (InProcessSupervisor implements the same contract as GrpcSupervisor) is correctly executed. Test coverage is comprehensive with conformance tests, wire-format round-trips, event streaming, and integration tests. I found no correctness bugs, no security issues, and the code quality is high. I have several comments but none are blockers.

… reboot The persistent branch of reboot_vm called systemd restart (which only queues the job) and sampled the unit state immediately, so a healthy reboot was reported as BOOTING. Await wait_for_controller_ready like start_vm and restore_backup do, and refresh times.started_at since a reboot is a new start.

foxpatch-aleph

Well-structured PR that cleanly splits agent and supervisor into two processes with a well-designed gRPC contract. Strong engineering across the board: clear proto vocabulary, thorough error translation, comprehensive testing (unit + integration), and good documentation. Minor observations only: an unnecessary asyncio.sleep(0) in program_client.forget(), a removed noqa suppression in errors.py, and the unbounded event queue should be noted for future consideration if high-frequency events are added.

src/aleph/vm/orchestrator/vm/program_client.py (line 258): await asyncio.sleep(0) after process.terminate() does not yield enough time for the process to actually die — the sleep(0) just passes control back to the event loop for one iteration. This means the code always falls through to the 2-second sleep branch. Consider either removing the sleep(0) entirely, or making it a meaningful wait like 0.5.

src/aleph/vm/supervisor/errors.py (line 140): The removal of the # noqa: BLE001 comment leaves a bare except clause. While the translating_errors context makes the intent clear, a linter configured with BLE001 enabled will now flag this.

src/aleph/vm/orchestrator/supervisor.py (line 179): The watch_supervisor_events async generator is driven over an unbounded asyncio.Queue inside InProcessSupervisor. Today this is fine because lifecycle events are human-scale. If spontaneous guest-death detection is added in the future (making events high-frequency), consider adding max-size or backpressure to the queue.

src/aleph/vm/supervisor/inprocess.py (line 604): Excellent — the InProcessSupervisor is growing beyond a thin pool wrapper into the canonical boundary implementation. The backup state management with disk-based source of truth (for completed archives) and in-memory only for in-flight/failed is the right balance.

systemd garbage-collects a unit once it reaches a clean inactive state, so wait_for_controller_stopped polling at 1s intervals often misses the brief "inactive" window: GetUnit then raises NoSuchUnit, which get_service_active_state mapped to "unknown", and the loop burned the full 75s timeout on every persistent stop (visible as a ~75s hole between Stopped and the next Started in the integration runs). Map NoSuchUnit to a distinct "not-loaded" state and accept it as stopped; other D-Bus errors remain "unknown" and keep being retried so a transient failure cannot trigger an early network teardown.

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

Closed grpc.aio channels can be trapped in reference cycles (exception chains keep frames alive, frames keep the client alive), so the last ones are only freed by the GC pass that runs during interpreter shutdown. On Python 3.14 AioChannel.__dealloc__ then tries to join grpc's poller thread, which finalization forbids, printing an ignored PythonFinalizationError traceback after an otherwise green run. Run one cyclic GC pass in pytest_sessionfinish so the channels are collected while the interpreter is still alive. Reproduced and verified standalone: a closed channel held in a cycle prints the exact traceback at exit, and a gc.collect() before exit silences it.

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

…r visible The backup tar was created directly at its final <backup_id>.tar path, and the presence of that file is what get_backup_status, list_backups, download_backup and find_existing_backup all take as backup completion. During the multi-hundred-MB tar write, a status poll therefore reported COMPLETE with whatever size the growing file had at that instant. Caught by test_backup_restore_cycle: get_backup_status returned COMPLETE with size_bytes=569361920 while the writer was still going, and the later download streamed the real 861091840-byte archive. find_existing_backup had the same hazard: a second StartBackup could adopt a half-written tar as a valid existing backup. Write the tar to a .tar.partial staging name and rename it into place after the sha256 sidecar and metadata are written, mirroring what _restore_rootfs_sync already does for the rootfs swap. A failed run removes the staging file and leaves no .tar behind. cleanup_expired_backups now also removes .tar.partial files older than the TTL (a writer that died mid-backup); fresh ones are left alone.

foxpatch-aleph

A substantial, well-structured refactor that introduces a process-split architecture between agent and supervisor over gRPC. The design is clean — message-agnostic DTOs, closed error vocabulary, clear boundary between supervisor (VMM boot, disk lifecycle, backup) and agent (Aleph message parsing, guest protocol, configuration push). However, there are several correctness and safety issues that should be addressed before merging.

src/aleph/vm/supervisor/grpc_client.py (line 108):

src/aleph/vm/orchestrator/vm/program_client.py (line 208):

src/aleph/vm/supervisor/inprocess.py (line 626):

src/aleph/vm/supervisor/inprocess.py (line 747):

src/aleph/vm/supervisor/inprocess.py (line 658):

src/aleph/vm/supervisor/inprocess.py (line 766):

src/aleph/vm/supervisor/grpc_server.py (line 94):

src/aleph/vm/supervisor/inprocess.py (line 680):

src/aleph/vm/supervisor/inprocess.py (line 699):

src/aleph/vm/supervisor/inprocess.py (line 771):

src/aleph/vm/supervisor/inprocess.py (line 406):

src/aleph/vm/supervisor/proto_convert.py (line 168):

src/aleph/vm/supervisor/inprocess.py (line 589):

tests (line ):

src/aleph/vm/supervisor/inprocess.py (line 96):

src/aleph/vm/supervisor/grpc_client.py (line 74):

src/aleph/vm/supervisor/inprocess.py (line 298):

New tests-integration job in the pytest workflow, parallel to the unit tests. It provisions the same artifacts the unit job already uses (Firecracker v1.15.1 + guest kernel, the debootstrapped aleph-debian-12-python runtime) plus the Ubuntu 24.04 cloud image for the QEMU tests (LTS rather than the 26.04 devel image for stability; the suite takes the image via AVM_ITEST_QEMU_IMAGE), then runs tests/integration as root with AVM_ITEST=1 so the QEMU, TAP networking, port forwarding and backup/restore tests all execute. Guest DNS is pinned to public resolvers like in the droplet CI: the runner's systemd-resolved on 127.0.0.53 is unreachable from inside a VM. On failure the job dumps the controller unit journal and the guest console lines for debugging.

foxpatch-aleph

A sound, well-structured architecture split that decomposes the monolithic agent+supervisor into two communicating processes over a gRPC/Unix-socket boundary. The design cleanly separates concerns (agent owns networking/messages/user policy, supervisor owns VMMs/systemd/backups), uses a closed error vocabulary with reliable cross-process error propagation via gRPC trailers, and maintains backward compatibility through a strangler pattern. The proto contract is carefully designed with same-host invariants explicitly documented. Backup operations are fully implemented with fsfreeze support. Tests show 769 passed. Minor issues noted below.

src/aleph/vm/supervisor/grpc_server.py (line 77):

src/aleph/vm/supervisor/grpc_server.py (line 193):

src/aleph/vm/supervisor/grpc_client.py (line 229):

src/aleph/vm/supervisor/grpc_client.py (line 139):

src/aleph/vm/orchestrator/run.py (line 237):

src/aleph/vm/orchestrator/run.py (line 398):

src/aleph/vm/orchestrator/run.py (line 555):

src/aleph/vm/orchestrator/run.py (line 253):

src/aleph/vm/supervisor/inprocess.py (line 459):

src/aleph/vm/supervisor/inprocess.py (line 698):

src/aleph/vm/supervisor/inprocess.py (line 758):

proto/supervisor.proto (line 498):

proto/supervisor.proto (line 11):

src/aleph/vm/supervisor/grpc_server.py (line 65):

src/aleph/vm/supervisor/errors.py (line 91):

src/aleph/vm/supervisor/grpc_client.py (line 99):

src/aleph/vm/supervisor/daemon.py (line 67):

src/aleph/vm/supervisor/daemon.py (line 51):

src/aleph/vm/orchestrator/supervisor.py (line 163):

src/aleph/vm/orchestrator/supervisor.py (line 211):

src/aleph/vm/supervisor/grpc_client.py (line 155):

src/aleph/vm/orchestrator/supervisor.py (line ):

src/aleph/vm/supervisor/inprocess.py (line 745):

src/aleph/vm/controllers/qemu/backup.py (line None):

src/aleph/vm/orchestrator/run.py (line 122):

src/aleph/vm/orchestrator/run.py (line 349):

src/aleph/vm/supervisor/inprocess.py (line 402):

src/aleph/vm/pool.py (line None):

src/aleph/vm/controllers/configuration.py (line None):

src/aleph/vm/supervisor/proto_convert.py (line 1):

src/aleph/vm/orchestrator/run.py (line 38):

src/aleph/vm/supervisor/inprocess.py (line 626):

src/aleph/vm/supervisor/inprocess.py (line 714):

src/aleph/vm/supervisor/errors.py (line 133):

src/aleph/vm/supervisor/grpc_server.py (line 80):

src/aleph/vm/supervisor/grpc_server.py (line 114):

src/aleph/vm/supervisor/inprocess.py (line 489):

src/aleph/vm/orchestrator/vm/program_client.py (line 180):

src/aleph/vm/orchestrator/vm/program_client.py (line 70):

src/aleph/vm/orchestrator/vm/program_client.py (line 76):

src/aleph/vm/supervisor/inprocess.py (line 312):

src/aleph/vm/orchestrator/supervisor.py (line 207):

The settings check added in 6dfb76b rejected execution roots whose longest QEMU control socket path (<root>/<64-hex>-monitor.socket) exceeds 107 bytes. The real limit is 108: sun_path is char[108] and QEMU fills it exactly without a trailing NUL, only failing on 109+ (verified against qemu-system-x86_64: 108 binds, 109 dies with 'Path must be less than 108 bytes'). Python clients do need room for a NUL (107 max) but only ever connect to the -qmp.socket, whose suffix is 4 bytes shorter, so the bound path remains the binding constraint. The off-by-one broke test_create_qemu_instance(_online) on CI, where tempfile gives /tmp/alephtest_XXXXXXXX/exec: exactly 108, which booted fine for years. Relax the assert to <= 108 and align the integration conftest's mirror of it.

foxpatch-aleph

This PR implements Phase 1 of the agent/supervisor process split — a substantial architectural refactor that introduces a clean gRPC boundary between the two processes. The design is contract-first with a well-defined proto, frozen-dataclass DTOs, and a capability-based ABC interface. Error handling is carefully designed with class-exact round-trip via ErrorDetail trailers and mapped gRPC status codes. The strangler pattern is correctly applied: the in-process path remains fully functional and the gRPC path is introduced alongside it, selectable via the SUPERVISOR_GRPC_SOCKET setting. Backward compatibility is preserved throughout; existing message-driven code paths are untouched. New code is well-structured, documented, and follows established patterns. The ProgramGuestClient correctly moves agent-side guest protocols (guest API, configuration push, code execution) across the boundary. The InProcessSupervisor now serves as the reference implementation for the contract, with added lifecycle methods and event emission. The proto file cleanly separates concerns with reserved fields replacing removed concepts like 'is_instance' and 'BACKEND_QEMU_SEV'. The main risk areas (deadlines on all unary calls, stream cancellation semantics, identity-based forget-on-stop fix) are addressed correctly. Testing was not verified but the code structure supports testability through the ABC abstraction.

On CI every supervisor stop_vm/delete_vm was a silent no-op for QEMU VMs: qemu stayed alive until the session teardown, where systemd's stop-sigterm timeout SIGKILLed it, failing three integration tests (stop/start cycle, delete release checks, restore reachability). Root cause chain: - stop_and_disable only stopped the unit when is_service_active was true, and is_service_active first requires GetUnitFileState == enabled. The integration harness's fallback controller unit had no [Install] section, so the instance could never be enabled, the check returned false for a running unit, and the stop was skipped with no error. Locally the packaged template (which has [Install]) masked this. Active does not imply enabled: gate the stop on the unit's actual ActiveState instead. - The fallback unit's TimeoutStopSec=30 was shorter than the controller's 50s ACPI grace window, so a guest that needed escalation would be SIGKILLed mid-shutdown; align with the packaged unit's 60 and add the [Install] section so enable/disable behave as in production. Also filter the guest console forwarding out of the CI unit-journal dump: it drowned the 500-line tail, hiding the stop-sigterm timeout lines that pinpointed this.

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

…events and host info Five coverage additions to the supervisor integration suite: - Daemon restart (the recovery story): a persistent QEMU VM must survive a supervisor daemon SIGTERM + relaunch with the same qemu process and stay reachable, the new daemon must reattach it as RUNNING via load_persistent_executions, and stop/start/delete must work end to end on the reattached execution. The conftest daemon fixture now records its spawn environment so restart_daemon can relaunch the exact same daemon and wait for it to answer health. - Extra data disks: a DiskRole.EXTRA qcow2 is visible in the guest (located by size, not device-name guessing), data written to it survives a stop/start cycle, and delete(wipe=True) erases the writable volume file on the host. - Error paths over the real gRPC boundary: VmNotFoundError from every lookup surface, NotImplementedSupervisorError for stop/start of ephemeral VMs (which must leave the VM running and deletable), BackupNotFoundError for unknown and foreign backup ids, and VmNotFoundError on double delete. - Event stream breadth: the FC test now covers create + reboot (down-then-up pair) + delete fanned out to two concurrent subscribers; the persistent QEMU cycle asserts stop/start/reboot all reach the stream in order. - Smaller surfaces: get_host_info sanity, UDP port forwards (parametrized with TCP), and IPv6 guest reachability in the QEMU create test (the address is statically pushed via cloud-init, so it must answer on port 22 like IPv4 does).

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

ExportVm, ImportVm and GetMigrationStatus were NotImplemented stubs; the wire layer (proto, gRPC server/client, conversions) already existed. This implements the directory-based contract: export dumps a VM into a host-local directory, import recreates it from one, and moving the directory between hosts is the caller's business (the agent or an external orchestrator; migrations are exercised end to end in aleph-testnets). Export (async job, poll via GetMigrationStatus): requires a persistent spec-built QEMU VM, takes the same per-VM lock as backup/restore, stops the VM through the regular stop path (the disks must be quiescent, and the VM is leaving this host), converts qcow2 disks with qemu-img convert -c (collapsing any backing chain so the export is self-contained) and copies other formats verbatim, then writes a manifest recording the spec and per-disk sha256. Progress is reported through bytes_transferred/bytes_total. A failed export flips the job to FAILED and restarts the VM. Import (synchronous): validates the manifest, refuses ids that are already registered before touching any disk (an import onto a live VM must not overwrite the disks it runs from), verifies each disk's sha256, copies them under PERSISTENT_VOLUMES_DIR/<vm_id>, rewrites the spec disk paths and boots through the regular create path (conflict guards and events included). Half-copied state is removed on failure. The manifest serializes the spec through its proto encoding (CreateVmRequest as JSON): the proto already is the cross-version schema, so the manifest inherits its compatibility story instead of inventing a second serialization. New MIGRATION_NOT_FOUND error code end to end (proto enum, ErrorCode, MigrationNotFoundError, status/class maps) for unknown migration ids and directories without a manifest; the proto bindings are regenerated. The stub-conformance lists shrink to the confidential trio. 17 new unit tests cover the manifest roundtrip and validation, export success/failure/concurrency, import path rewriting, corruption and conflict guards, and status lookups.

foxpatch-aleph

Well-structured PR that drives the agent/supervisor process split through gRPC with clean proto vocabulary, thorough error round-trips (class-exact ErrorDetail trailers), and careful concurrency management (per-VM creation locks, stop_event defusing, identity-based forget-on-stop fix). The program path is correctly split: agent downloads code/resources and emits path-only specs; supervisor boots from those paths. Tests are comprehensive (wire round-trips, error translation, streaming, split-mode wiring). Minor observation: persist_record in _ensure_program_vm runs after setup_program, so a DB write failure could leave a running VM orphaned from the registry — but this is a minor edge case without security implications, as the VM still serves correctly in memory.

src/aleph/vm/orchestrator/run.py (line 423): If persist_record fails after setup_program succeeds, the VM is running and serving but not recorded in the agent's DB. No cleanup path handles this specific failure. Consider moving persist_record before setup_program, or wrapping in a try/except that at least logs the failure.

src/aleph/vm/orchestrator/run.py (line 39): The comment in _is_spec_eligible says "Keep the two in sync" regarding build_create_vm_spec, but this coupling is fragile. Consider a shared validation function or having build_create_vm_spec itself gate the caller — or at minimum, have a test that asserts the two agree on all inputs.

src/aleph/vm/supervisor/grpc_server.py (line 176): WatchEvents, StreamLogs, and DownloadBackup have manual exception handling instead of @_translating — this is the correct pattern for async generators where the decorator can't work. Consider extracting the boilerplate (the try/except pattern is identical across all three) into a shared helper to reduce copy-paste.

src/aleph/vm/supervisor/grpc_server.py (line 95): The raise AssertionError("abort() must raise") is technically reachable if _abort is called and context.abort() does not raise (e.g. if the gRPC implementation changes). Consider adding # pragma: no cover since it's a defensive guard.

…py and host-info asserts Four fixes from the first CI execution of the new coverage: - test_backup_restore_cycle now takes the backup with quiesce_guest=True. The unquiesced qemu-img convert -U copy of a live disk is crash-consistent at best, and the restored guest failed to boot from it twice in a row on the slower CI runners (locally the restore boot already logged a cloud-init fatal error without failing the test). fsfreeze through the guest agent is how a backup of a running VM is meant to be taken. - make_data_disk now creates a raw image: the controller attaches host volumes without a format, so qemu treats the file as a raw device. A qcow2 exposed its 197 KiB container bytes to the guest instead of a 64 MiB disk, and the device-by-size lookup found nothing. - get_host_info: cpu_architecture/vendor/model are best-effort and legitimately empty on GitHub's Azure runners; assert only fields every host populates. - Annotate the events test's streams list: CI's mypy gate covers tests/ (the local run only checked src/), and it failed on the missing annotation. The failure-only journal dump now prints each guest's console tail separately: a single global tail only showed the last VM, hiding the restored guest whose boot actually failed.

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

The resource-release tests sampled a count baseline of daemon-child hypervisor processes, then asserted create added exactly one and delete returned to the baseline. That baseline is contaminated when a previous firecracker test is still mid-teardown: delete_vm returns before the host reaps the firecracker process, so a neighbour's dying VM is counted in the baseline and the create assertion (baseline + 1) fails once that process is finally reaped. Observed locally as 'assert 1 == (1 + 1)'. Track the PIDs this VM actually spawns via set difference against a pre-create snapshot, so a neighbour's teardown lag cannot affect the result.

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

odesenfans added 10 commits June 11, 2026 10:57

docs: design for the gRPC transport + program path + process split (d…

f341e4a

…ev-accelerate)

docs: record the successful live e2e through the split

58c17e8

odesenfans force-pushed the dev-accelerate branch from d3dda23 to 58c17e8 Compare June 11, 2026 08:58

odesenfans added 11 commits June 11, 2026 14:02

fix(supervisor): stamp live log chunks at capture time

33feeb2

Live StreamLogs lines carried timestamp_ns=0 as a sentinel, which clients rendered as the 1970 epoch. The contract says timestamp_ns is server-side capture time; now it is.

fix(agent): type-ignore aiohttp Signal stubs on event-watcher hooks

006727b

mypy parity with the base branch restored (the stubs reject valid async (Application) -> None handlers when app is precisely typed).

docs: protocol review follow-through in the split design doc

86ded81

foxpatch-aleph approved these changes Jun 11, 2026

View reviewed changes

fix(tests): annotate the fake app dict for mypy

2b5c3ca

CI type-checks tests/ too; the mixed-value dict inferred as dict[str, object] rejected attribute access on the mocks.

foxpatch-aleph approved these changes Jun 11, 2026

View reviewed changes

odesenfans added 3 commits June 11, 2026 16:59

odesenfans added 2 commits June 12, 2026 00:05

foxpatch-aleph approved these changes Jun 11, 2026

View reviewed changes

foxpatch-aleph reviewed Jun 11, 2026

View reviewed changes

foxpatch-aleph reviewed Jun 12, 2026

View reviewed changes

foxpatch-aleph approved these changes Jun 12, 2026

View reviewed changes

odesenfans added 2 commits June 12, 2026 17:53

chore(tests): print the 10 slowest tests after integration runs

a386c84

foxpatch-aleph reviewed Jun 12, 2026

View reviewed changes

foxpatch-aleph approved these changes Jun 12, 2026

View reviewed changes

foxpatch-aleph reviewed Jun 12, 2026

View reviewed changes

foxpatch-aleph approved these changes Jun 12, 2026

View reviewed changes

foxpatch-aleph reviewed Jun 12, 2026

View reviewed changes

foxpatch-aleph reviewed Jun 13, 2026

View reviewed changes

Conversation

odesenfans commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What & why

Changes

Out of scope (501 / skipped in split mode, consistent with the series)

Test plan

Protocol review follow-through (second pass)

Uh oh!

codecov Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

odesenfans commented Jun 11, 2026 •

edited

Loading

codecov Bot commented Jun 11, 2026 •

edited

Loading