feat(supervisor): Phase 2 - gRPC wire protocol + two-service packaging by odesenfans · Pull Request #981 · aleph-im/aleph-vm

odesenfans · 2026-06-17T14:25:58Z

Phase 2: complete the gRPC wire protocol + two-service packaging

Stacked on #980 (Phase 1). Base branch is od/grpc-only-supervisor; review/merge #980 first. Diff here is Phase 2 only.

Phase 1 (#980) decoupled the agent from the pool through the Supervisor
interface, wired to the embedded LocalSupervisor. Phase 2 makes the same
interface work over gRPC and ships the two-process split. Design:
docs/plans/2026-06-16-grpc-only-supervisor-design.md; plan:
docs/plans/2026-06-17-grpc-only-supervisor-phase2-implementation.md.

Part A — complete the wire protocol

Wired the 4 remaining client stubs over gRPC (each had no proto RPC):
recreate_network, restore_from_image, run_program_code (msgpack-framed
scope), reserve_resources. No NotImplementedError("wired in Phase 2")
remains (guard test enforces this).
Carried 8 silently-dropped fields: TeeConfig.firmware_path,
GpuSpec.device_id/model, CreateVmSpec.owner_address,
Measurement.sev_info+launch_measure (new SevInfo message),
BackupInfo.checksum/volumes/source_sizes, StartBackup.include_volumes.
Reconciled HostInfo (hardware + reservation/inventory fields).
reserve_resources carries a message-free ReservationRequest DTO — the
agent translates the Aleph message; no aleph_message type crosses the wire
(verified: zero in proto_convert/grpc_server/grpc_client/types/abc).
Pool gained a shared check_capacity (unifying the message/spec/reserve paths)
and a reserve_gpus keyed on device_id; orphaned message-based
pool.reserve_resources removed.
Removed the orphan migration RPCs (ExportVm/ImportVm/GetMigrationStatus).
Guard tests: no stubs remain; every RPC has a server handler.

Part B — two-service packaging

aleph-vm-supervisor.service becomes the daemon (python -m aleph.vm.supervisor,
owns the pool, serves gRPC); new aleph-vm-agent.service runs the HTTP
orchestrator and dials the socket. postinst migrates the old single unit
(stop → daemon → wait for socket → agent); running VMs survive in
controller@ units and the daemon reattaches.
supervisor.env sets ALEPH_VM_SUPERVISOR_GRPC_SOCKET, so production is
gRPC-split while the code default stays None (dev/tests embedded).

Part C — confinement + validation

build_supervisor factory guard (embedded when socket unset, gRPC when set).
Validated end to end: the real daemon over a real Unix socket returns
HealthInfo(OK) through a real GrpcSupervisor.

Testing

910 passed (only the pre-existing root/env failures: test_execution,
test_interfaces); ruff/isort clean; mypy clean on supervisor/ + pool.py;
_pb/ regenerates with no diff.

Known follow-ups (out of Phase 2 scope)

pool.py still imports aleph_message for the legacy internal create path, so
the daemon process pulls it transitively (the wire is message-free); fully
purging it is a later cleanup.
Now-unused types.py MigrationInfo/MigrationPhase/MigrationId DTOs left in place.
Testnet two-process validation (design C3) pending a deployed branch.

Two-process split as the only supported deployment: a supervisor daemon owning the pool and serving the Supervisor interface over gRPC, and an agent that reaches VMs exclusively through that interface. Key decisions: - rename InProcessSupervisor -> LocalSupervisor (the pool-backed engine) - agent keeps a thin embedded seam for dev/tests only - fold admission + GPU reservation into create_vm (atomic) - HAProxy/L7 domain mapping moves fully to the agent (fed by list_vms), out of the pool; the supervisor stays L3/L4 - interface-first, two phases: decouple agent from pool, then transport and two-service packaging

Decouple the agent from VmPool. P1.1 (rename + wiring) and P1.6a (recreate_network) are ready to build; backup/restore, migration and confidential measurement are blocked on three contract decisions noted in the plan.

Introduce orchestrator/haproxy_sync.py with sync_domain_mappings(supervisor), which derives each local VM's IP from VmInfo.ipv4.address and pushes the domain->IP map to HAProxy. Add build_map_entries_from_vm_ips and write_entries_to_backend to haproxy.py as the low-level building blocks.

Replace pool.update_domain_mapping() calls in the orchestrator views and tasks with sync_domain_mappings(supervisor): regenerate_proxy, notify_allocation, operate_update, the periodic resync task, and the domains-aggregate handler. The supervisor's list_vms() output now drives HAProxy, not the in-process pool.

Delete VmPool.update_domain_mapping and its load-time call; the agent now seeds an initial forced sync in periodic_domain_resync at startup. Drop the dead instances-based helpers from haproxy.py (fetch_list_and_update, update_backends, _build_map_entries, _resolve_vm_ip), keeping fetch_list and the low-level socket/map helpers. Add a guard test.

…tial)

Add a SevInfo dataclass mirroring the seven QEMU query-sev fields and carry it (plus the base64 launch_measure) on Measurement, defaulted so existing callers and the proto path keep working. This lets the confidential measurement endpoint preserve its current response shape once the logic moves into the engine.

Move the initialize/measurement/inject-secret logic out of the agent endpoints and into the engine: write the SEV session certificates and start the controller service, query the launch measurement plus the seven query-sev fields, and inject the secret then resume the guest. The confidential ops are no longer stubs, so drop them from the conformance STUB_METHODS set.

Rewire operate_confidential_initialize, _measurement and _inject_secret to call only the Supervisor interface: the initialize guards now read supervisor.get_vm (status + confidential_mode) instead of the pool, and the three endpoints no longer touch require_vm_pool, the pool or QemuVmClient. The measurement response is preserved byte-for-byte as {"sev_info": {7 fields}, "launch_measure": ...}; inject_secret now returns {"status": "ok"} since the void supervisor method cannot carry the post-injection QMP status (the agreed minor contract change).

…ot a hardcoded SEV The TEE generation is determined by the VM's confidential configuration (an input the supervisor already holds via the execution policy), so get_measurement now derives tee_backend through _confidential_mode instead of hardcoding TeeBackend.SEV. TeeBackend.SEV still covers SEV and SEV-ES (refined by sev_info.policy); SEV-SNP maps distinctly.

Grow the supervisor backup interface so the agent can drop its pool-backed backup/restore logic without any feature regression: - BackupInfo carries checksum, volumes and source_sizes for completed archives, so the agent builds the HTTP body and download sidecar headers from engine metadata. - start_backup gains include_volumes: the engine archives the VM's non-read-only persistent volumes alongside the rootfs. - restore_from_image(vm_id, image_path, max_virtual_size_bytes) restores from a QCOW2 already staged on a host path (uploaded image or downloaded volume): the engine validates, size-checks, swaps the rootfs and restarts. - GrpcSupervisor stubs restore_from_image NotImplementedError (wired in Phase 2).

Rewire the five backup/restore endpoints off the VmPool: they now reach VM and disk work only through the Supervisor interface, keeping HTTP-only concerns (owner auth, presigned download URLs and their verification, multipart/JSON parsing, staging uploaded bytes, sidecar response headers). - operate_backup calls start_backup(quiesce_guest, include_volumes): 202 while RUNNING, metadata body (sourced from BackupInfo) plus a presigned URL when COMPLETE; InvalidBackend -> 400, InsufficientResources -> 507. - operate_backup_status reports via list_backups. - operate_backup_download verifies the presigned URL, streams from download_backup, and sets Content-Length / X-Backup-Checksum / X-Source-Size from the backup metadata. - operate_backup_delete delegates to delete_backup. - operate_restore stages the uploaded QCOW2 or the volume_ref download to a temp path, then calls restore_from_image; qemu-img rejection and oversized disks surface as 400. Drops the agent-side BackupState background-task tracking, the inline QemuVmClient/fsfreeze logic, require_vm_pool and pool reads. Adds a guard test asserting the five endpoints reference no pool symbols.

The test exercised the removed operator._do_restore / _parse_restore_upload internals. Rewire it through operate_restore against a supervisor whose restore_from_image raises InvalidBackendError (what the engine's qemu-img check does on a non-QCOW2), still asserting the 4xx (not 500).

The agent's P2P pull protocol keeps the network transport (compress, hash, serve, download, rebase); add the four MigrationOps methods that own the parts touching the VmExecution and the pool, which the agent must not reach into directly: - stop_vm_for_export: graceful stop + locate the persistent-volumes dir - restart_after_failed_export: bring the VM back up after a failed export - create_migrated_vm: create a persistent VM from a staged instance message - release_migrated_vm: stop + forget a VM that has migrated away Implemented for real in LocalSupervisor, stubbed NotImplementedError in GrpcSupervisor (wired in Phase 2). Surface count 32 to 36.

Thread the supervisor (not the VmPool) through the migration runner and the three migration endpoints. The P2P pull protocol stays the agent-side network transport: the on-host HTTP disk transfer, tokens, ExportJob/ImportJob states and request/response shapes are unchanged. Only the disk/VM work moves behind the supervisor: - migration_export gates via supervisor.get_vm (VmNotFoundError = 404) and reads status/backend/confidential_mode off the returned VmInfo; run_export asks the supervisor to stop the VM and report its volumes dir, then the agent compresses and serves the files it finds there. - migration_import gates the already-running check via supervisor.get_vm; run_import downloads and rebases the peer disks (network transport) then calls supervisor.create_migrated_vm. The dest-dir rmtree decision is made against a single supervisor.get_vm probe instead of pool.executions. - migration_cleanup releases the source VM via supervisor.release_migrated_vm (engine: stop_vm + forget_vm) instead of pool.stop_vm/forget_vm. The three endpoints (and the runner) now contain no require_vm_pool / vm_pool / pool.create_a_vm / pool.executions references; a guard test asserts this via inspect.getsource.

Remove the provisional directory-based export_vm / import_vm / get_migration_status from the MigrationOps ABC, LocalSupervisor, GrpcSupervisor and the gRPC server handlers, and the migrate.py manifest module they relied on. Migration now rides the standard lifecycle RPCs; only the export-time graceful stop survives as a MigrationOps method (it needs a guest powerdown stop_vm does not do). The orphan ExportVm / ImportVm / GetMigrationStatus proto RPCs stay in supervisor.proto with a note to drop them in the Phase 2 proto pass, so the generated bindings are unchanged and check_proto_clean stays green. Surface count 36 to 30.

run_import now builds a CreateVmSpec from the fetched instance message with build_create_vm_spec (the same path run.create_vm_execution uses for a normal persistent instance) and calls supervisor.create_vm(spec) instead of the bespoke create_migrated_vm. The spec's rootfs path is PERSISTENT_VOLUMES_DIR/<vm_hash>/rootfs.qcow2, exactly where the download+rebase already staged the overlay; build_create_vm_spec adopts an already-present host-persistence overlay rather than recreating it, so create_vm reuses the staged disk with no re-download (covered by test_import_spec_build_reuses_staged_overlay). The failed-export restart rides the standard start_vm RPC instead of restart_after_failed_export. The disk/VM seam methods were removed in the previous commit; this wires the runner onto the lifecycle RPCs.

A migrated instance booted on the destination but never got its host port forwards: create_vm_from_spec only reloads persisted mappings (empty on a fresh destination), and run_import stopped at create_vm. The normal create path runs a post-create tail (wait-until-running + apply the agent's resolved port forwards, always including SSH/22); migration skipped it, so mapped_ports stayed empty, the VM was unreachable, and the testnet migration test timed out waiting for the re-dispatched VM's port 22. Extract that tail into run.finish_instance_create and call it from both the normal create path and run_import, so a migrated instance is wired identically to a freshly created one. Adds a regression assertion that run_import invokes the tail.

…/grpc-only-supervisor-phase2

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

dev advanced only by the #980 squash, whose content phase2 already carries via the original commits. -s ours keeps phase2's tree byte-for-byte and records dev as a parent so #981 presents a clean Phase-2-only diff against dev.

foxpatch-aleph

Well-structured Phase 2 implementation that completes the gRPC wire protocol. The four remaining stubs are now wired, all previously-dropped fields are carried, migration RPCs are removed, HostInfo is reconciled, and the two-service packaging is implemented. Test coverage is thorough with round-trip, guard, and unit tests. No correctness or security issues found.

CI now runs mypy on tests/ for this branch (it never did while #981 targeted a non-dev base). FakePool's check_capacity/reserve_gpus were assigned dynamically in test_reserve_resources_*; declare them so mypy's attr-defined check passes.

foxpatch-aleph

Well-structured PR completing the gRPC supervisor wire protocol and two-service packaging split. All four previously-stubbed methods are fully wired, all eight dropped fields are carried over the wire, the orphan migration RPCs are removed, HostInfo is reconciled, and reserve_resources takes a message-free DTO. The packaging changes (two systemd units, maintainer scripts for migration, socket default) are correct and the test coverage is thorough with guard tests ensuring completeness. No correctness or security issues found.

src/aleph/vm/supervisor/local.py (line 1155): Likely fine but worth double-checking: the old check_admission(content, current_vm_hash=...) accepted an optional current_vm_hash parameter to exclude a specific VM from committed totals (for re-notification). Neither check_capacity nor reserve_resources preserves this parameter — was the current_vm_hash exclusion intentionally removed from the reserve path? Re-notification of a running instance would not normally hit reserve_resources (the message is already known), but if it did, the absence of the exclusion means a re-reserve for an already-running VM could incorrectly fail on memory. The plan doc doesn't mention this parameter, and the test test_check_admission_skips_renotify_of_running_instance was deleted in test_resources.py, so it seems intentional — just confirming the known scope.

codecov · 2026-06-18T00:54:50Z

Codecov Report

❌ Patch coverage is 96.37462% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.30%. Comparing base (ca8e791) to head (3f476a9).

Files with missing lines	Patch %	Lines
src/aleph/vm/orchestrator/views/operator.py	77.77%	3 Missing and 1 partial ⚠️
src/aleph/vm/supervisor/translate.py	70.00%	1 Missing and 2 partials ⚠️
tests/supervisor/test_grpc_roundtrip.py	95.71%	0 Missing and 3 partials ⚠️
...c/aleph/vm/controllers/firecracker/spec_program.py	50.00%	1 Missing ⚠️
src/aleph/vm/supervisor/grpc_server.py	95.23%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #981      +/-   ##
==========================================
+ Coverage   80.00%   80.30%   +0.30%     
==========================================
  Files         198      201       +3     
  Lines       23026    23083      +57     
  Branches     1526     1511      -15     
==========================================
+ Hits        18421    18536     +115     
+ Misses       4172     4114      -58     
  Partials      433      433

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The HTTP API (port 4020, /control/*) is served by the agent in the two-service split; the failure handler only dumped the supervisor unit, leaving the agent side (where /control/allocations 503s) invisible. Capture both.

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

The after-failure export only captured the supervisor unit; the agent (HTTP API, /control/allocations) journal is where instance-create errors surface in the two-service split.

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

The shared capacity-admission path (calculate_available_disk) sums get_disk_usage_delta() over every execution's resources. SpecProgramResources (spec-driven Firecracker programs) never implemented it, so once a spec program was running, any subsequent create_vm crashed with AttributeError and the agent returned 503 on /control/allocations. Spec disk admission is deferred (DiskSpec carries no size yet), so it reserves nothing: return 0, mirroring check_spec_admission's disk_mib=0.

foxpatch-aleph

Completes the gRPC wire protocol by wiring the four remaining stubs (run_program_code, restore_from_image, recreate_network, reserve_resources), carrying eight previously-dropped dataclass fields, reconciling HostInfo, removing orphan migration RPCs, and shipping the two-process systemd split. Clean code, thorough tests, sound architectural decisions.

tests/supervisor/test_grpc_complete.py (line 8): The "NotImplementedError" not in source guard is technically redundant with the "wired in Phase 2" check and broader than necessary — it would also catch genuine NotImplementedError usage in error handling code. Consider removing it and keeping only the "wired in Phase 2" check for a more precise guard.

The droplet test writes its own supervisor.env and installs with --force-confold, which keeps it over the packaged one - so it must carry the two-service socket the package ships (packaging/aleph-vm/etc/aleph-vm/supervisor.env). Without it the agent runs embedded next to the supervisor daemon, two pools fight over the VM tap ('File descriptor in bad state'), and program networking endpoints return 503.

foxpatch-aleph

⚠️ autodev could not generate a review for this PR (empty output from the model).

Click Re-request review to retry.

A confidential instance is created via the allocation path, where create_vm (build config + start to the awaiting-init state) takes ~20s. The scheduler exposes placement earlier, so the owner's one-shot init-session (/confidential/initialize) can arrive mid-create — before create_vm_execution writes the agent registry record — and got a 404 (the testnet confidential test fails here). The pre-refactor path looked the VM up in the pool (registered early) and didn't race. Wait for the record (written exactly when create completes and the VM is awaiting init) up to a bounded cap instead of 404ing.

foxpatch-aleph

Completes the gRPC wire protocol for the supervisor interface (4 new RPCs, 8 previously-dropped fields, removal of orphan migration RPCs), refactors pool capacity admission into a shared check_capacity core, and ships the two-process split with systemd units and maintainer scripts. All stubs are wired, all fields are carried, the embedded path is guarded for dev/test, and the test suite is comprehensive including round-trip gRPC tests over real Unix sockets, guard tests, and pool-level reservation tests.

src/aleph/vm/supervisor/grpc_client.py (line 213): The deadline formula timeout + LIFECYCLE_TIMEOUT_SECS = 300 means every run_program_code call has at least a 300s gRPC ceiling. This is correct — the engine's own timeout fires first — but worth noting for review: the gRPC deadine will be significantly larger than the engine timeout for short-lived requests.

tests/supervisor/test_grpc_complete.py (line 11): The "NotImplementedError" not in source check is slightly too broad — a legitimate NotImplementedError could be added elsewhere in grpc_client.py. The companion test_every_rpc_has_a_handler test at line 14 already covers the real concern exhaustively. Consider dropping or narrowing line 11 to skip this risk.

After the record-wait fix, init-session got past the 404 but hit a 400 'vm_running': the endpoint rejects status in (RUNNING, BOOTING), and a spec-path confidential VM awaiting its owner reports BOOTING (start() sets starting_at without launching the controller). That VM is exactly what init-session initializes, so exempt awaiting_confidential_init from the running check.

foxpatch-aleph

Clean, well-structured Phase 2 implementation completing the gRPC wire protocol (4 new RPCs, 8 dropped fields, HostInfo reconciliation, orphan migration RPC removal), two-service packaging (daemon + agent systemd units with upgrade migration), and thorough confinement/testing. The code is correct, readable, and well-tested. No bugs or security issues found.

src/aleph/vm/supervisor/grpc_server.py (line 67): Using is for enum comparison against ErrorCode.INTERNAL is technically correct (Python enum members are singletons), but == would be more idiomatic and robust against potential module reload edge cases. Consider changing to error.code == ErrorCode.INTERNAL for consistency with the rest of the codebase.

src/aleph/vm/pool.py (line 137): The three parameter-to-variable assignments (required_memory_mib = memory_mib, etc.) are redundant. The original check_admission needed them because it extracted values from a message object (message.resources.memory), but here the parameters can be used directly. Consider inlining to reduce noise.

src/aleph/vm/supervisor/translate.py (line 248): The build_reservation_request function accesses getattr(gpu, "model", "") for the GPU model. The comment notes "the message GPU has no model field (it is a network-derived name) so model stays empty" — but GpuProperties does have a model field in some message versions. Using getattr is safe, but the logic differs slightly from build_create_vm_spec where model is hardcoded to empty string. If a message version starts providing model, the two functions will diverge. Consider noting this intentional asymmetry more explicitly or aligning them.

tests/supervisor/test_grpc_roundtrip.py (line 86): _Fake.__abstractmethods__ = frozenset() works but is fragile — if new abstract methods are added to Supervisor, the _Fake class won't fail at definition time. Consider deriving _ASYNC_METHODS from the ABC's abstract methods via reflection instead of listing them manually. The guard test in test_grpc_complete.py catches missing server handlers but doesn't catch missing async mocks on the fake.

A confidential VM returns awaiting_confidential_init from create, so the normal create-completion (run.finish_instance_create: wait-running + port forwards) is skipped. The VM only boots once the owner injects the secret, after which nothing mapped its host ports — so SSH/22 never appeared and the testnet confidential test timed out waiting for port 22. Reconcile the port forwards in the inject-secret endpoint now that the VM is running, mirroring the normal create path.

foxpatch-aleph

Comprehensive Phase 2 implementation that completes the gRPC wire protocol for all Supervisor methods, removes orphan migration RPCs, carries previously-dropped fields over the wire, refactors the pool's capacity admission and GPU reservation into message-free primitives, ships a two-service systemd split with maintainer scripts for clean upgrade migration, and has thorough test coverage including guard tests and round-trip tests. The code is correct, well-structured, and properly secured.

packaging/aleph-vm/DEBIAN/postinst (line 30): The socket-wait loop doesn't log a warning on timeout, which could make post-upgrade failures hard to diagnose. Consider adding a diagnostic log message after the loop exits without finding the socket.

tests/supervisor/test_grpc_complete.py (line 12): test_no_phase2_stubs_remain is fragile — it scans for the string "NotImplementedError" in the entire grpc_client.py source. If any legitimate NotImplementedError is ever added, this test will falsely fail. Consider only checking for the specific "wired in Phase 2" pattern, or remove this guard entirely and rely on test_every_rpc_has_a_handler which is more targeted.

odesenfans added 30 commits June 16, 2026 14:07

docs: Phase 1 implementation plan for gRPC-only supervisor

6b88ba5

Decouple the agent from VmPool. P1.1 (rename + wiring) and P1.6a (recreate_network) are ready to build; backup/restore, migration and confidential measurement are blocked on three contract decisions noted in the plan.

Rename InProcessSupervisor to LocalSupervisor (module local.py)

ec08964

Update production references to LocalSupervisor

a44eb3a

Update supervisor tests to LocalSupervisor

f93f5fe

Extract build_supervisor factory for explicit agent wiring

6196c34

Remove throwaway rename smoke test

d7871fe

Add recreate_network to the Supervisor interface

cac0b67

Implement recreate_network in LocalSupervisor

2f30a7d

Route recreate_network endpoint through the supervisor

6221648

docs: record resolved contract decisions (backup, migration, confiden…

31c473f

…tial)

Add check_spec_admission for the spec create path

a7663a8

Fold capacity admission into create_vm_from_spec atomically

9c3c625

Remove agent-side admission in notify_allocation; surface boundary 503

ffdb435

Route reserve_resources through the Supervisor interface

69d4128

test: stage volume_ref restore under tmp_path

2e97558

odesenfans added 2 commits June 18, 2026 01:25

Merge remote-tracking branch 'origin/od/grpc-only-supervisor' into od…

bb77857

…/grpc-only-supervisor-phase2

odesenfans deleted the branch dev June 18, 2026 00:22

odesenfans closed this Jun 18, 2026

odesenfans reopened this Jun 18, 2026

odesenfans changed the base branch from od/grpc-only-supervisor to dev June 18, 2026 00:24

ci: trigger CI for #981 (base now dev; carries #980 fixes)

b13a003

foxpatch-aleph reviewed Jun 18, 2026

View reviewed changes

Merge dev (Phase 1 #980 squashed) into Phase 2

f151efd

dev advanced only by the #980 squash, whose content phase2 already carries via the original commits. -s ours keeps phase2's tree byte-for-byte and records dev as a parent so #981 presents a clean Phase-2-only diff against dev.

foxpatch-aleph approved these changes Jun 18, 2026

View reviewed changes

ci(deb): also dump aleph-vm-agent journal on droplet-test failure

91ca399

The HTTP API (port 4020, /control/*) is served by the agent in the two-service split; the failure handler only dumped the supervisor unit, leaving the agent side (where /control/allocations 503s) invisible. Capture both.

foxpatch-aleph reviewed Jun 18, 2026

View reviewed changes

ci(deb): dump aleph-vm-agent journal in the post-test log export

7233a56

The after-failure export only captured the supervisor unit; the agent (HTTP API, /control/allocations) journal is where instance-create errors surface in the two-service split.

foxpatch-aleph reviewed Jun 18, 2026

View reviewed changes

foxpatch-aleph approved these changes Jun 18, 2026

View reviewed changes

foxpatch-aleph reviewed Jun 18, 2026

View reviewed changes

foxpatch-aleph approved these changes Jun 18, 2026

View reviewed changes

odesenfans merged commit db13f89 into dev Jun 18, 2026
19 checks passed

odesenfans deleted the od/grpc-only-supervisor-phase2 branch June 18, 2026 12:05

odesenfans mentioned this pull request Jun 18, 2026

fix(confidential): register agent record early, wait for awaiting-init #985

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(supervisor): Phase 2 - gRPC wire protocol + two-service packaging#981

feat(supervisor): Phase 2 - gRPC wire protocol + two-service packaging#981
odesenfans merged 88 commits into
devfrom
od/grpc-only-supervisor-phase2

odesenfans commented Jun 17, 2026

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

codecov Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

odesenfans commented Jun 17, 2026

Phase 2: complete the gRPC wire protocol + two-service packaging

Part A — complete the wire protocol

Part B — two-service packaging

Part C — confinement + validation

Testing

Known follow-ups (out of Phase 2 scope)

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Jun 18, 2026 •

edited

Loading