Skip to content

refactor(supervisor): drop dead in-process QEMU cloud-init path#984

Merged
odesenfans merged 1 commit into
devfrom
od/drop-legacy-qemu-cloudinit
Jun 18, 2026
Merged

refactor(supervisor): drop dead in-process QEMU cloud-init path#984
odesenfans merged 1 commit into
devfrom
od/drop-legacy-qemu-cloudinit

Conversation

@odesenfans

Copy link
Copy Markdown
Contributor

What

Removes the now-dead in-process QEMU cloud-init configure() path that still read message_content, which is None on the message-free spec path (the only path used in production after the gRPC-only-supervisor refactor, #980#983).

Why

Production QEMU instance creation goes through VmPool.create_vm_from_spec, which builds cloud-init from the spec via build_qemu_configurationbuild_cloud_init_drive(ssh_authorized_keys=spec.ssh_authorized_keys) and starts with start(write_config=False). The in-process AlephQemuInstance.configure()CloudInitMixin._create_cloud_init_drive path (which reads resources.message_content) is only reachable via start(write_config=True), which production QEMU never uses. Left in place it crashes with AttributeError: 'NoneType' object has no attribute 'authorized_keys' if ever hit on the spec path (this is what tripped test_qemu_instance until #983 pointed it at the production flow).

Changes (−193 lines)

  • Delete CloudInitMixin._create_cloud_init_drive + the now-empty CloudInitMixin.
  • Delete AlephQemuInstance.configure() / AlephQemuConfidentialInstance.configure() and their dead save_controller_configuration methods/attrs.
  • Trim imports that become unused.

Kept (verified live)

  • Spec-based cloud-init in supervisor/qemu_build.py (build_cloud_init_drive, create_cloud_init_drive_image).
  • AlephQemuResources.message_content + download_* — still used by translate.build_create_vm_spec.
  • Firecracker configure() (separate class, reached via the live start(write_config=True) FC path).

Verification

  • mypy: no new errors (baseline-only).
  • tests/supervisor/ + tests/migration/: pass except known environment-only failures (test_interfaces pyroute2-needs-root, test_log journald); test_qemu_instance stays at its 2 expected XFAILs (no fake disk).

Pure dead-code removal; no functional change to production.

…path

Production QEMU instance creation goes through VmPool.create_vm_from_spec,
which builds the controller config with build_qemu_configuration /
build_qemu_confidential_configuration (message-free, from the CreateVmSpec),
writes it via save_controller_configuration, and calls
execution.start(write_config=False). With write_config=False,
VmExecution.start never calls vm.configure(), so the QEMU controllers'
in-process configure() methods were dead.

Those configure() methods called CloudInitMixin._create_cloud_init_drive,
which reads self.resources.message_content.authorized_keys and
.environment.trusted_execution. On the spec path message_content is None
(AlephQemuResources.from_spec sets it to None), so this path would crash
with AttributeError if it were ever reached.

Removed:
- CloudInitMixin._create_cloud_init_drive (and the CloudInitMixin class,
  its only member) in controllers/qemu/cloudinit.py
- AlephQemuInstance.configure() and its dead save_controller_configuration
  method / controller_configuration attribute
- AlephQemuConfidentialInstance.configure() and its controller_configuration
  attribute
- newly-unused imports across the three files

Kept (still live):
- module-level create_cloud_init_drive_image / encode_user_data /
  create_network_file / create_metadata_file / get_hostname_from_hash in
  cloudinit.py, used by supervisor/qemu_build.py and translate.py
- AlephQemuResources.message_content and the download_* methods, still used
  by supervisor/translate.build_create_vm_spec to resolve a message into a
  spec
- qmp_socket_path / qga_socket_path properties, used by controllers/qemu/client.py

No functional change to the live spec path or to Firecracker (whose own
configure() on the executable path stays, reached via start() with the
default write_config=True).

@foxpatch-aleph foxpatch-aleph left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pure dead-code removal of the in-process QEMU cloud-init configure() path that was unreachable in production (production uses build_qemu_configuration + start(write_config=False)). All removed imports are confirmed unused in remaining code, all references to CloudInitMixin and _create_cloud_init_drive are gone from the source, and the Firecracker configure() path is untouched. The only minor nit is stale docstring references in qemu_build.py that mention the now-deleted configure() methods.

src/aleph/vm/supervisor/qemu_build.py (line 4): Docstring references AlephQemuInstance.configure() which no longer exists after this PR. Consider updating to note the original method was removed (e.g., "mirrors the original in-process configure() path, removed in #984"). Non-blocking — historical context only.

@codecov

codecov Bot commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.00%. Comparing base (579fc81) to head (ad4a7f3).
⚠️ Report is 2 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #984      +/-   ##
==========================================
+ Coverage   78.83%   79.00%   +0.17%     
==========================================
  Files         202      202              
  Lines       22658    22586      -72     
  Branches     1462     1459       -3     
==========================================
- Hits        17862    17844      -18     
+ Misses       4410     4356      -54     
  Partials      386      386              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@odesenfans odesenfans merged commit 08af7b4 into dev Jun 18, 2026
18 of 19 checks passed
@odesenfans odesenfans deleted the od/drop-legacy-qemu-cloudinit branch June 18, 2026 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants