refactor(supervisor): drop dead in-process QEMU cloud-init path#984
Conversation
…path Production QEMU instance creation goes through VmPool.create_vm_from_spec, which builds the controller config with build_qemu_configuration / build_qemu_confidential_configuration (message-free, from the CreateVmSpec), writes it via save_controller_configuration, and calls execution.start(write_config=False). With write_config=False, VmExecution.start never calls vm.configure(), so the QEMU controllers' in-process configure() methods were dead. Those configure() methods called CloudInitMixin._create_cloud_init_drive, which reads self.resources.message_content.authorized_keys and .environment.trusted_execution. On the spec path message_content is None (AlephQemuResources.from_spec sets it to None), so this path would crash with AttributeError if it were ever reached. Removed: - CloudInitMixin._create_cloud_init_drive (and the CloudInitMixin class, its only member) in controllers/qemu/cloudinit.py - AlephQemuInstance.configure() and its dead save_controller_configuration method / controller_configuration attribute - AlephQemuConfidentialInstance.configure() and its controller_configuration attribute - newly-unused imports across the three files Kept (still live): - module-level create_cloud_init_drive_image / encode_user_data / create_network_file / create_metadata_file / get_hostname_from_hash in cloudinit.py, used by supervisor/qemu_build.py and translate.py - AlephQemuResources.message_content and the download_* methods, still used by supervisor/translate.build_create_vm_spec to resolve a message into a spec - qmp_socket_path / qga_socket_path properties, used by controllers/qemu/client.py No functional change to the live spec path or to Firecracker (whose own configure() on the executable path stays, reached via start() with the default write_config=True).
foxpatch-aleph
left a comment
There was a problem hiding this comment.
Pure dead-code removal of the in-process QEMU cloud-init configure() path that was unreachable in production (production uses build_qemu_configuration + start(write_config=False)). All removed imports are confirmed unused in remaining code, all references to CloudInitMixin and _create_cloud_init_drive are gone from the source, and the Firecracker configure() path is untouched. The only minor nit is stale docstring references in qemu_build.py that mention the now-deleted configure() methods.
src/aleph/vm/supervisor/qemu_build.py (line 4): Docstring references AlephQemuInstance.configure() which no longer exists after this PR. Consider updating to note the original method was removed (e.g., "mirrors the original in-process configure() path, removed in #984"). Non-blocking — historical context only.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## dev #984 +/- ##
==========================================
+ Coverage 78.83% 79.00% +0.17%
==========================================
Files 202 202
Lines 22658 22586 -72
Branches 1462 1459 -3
==========================================
- Hits 17862 17844 -18
+ Misses 4410 4356 -54
Partials 386 386 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
What
Removes the now-dead in-process QEMU cloud-init
configure()path that still readmessage_content, which isNoneon the message-free spec path (the only path used in production after the gRPC-only-supervisor refactor, #980–#983).Why
Production QEMU instance creation goes through
VmPool.create_vm_from_spec, which builds cloud-init from the spec viabuild_qemu_configuration→build_cloud_init_drive(ssh_authorized_keys=spec.ssh_authorized_keys)and starts withstart(write_config=False). The in-processAlephQemuInstance.configure()→CloudInitMixin._create_cloud_init_drivepath (which readsresources.message_content) is only reachable viastart(write_config=True), which production QEMU never uses. Left in place it crashes withAttributeError: 'NoneType' object has no attribute 'authorized_keys'if ever hit on the spec path (this is what trippedtest_qemu_instanceuntil #983 pointed it at the production flow).Changes (−193 lines)
CloudInitMixin._create_cloud_init_drive+ the now-emptyCloudInitMixin.AlephQemuInstance.configure()/AlephQemuConfidentialInstance.configure()and their deadsave_controller_configurationmethods/attrs.Kept (verified live)
supervisor/qemu_build.py(build_cloud_init_drive,create_cloud_init_drive_image).AlephQemuResources.message_content+download_*— still used bytranslate.build_create_vm_spec.configure()(separate class, reached via the livestart(write_config=True)FC path).Verification
tests/supervisor/+tests/migration/: pass except known environment-only failures (test_interfacespyroute2-needs-root,test_logjournald);test_qemu_instancestays at its 2 expected XFAILs (no fake disk).Pure dead-code removal; no functional change to production.