Description
Please confirm
- I have searched existing issues to check if an issue already exists for the bug I encountered.
Distribution
Ubuntu
Distribution version
24.04
Output of "snap list --all lxd core20 core22 core24 snapd"
Name Version Rev Tracking Publisher Notes
core22 20250730 2111 latest/stable canonical✓ base,disabled
core22 20250822 2133 latest/stable canonical✓ base
core24 20250729 1151 latest/stable canonical✓ base,disabled
core24 20250829 1196 latest/stable canonical✓ base
lxd 5.21.4-8b5e998 35624 5.21/stable canonical✓ in-cohort,held
microceph 19.2.1+snap30863f37a3 1518 squid/stable canonical✓ disabled,in-cohort,held
microceph 19.2.1+snapac85f90153 1554 squid/stable canonical✓ in-cohort,held
microcloud 2.1.1-202e275 1781 2/stable canonical✓ in-cohort,held
microovn 24.03.2+snapa2c59c105b 667 24.03/stable canonical✓ in-cohort,held
snapd 2.68.5 24718 latest/stable canonical✓ snapd,disabled
snapd 2.71 25202 latest/stable canonical✓ snapd
Output of "lxc info" or system info if it fails
config:
cluster.https_address: 10.244.120.10:8443
core.https_address: '[::]:8443'
instances.migration.stateful: "true"
network.ovn.northbound_connection: ssl:10.244.120.10:6641,ssl:10.244.120.11:6641,ssl:10.244.120.12:6641
storage.backups_volume: local/backups-top
storage.images_volume: remote-fs/images
user.microcloud: 2.1.1
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- init_preseed_storage_volumes
- metrics_instances_count
- server_instance_type_info
- resources_disk_mounted
- server_version_lts
- oidc_groups_claim
- loki_config_instance
- storage_volatile_uuid
- import_instance_devices
- instances_uefi_vars
- instances_migration_stateful
- container_syscall_filtering_allow_deny_syntax
- access_management
- vm_disk_io_limits
- storage_volumes_all
- instances_files_modify_permissions
- image_restriction_nesting
- container_syscall_intercept_finit_module
- device_usb_serial
- network_allocate_external_ips
- explicit_trust_token
- instance_import_conversion
- instance_create_start
- devlxd_images_vm
- instance_protection_start
- disk_io_bus_virtio_blk
- metadata_configuration_entity_types
- network_allocations_ovn_uplink
- network_ovn_uplink_vlan
- shared_custom_block_volumes
- metrics_api_requests
- projects_limits_disk_pool
- access_management_tls
- state_logical_cpus
- vm_limits_cpu_pin_strategy
- gpu_cdi
- metadata_configuration_scope
- unix_device_hotplug_ownership_inherit
- unix_device_hotplug_subsystem_device_option
- storage_ceph_osd_pool_size
- network_get_target
- network_zones_all_projects
- vm_root_volume_attachment
- projects_limits_uplink_ips
- entities_with_entitlements
- profiles_all_projects
- storage_driver_powerflex
- storage_driver_pure
- cloud_init_ssh_keys
- oidc_scopes
- project_default_network_and_storage
- ubuntu_pro_guest_attach
- images_all_projects
- client_cert_presence
- resources_device_fs_uuid
- clustering_groups_used_by
- container_bpf_delegation
- override_snapshot_profiles_on_copy
- backup_metadata_version
- storage_buckets_all_projects
- network_acls_all_projects
- networks_all_projects
- clustering_restore_skip_mode
- disk_io_threads_virtiofsd
- oidc_client_secret
- pci_hotplug
- device_patch_removal
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
client_certificate: false
auth_user_name: ubuntu
auth_user_method: unix
environment:
addresses:
- 10.244.120.10:8443
architectures:
- x86_64
- i686
backup_metadata_version_range:
- 1
- 2
certificate: |
-----BEGIN CERTIFICATE-----
MIIB2jCCAWCgAwIBAgIQPzrJIvE2BToKEN4ffkDVWjAKBggqhkjOPQQDAzAhMQww
CgYDVQQKEwNMWEQxETAPBgNVBAMMCHJvb3RAdG9wMB4XDTI1MDkwOTIxMjg0NloX
DTM1MDkwNzIxMjg0NlowITEMMAoGA1UEChMDTFhEMREwDwYDVQQDDAhyb290QHRv
cDB2MBAGByqGSM49AgEGBSuBBAAiA2IABE4BoD8D45HKhZHEPKml/pIhCgzBq/LC
kOPIjOg952V/KL5PQ2rVQKfhjmJCzz7MBE3mbLHZrJQ+jejMd9BdKuVzg0KdI6XE
2PlRtWP4y8Qo91CXIGY7eg0NYbmQTfZOG6NdMFswDgYDVR0PAQH/BAQDAgWgMBMG
A1UdJQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwJgYDVR0RBB8wHYIDdG9w
hwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUCMQCUfxzw
l1LG1JyTBI2ywfSq1uCFaP0OzSziyqXXskBQmYquswSyC2U/gr+s/R1tbuICMGS1
0NuEokBlxeaXlWzxMmuGn+o3clx8/j08faUE9w7P1YbXsBqZWaOuwon5jXJ84A==
-----END CERTIFICATE-----
certificate_fingerprint: ed829a0b6b9f1e9506c5fd8ab1e8c72a4bd13979f468342420aa61997a22e070
driver: lxc | qemu
driver_version: 6.0.4 | 8.2.2
instance_types:
- container
- virtual-machine
firewall: nftables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
bpf_token: "false"
idmapped_mounts: "true"
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
uevent_injection: "true"
unpriv_binfmt: "true"
unpriv_fscaps: "true"
kernel_version: 6.8.0-85-generic
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "24.04"
project: foo
server: lxd
server_clustered: true
server_event_mode: full-mesh
server_name: top
server_pid: 2956
server_version: 5.21.4
server_lts: true
storage: cephfs | zfs | ceph
storage_version: 17.2.7 | 2.2.2-0ubuntu9.4 | 17.2.7
storage_supported_drivers:
- name: cephfs
version: 17.2.7
remote: true
- name: cephobject
version: 17.2.7
remote: true
- name: pure
version: 1.16 (nvme-cli)
remote: true
- name: zfs
version: 2.2.2-0ubuntu9.4
remote: false
- name: btrfs
version: 5.16.2
remote: false
- name: ceph
version: 17.2.7
remote: true
- name: dir
version: "1"
remote: false
- name: lvm
version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.48.0
remote: false
- name: powerflex
version: 1.16 (nvme-cli)
remote: true
Issue description
When using a LXD cluster with a CephFS volume set as storage.images_volume, LXD can share images between nodes. This works well as long as only one node in the cluster downloads the image at a time; all subsequent launches are then quick.
However, if two or more nodes try to download the image at the same time, the image is bricked on every node that downloads it after the first one. The first node can launch instances from that image, but all other nodes fail with something like this:
ubuntu@bottom:~$ lxc launch ubuntu:j -e --target bottom
Launching the instance
Error: Failed instance creation: Failed transferring image "8ddf83176547f1d337a708043d9314ed3897beb7508476d89650be2d7686306c" from "10.244.120.11:8443": open /var/snap/lxd/common/lxd/storage-pools/remote-fs/custom/default_images/8ddf83176547f1d337a708043d9314ed3897beb7508476d89650be2d7686306c: no such file or directory
Interestingly, the IP address seen here is the IP of the working node. Also interestingly, when running the following to look into the snap's namespace on any of the nodes, none of them shows the image file as present, yet LXD can somehow still run it on that one node, even after subsequent reboots.
sudo LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ nsenter --mount=/run/snapd/ns/lxd.mnt --
# The following command will be empty if no other images are already installed
$ ls /var/snap/lxd/common/lxd/storage-pools/remote-fs/custom/default_images
This issue can be worked around by removing the image from LXD with lxc image rm <img> and then importing it again without a race condition.
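For reference, a minimal sketch of that workaround, assuming the broken image is the one from the error above and that the image alias, fingerprint prefix and target member names are only placeholders for this setup:
# Forget the broken image record so LXD stops pointing at the missing file.
# (8ddf8317 is the fingerprint prefix from the error above; substitute yours.)
lxc image delete 8ddf8317
# Re-download the image from exactly one member first, so only a single node writes to the shared CephFS volume.
lxc launch ubuntu:j -e --target bottom
# Once that finishes, launches on the remaining members reuse the now-present image without hitting the race.
lxc launch ubuntu:j -e --target top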
Steps to reproduce
- Set up a LXD cluster with CephFS as a remote. A standard MicroCloud cluster init using CephFS is fine.
- Create a volume in the default project:
lxc storage volume create remote-fs images
- On each node in the cluster, set the image volume to the CephFS volume:
lxc config set storage.images_volume=remote-fs/images
- Now, on two or more nodes, at the same time run
lxc launch ubuntu:n -e
or something similar (see the sketch after this list). I personally used tmux to ssh into each node and used the mirror mode to make that easy for myself. Just being fairly quick, or picking a really big image that gives you ample time to create the race condition, is enough.
- Now you'll see the later nodes fail, and you can no longer launch instances of that image on any node except the first one to succeed.
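A minimal sketch of triggering the simultaneous download without tmux, assuming passwordless ssh to two cluster members; the hostnames top and bottom come from this cluster and are only placeholders:
# Start the same image download on two members at (almost) the same time.
ssh top -- lxc launch ubuntu:n -e &
ssh bottom -- lxc launch ubuntu:n -e &
wait
# One member wins the race and launches; the other fails with
# "Failed transferring image ... no such file or directory".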
Information to attach
- Any relevant kernel output (dmesg)
- Instance log (lxc info NAME --show-log)
- Instance configuration (lxc config show NAME --expanded)
- Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
- Output of the client with --debug
- Output of the daemon with --debug (or use lxc monitor while reproducing the issue)
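A minimal sketch of collecting the information listed above on one of the affected snap-installed members; NAME stands for whatever the failed instance was called, and the log path follows the snap layout shown earlier:
# Kernel messages and the main daemon log (snap install path).
dmesg | tail -n 200
cat /var/snap/lxd/common/lxd/logs/lxd.log
# Instance log and expanded configuration, if the instance record was created at all.
lxc info NAME --show-log
lxc config show NAME --expanded
# Reproduce the failure with client-side debug output while watching daemon events in the background.
lxc monitor --type=logging --pretty &
lxc launch ubuntu:n -e --debug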