
Race condition when downloading the same image on 2 nodes simultaneously with CephFS #16729

@stew3254

Description

Please confirm

  • I have searched existing issues to check if an issue already exists for the bug I encountered.

Distribution

Ubuntu

Distribution version

24.04

Output of "snap list --all lxd core20 core22 core24 snapd"

Name        Version                 Rev    Tracking       Publisher   Notes
core22      20250730                2111   latest/stable  canonical✓  base,disabled
core22      20250822                2133   latest/stable  canonical✓  base
core24      20250729                1151   latest/stable  canonical✓  base,disabled
core24      20250829                1196   latest/stable  canonical✓  base
lxd         5.21.4-8b5e998          35624  5.21/stable    canonical✓  in-cohort,held
microceph   19.2.1+snap30863f37a3   1518   squid/stable   canonical✓  disabled,in-cohort,held
microceph   19.2.1+snapac85f90153   1554   squid/stable   canonical✓  in-cohort,held
microcloud  2.1.1-202e275           1781   2/stable       canonical✓  in-cohort,held
microovn    24.03.2+snapa2c59c105b  667    24.03/stable   canonical✓  in-cohort,held
snapd       2.68.5                  24718  latest/stable  canonical✓  snapd,disabled
snapd       2.71                    25202  latest/stable  canonical✓  snapd

Output of "lxc info" or system info if it fails

config:
  cluster.https_address: 10.244.120.10:8443
  core.https_address: '[::]:8443'
  instances.migration.stateful: "true"
  network.ovn.northbound_connection: ssl:10.244.120.10:6641,ssl:10.244.120.11:6641,ssl:10.244.120.12:6641
  storage.backups_volume: local/backups-top
  storage.images_volume: remote-fs/images
  user.microcloud: 2.1.1
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- init_preseed_storage_volumes
- metrics_instances_count
- server_instance_type_info
- resources_disk_mounted
- server_version_lts
- oidc_groups_claim
- loki_config_instance
- storage_volatile_uuid
- import_instance_devices
- instances_uefi_vars
- instances_migration_stateful
- container_syscall_filtering_allow_deny_syntax
- access_management
- vm_disk_io_limits
- storage_volumes_all
- instances_files_modify_permissions
- image_restriction_nesting
- container_syscall_intercept_finit_module
- device_usb_serial
- network_allocate_external_ips
- explicit_trust_token
- instance_import_conversion
- instance_create_start
- devlxd_images_vm
- instance_protection_start
- disk_io_bus_virtio_blk
- metadata_configuration_entity_types
- network_allocations_ovn_uplink
- network_ovn_uplink_vlan
- shared_custom_block_volumes
- metrics_api_requests
- projects_limits_disk_pool
- access_management_tls
- state_logical_cpus
- vm_limits_cpu_pin_strategy
- gpu_cdi
- metadata_configuration_scope
- unix_device_hotplug_ownership_inherit
- unix_device_hotplug_subsystem_device_option
- storage_ceph_osd_pool_size
- network_get_target
- network_zones_all_projects
- vm_root_volume_attachment
- projects_limits_uplink_ips
- entities_with_entitlements
- profiles_all_projects
- storage_driver_powerflex
- storage_driver_pure
- cloud_init_ssh_keys
- oidc_scopes
- project_default_network_and_storage
- ubuntu_pro_guest_attach
- images_all_projects
- client_cert_presence
- resources_device_fs_uuid
- clustering_groups_used_by
- container_bpf_delegation
- override_snapshot_profiles_on_copy
- backup_metadata_version
- storage_buckets_all_projects
- network_acls_all_projects
- networks_all_projects
- clustering_restore_skip_mode
- disk_io_threads_virtiofsd
- oidc_client_secret
- pci_hotplug
- device_patch_removal
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
client_certificate: false
auth_user_name: ubuntu
auth_user_method: unix
environment:
  addresses:
  - 10.244.120.10:8443
  architectures:
  - x86_64
  - i686
  backup_metadata_version_range:
  - 1
  - 2
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB2jCCAWCgAwIBAgIQPzrJIvE2BToKEN4ffkDVWjAKBggqhkjOPQQDAzAhMQww
    CgYDVQQKEwNMWEQxETAPBgNVBAMMCHJvb3RAdG9wMB4XDTI1MDkwOTIxMjg0NloX
    DTM1MDkwNzIxMjg0NlowITEMMAoGA1UEChMDTFhEMREwDwYDVQQDDAhyb290QHRv
    cDB2MBAGByqGSM49AgEGBSuBBAAiA2IABE4BoD8D45HKhZHEPKml/pIhCgzBq/LC
    kOPIjOg952V/KL5PQ2rVQKfhjmJCzz7MBE3mbLHZrJQ+jejMd9BdKuVzg0KdI6XE
    2PlRtWP4y8Qo91CXIGY7eg0NYbmQTfZOG6NdMFswDgYDVR0PAQH/BAQDAgWgMBMG
    A1UdJQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwJgYDVR0RBB8wHYIDdG9w
    hwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUCMQCUfxzw
    l1LG1JyTBI2ywfSq1uCFaP0OzSziyqXXskBQmYquswSyC2U/gr+s/R1tbuICMGS1
    0NuEokBlxeaXlWzxMmuGn+o3clx8/j08faUE9w7P1YbXsBqZWaOuwon5jXJ84A==
    -----END CERTIFICATE-----
  certificate_fingerprint: ed829a0b6b9f1e9506c5fd8ab1e8c72a4bd13979f468342420aa61997a22e070
  driver: lxc | qemu
  driver_version: 6.0.4 | 8.2.2
  instance_types:
  - container
  - virtual-machine
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    bpf_token: "false"
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.8.0-85-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "24.04"
  project: foo
  server: lxd
  server_clustered: true
  server_event_mode: full-mesh
  server_name: top
  server_pid: 2956
  server_version: 5.21.4
  server_lts: true
  storage: cephfs | zfs | ceph
  storage_version: 17.2.7 | 2.2.2-0ubuntu9.4 | 17.2.7
  storage_supported_drivers:
  - name: cephfs
    version: 17.2.7
    remote: true
  - name: cephobject
    version: 17.2.7
    remote: true
  - name: pure
    version: 1.16 (nvme-cli)
    remote: true
  - name: zfs
    version: 2.2.2-0ubuntu9.4
    remote: false
  - name: btrfs
    version: 5.16.2
    remote: false
  - name: ceph
    version: 17.2.7
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.48.0
    remote: false
  - name: powerflex
    version: 1.16 (nvme-cli)
    remote: true

Issue description

When using a LXD cluster with a CephFS volume set as storage.images_volume, LXD can share downloaded images between nodes. This works well as long as only one node in the cluster downloads a given image at a time; all subsequent launches of that image are then quick.

However, if two or more nodes try to download the same image simultaneously, the image ends up broken on every node that started the download after the first one. The first node can launch instances from that image, but all other nodes fail with something like this:

ubuntu@bottom:~$ lxc launch ubuntu:j -e --target bottom
Launching the instance
Error: Failed instance creation: Failed transferring image "8ddf83176547f1d337a708043d9314ed3897beb7508476d89650be2d7686306c" from "10.244.120.11:8443": open /var/snap/lxd/common/lxd/storage-pools/remote-fs/custom/default_images/8ddf83176547f1d337a708043d9314ed3897beb7508476d89650be2d7686306c: no such file or directory

Interestingly, the IP address seen here is that of the working node. Also, when entering the snap's mount namespace on any of the nodes with the commands below, none of the nodes show the image file as present, yet LXD can somehow still run it on that one node, even after subsequent reboots.

sudo LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ nsenter --mount=/run/snapd/ns/lxd.mnt --
# The following command's output will be empty if no other images are already cached
$ ls /var/snap/lxd/common/lxd/storage-pools/remote-fs/custom/default_images

This issue can be worked around by removing the broken image from LXD (lxc image rm <img>) and then re-importing it without triggering the race condition.
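For reference, a minimal sketch of that workaround, run from any node in the cluster; the fingerprint and target node name are placeholders:

# Find the fingerprint of the broken cached image
lxc image list
# Remove the broken image record, then re-download it from one node only
lxc image rm <fingerprint>
lxc launch ubuntu:j -e --target <node>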

Steps to reproduce

  1. Set up a LXD cluster with CephFS as a remote storage pool. A standard MicroCloud cluster init with CephFS enabled is fine.
  2. Create a volume in the default project: lxc storage volume create remote-fs images
  3. On each node in the cluster, set the image volume to the CephFS volume: lxc config set storage.images_volume=remote-fs/images
  4. Now, on two or more nodes at the same time, run lxc launch ubuntu:n -e or something similar (a consolidated sketch follows this list). I personally used tmux to SSH into each node and mirrored my input to make this easy. Being fairly quick, or picking a large image that gives you ample time, is enough to hit the race condition.
  5. The later nodes will fail, and you can no longer launch instances from that image on any node except the first one to succeed.
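A minimal sketch of the reproduction, assuming two cluster members named node1 and node2 (the member names are placeholders, and the two launches are raced here by backgrounding the first one instead of using tmux):

# Step 2, run once: create the shared image volume on the CephFS pool
lxc storage volume create remote-fs images

# Step 3, run on each cluster member: point the image cache at that volume
lxc config set storage.images_volume=remote-fs/images

# Step 4: start both downloads as close to simultaneously as possible
lxc launch ubuntu:n -e --target node1 &
lxc launch ubuntu:n -e --target node2
wait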

Information to attach

  • Any relevant kernel output (dmesg)
  • Instance log (lxc info NAME --show-log)
  • Instance configuration (lxc config show NAME --expanded)
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
  • Output of the client with --debug
  • Output of the daemon with --debug (or use lxc monitor while reproducing the issue)
