Creating container from big OCI image fails with cryptic error message due to low default volume size #3303

@ranfdev

Is there an existing issue for this?

  • There is no existing issue for this bug

Is this happening on an up to date version of Incus?

  • This is happening on a supported version of Incus

Incus system details

config:
  core.https_address: 0.0.0.0:8443
api_extensions:
  - storage_zfs_remove_snapshots
  - container_host_shutdown_timeout
  - container_stop_priority
  - container_syscall_filtering
  - auth_pki
  - container_last_used_at
  - etag
  - patch
  - usb_devices
  - https_allowed_credentials
  - image_compression_algorithm
  - directory_manipulation
  - container_cpu_time
  - storage_zfs_use_refquota
  - storage_lvm_mount_options
  - network
  - profile_usedby
  - container_push
  - container_exec_recording
  - certificate_update
  - container_exec_signal_handling
  - gpu_devices
  - container_image_properties
  - migration_progress
  - id_map
  - network_firewall_filtering
  - network_routes
  - storage
  - file_delete
  - file_append
  - network_dhcp_expiry
  - storage_lvm_vg_rename
  - storage_lvm_thinpool_rename
  - network_vlan
  - image_create_aliases
  - container_stateless_copy
  - container_only_migration
  - storage_zfs_clone_copy
  - unix_device_rename
  - storage_lvm_use_thinpool
  - storage_rsync_bwlimit
  - network_vxlan_interface
  - storage_btrfs_mount_options
  - entity_description
  - image_force_refresh
  - storage_lvm_lv_resizing
  - id_map_base
  - file_symlinks
  - container_push_target
  - network_vlan_physical
  - storage_images_delete
  - container_edit_metadata
  - container_snapshot_stateful_migration
  - storage_driver_ceph
  - storage_ceph_user_name
  - resource_limits
  - storage_volatile_initial_source
  - storage_ceph_force_osd_reuse
  - storage_block_filesystem_btrfs
  - resources
  - kernel_limits
  - storage_api_volume_rename
  - network_sriov
  - console
  - restrict_dev_incus
  - migration_pre_copy
  - infiniband
  - dev_incus_events
  - proxy
  - network_dhcp_gateway
  - file_get_symlink
  - network_leases
  - unix_device_hotplug
  - storage_api_local_volume_handling
  - operation_description
  - clustering
  - event_lifecycle
  - storage_api_remote_volume_handling
  - nvidia_runtime
  - container_mount_propagation
  - container_backup
  - dev_incus_images
  - container_local_cross_pool_handling
  - proxy_unix
  - proxy_udp
  - clustering_join
  - proxy_tcp_udp_multi_port_handling
  - network_state
  - proxy_unix_dac_properties
  - container_protection_delete
  - unix_priv_drop
  - pprof_http
  - proxy_haproxy_protocol
  - network_hwaddr
  - proxy_nat
  - network_nat_order
  - container_full
  - backup_compression
  - nvidia_runtime_config
  - storage_api_volume_snapshots
  - storage_unmapped
  - projects
  - network_vxlan_ttl
  - container_incremental_copy
  - usb_optional_vendorid
  - snapshot_scheduling
  - snapshot_schedule_aliases
  - container_copy_project
  - clustering_server_address
  - clustering_image_replication
  - container_protection_shift
  - snapshot_expiry
  - container_backup_override_pool
  - snapshot_expiry_creation
  - network_leases_location
  - resources_cpu_socket
  - resources_gpu
  - resources_numa
  - kernel_features
  - id_map_current
  - event_location
  - storage_api_remote_volume_snapshots
  - network_nat_address
  - container_nic_routes
  - cluster_internal_copy
  - seccomp_notify
  - lxc_features
  - container_nic_ipvlan
  - network_vlan_sriov
  - storage_cephfs
  - container_nic_ipfilter
  - resources_v2
  - container_exec_user_group_cwd
  - container_syscall_intercept
  - container_disk_shift
  - storage_shifted
  - resources_infiniband
  - daemon_storage
  - instances
  - image_types
  - resources_disk_sata
  - clustering_roles
  - images_expiry
  - resources_network_firmware
  - backup_compression_algorithm
  - ceph_data_pool_name
  - container_syscall_intercept_mount
  - compression_squashfs
  - container_raw_mount
  - container_nic_routed
  - container_syscall_intercept_mount_fuse
  - container_disk_ceph
  - virtual-machines
  - image_profiles
  - clustering_architecture
  - resources_disk_id
  - storage_lvm_stripes
  - vm_boot_priority
  - unix_hotplug_devices
  - api_filtering
  - instance_nic_network
  - clustering_sizing
  - firewall_driver
  - projects_limits
  - container_syscall_intercept_hugetlbfs
  - limits_hugepages
  - container_nic_routed_gateway
  - projects_restrictions
  - custom_volume_snapshot_expiry
  - volume_snapshot_scheduling
  - trust_ca_certificates
  - snapshot_disk_usage
  - clustering_edit_roles
  - container_nic_routed_host_address
  - container_nic_ipvlan_gateway
  - resources_usb_pci
  - resources_cpu_threads_numa
  - resources_cpu_core_die
  - api_os
  - container_nic_routed_host_table
  - container_nic_ipvlan_host_table
  - container_nic_ipvlan_mode
  - resources_system
  - images_push_relay
  - network_dns_search
  - container_nic_routed_limits
  - instance_nic_bridged_vlan
  - network_state_bond_bridge
  - usedby_consistency
  - custom_block_volumes
  - clustering_failure_domains
  - resources_gpu_mdev
  - console_vga_type
  - projects_limits_disk
  - network_type_macvlan
  - network_type_sriov
  - container_syscall_intercept_bpf_devices
  - network_type_ovn
  - projects_networks
  - projects_networks_restricted_uplinks
  - custom_volume_backup
  - backup_override_name
  - storage_rsync_compression
  - network_type_physical
  - network_ovn_external_subnets
  - network_ovn_nat
  - network_ovn_external_routes_remove
  - tpm_device_type
  - storage_zfs_clone_copy_rebase
  - gpu_mdev
  - resources_pci_iommu
  - resources_network_usb
  - resources_disk_address
  - network_physical_ovn_ingress_mode
  - network_ovn_dhcp
  - network_physical_routes_anycast
  - projects_limits_instances
  - network_state_vlan
  - instance_nic_bridged_port_isolation
  - instance_bulk_state_change
  - network_gvrp
  - instance_pool_move
  - gpu_sriov
  - pci_device_type
  - storage_volume_state
  - network_acl
  - migration_stateful
  - disk_state_quota
  - storage_ceph_features
  - projects_compression
  - projects_images_remote_cache_expiry
  - certificate_project
  - network_ovn_acl
  - projects_images_auto_update
  - projects_restricted_cluster_target
  - images_default_architecture
  - network_ovn_acl_defaults
  - gpu_mig
  - project_usage
  - network_bridge_acl
  - warnings
  - projects_restricted_backups_and_snapshots
  - clustering_join_token
  - clustering_description
  - server_trusted_proxy
  - clustering_update_cert
  - storage_api_project
  - server_instance_driver_operational
  - server_supported_storage_drivers
  - event_lifecycle_requestor_address
  - resources_gpu_usb
  - clustering_evacuation
  - network_ovn_nat_address
  - network_bgp
  - network_forward
  - custom_volume_refresh
  - network_counters_errors_dropped
  - metrics
  - image_source_project
  - clustering_config
  - network_peer
  - linux_sysctl
  - network_dns
  - ovn_nic_acceleration
  - certificate_self_renewal
  - instance_project_move
  - storage_volume_project_move
  - cloud_init
  - network_dns_nat
  - database_leader
  - instance_all_projects
  - clustering_groups
  - ceph_rbd_du
  - instance_get_full
  - qemu_metrics
  - gpu_mig_uuid
  - event_project
  - clustering_evacuation_live
  - instance_allow_inconsistent_copy
  - network_state_ovn
  - storage_volume_api_filtering
  - image_restrictions
  - storage_zfs_export
  - network_dns_records
  - storage_zfs_reserve_space
  - network_acl_log
  - storage_zfs_blocksize
  - metrics_cpu_seconds
  - instance_snapshot_never
  - certificate_token
  - instance_nic_routed_neighbor_probe
  - event_hub
  - agent_nic_config
  - projects_restricted_intercept
  - metrics_authentication
  - images_target_project
  - images_all_projects
  - cluster_migration_inconsistent_copy
  - cluster_ovn_chassis
  - container_syscall_intercept_sched_setscheduler
  - storage_lvm_thinpool_metadata_size
  - storage_volume_state_total
  - instance_file_head
  - instances_nic_host_name
  - image_copy_profile
  - container_syscall_intercept_sysinfo
  - clustering_evacuation_mode
  - resources_pci_vpd
  - qemu_raw_conf
  - storage_cephfs_fscache
  - network_load_balancer
  - vsock_api
  - instance_ready_state
  - network_bgp_holdtime
  - storage_volumes_all_projects
  - metrics_memory_oom_total
  - storage_buckets
  - storage_buckets_create_credentials
  - metrics_cpu_effective_total
  - projects_networks_restricted_access
  - storage_buckets_local
  - loki
  - acme
  - internal_metrics
  - cluster_join_token_expiry
  - remote_token_expiry
  - init_preseed
  - storage_volumes_created_at
  - cpu_hotplug
  - projects_networks_zones
  - network_txqueuelen
  - cluster_member_state
  - instances_placement_scriptlet
  - storage_pool_source_wipe
  - zfs_block_mode
  - instance_generation_id
  - disk_io_cache
  - amd_sev
  - storage_pool_loop_resize
  - migration_vm_live
  - ovn_nic_nesting
  - oidc
  - network_ovn_l3only
  - ovn_nic_acceleration_vdpa
  - cluster_healing
  - instances_state_total
  - auth_user
  - security_csm
  - instances_rebuild
  - numa_cpu_placement
  - custom_volume_iso
  - network_allocations
  - zfs_delegate
  - storage_api_remote_volume_snapshot_copy
  - operations_get_query_all_projects
  - metadata_configuration
  - syslog_socket
  - event_lifecycle_name_and_project
  - instances_nic_limits_priority
  - disk_initial_volume_configuration
  - operation_wait
  - image_restriction_privileged
  - cluster_internal_custom_volume_copy
  - disk_io_bus
  - storage_cephfs_create_missing
  - instance_move_config
  - ovn_ssl_config
  - certificate_description
  - disk_io_bus_virtio_blk
  - loki_config_instance
  - instance_create_start
  - clustering_evacuation_stop_options
  - boot_host_shutdown_action
  - agent_config_drive
  - network_state_ovn_lr
  - image_template_permissions
  - storage_bucket_backup
  - storage_lvm_cluster
  - shared_custom_block_volumes
  - auth_tls_jwt
  - oidc_claim
  - device_usb_serial
  - numa_cpu_balanced
  - image_restriction_nesting
  - network_integrations
  - instance_memory_swap_bytes
  - network_bridge_external_create
  - network_zones_all_projects
  - storage_zfs_vdev
  - container_migration_stateful
  - profiles_all_projects
  - instances_scriptlet_get_instances
  - instances_scriptlet_get_cluster_members
  - instances_scriptlet_get_project
  - network_acl_stateless
  - instance_state_started_at
  - networks_all_projects
  - network_acls_all_projects
  - storage_buckets_all_projects
  - resources_load
  - instance_access
  - project_access
  - projects_force_delete
  - resources_cpu_flags
  - disk_io_bus_cache_filesystem
  - instance_oci
  - clustering_groups_config
  - instances_lxcfs_per_instance
  - clustering_groups_vm_cpu_definition
  - disk_volume_subpath
  - projects_limits_disk_pool
  - network_ovn_isolated
  - qemu_raw_qmp
  - network_load_balancer_health_check
  - oidc_scopes
  - network_integrations_peer_name
  - qemu_scriptlet
  - instance_auto_restart
  - storage_lvm_metadatasize
  - ovn_nic_promiscuous
  - ovn_nic_ip_address_none
  - instances_state_os_info
  - network_load_balancer_state
  - instance_nic_macvlan_mode
  - storage_lvm_cluster_create
  - network_ovn_external_interfaces
  - instances_scriptlet_get_instances_count
  - cluster_rebalance
  - custom_volume_refresh_exclude_older_snapshots
  - storage_initial_owner
  - storage_live_migration
  - instance_console_screenshot
  - image_import_alias
  - authorization_scriptlet
  - console_force
  - network_ovn_state_addresses
  - network_bridge_acl_devices
  - instance_debug_memory
  - init_preseed_storage_volumes
  - init_preseed_profile_project
  - instance_nic_routed_host_address
  - instance_smbios11
  - api_filtering_extended
  - acme_dns01
  - security_iommu
  - network_ipv4_dhcp_routes
  - network_state_ovn_ls
  - network_dns_nameservers
  - acme_http01_port
  - network_ovn_ipv4_dhcp_expiry
  - instance_state_cpu_time
  - network_io_bus
  - disk_io_bus_usb
  - storage_driver_linstor
  - instance_oci_entrypoint
  - network_address_set
  - server_logging
  - network_forward_snat
  - memory_hotplug
  - instance_nic_routed_host_tables
  - instance_publish_split
  - init_preseed_certificates
  - custom_volume_sftp
  - network_ovn_external_nic_address
  - network_physical_gateway_hwaddr
  - backup_s3_upload
  - snapshot_manual_expiry
  - resources_cpu_address_sizes
  - disk_attached
  - limits_memory_hotplug
  - disk_wwn
  - server_logging_webhook
  - storage_driver_truenas
  - container_disk_tmpfs
  - instance_limits_oom
  - backup_override_config
  - network_ovn_tunnels
  - init_preseed_cluster_groups
  - usb_attached
  - backup_iso
  - instance_systemd_credentials
  - cluster_group_usedby
  - bpf_token_delegation
  - file_storage_volume
  - network_hwaddr_pattern
  - storage_volume_full
  - storage_bucket_full
  - device_pci_firmware
  - resources_serial
  - ovn_nic_limits
  - storage_lvmcluster_qcow2
  - oidc_allowed_subnets
  - file_delete_force
  - nic_sriov_select_ext
  - network_zones_dns_contact
  - nic_attached_connected
  - nic_sriov_security_trusted
  - direct_backup
  - instance_snapshot_disk_only_restore
  - unix_hotplug_pci
  - cluster_evacuating_restoring
  - projects_restricted_image_servers
  - storage_lvmcluster_size
  - authorization_scriptlet_cert
  - lvmcluster_remove_snapshots
  - daemon_storage_logs
  - instances_debug_repair
  - network_io_bus_ovn
  - dependent
  - metrics_project_resources
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
  - tls
auth_user_name: lollo
auth_user_method: unix
environment:
  addresses:
    - 192.168.1.31:8443
    - 192.168.122.1:8443
    - 100.126.119.58:8443
    - 172.17.0.1:8443
    - 10.183.2.1:8443
  architectures:
    - x86_64
    - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB+DCCAX6gAwIBAgIRANMmZgBUcVxzdi6uyrrZ6f0wCgYIKoZIzj0EAwMwLzEZ
    MBcGA1UEChMQTGludXggQ29udGFpbmVyczESMBAGA1UEAwwJcm9vdEByb2NrMB4X
    DTI2MDIxOTEwNTA1OVoXDTM2MDIxNzEwNTA1OVowLzEZMBcGA1UEChMQTGludXgg
    Q29udGFpbmVyczESMBAGA1UEAwwJcm9vdEByb2NrMHYwEAYHKoZIzj0CAQYFK4EE
    ACIDYgAEPM71HNX37s16etMzjRh0agf6nSVTXkzX99e1t2P7LZHuIraXa7aoD5Yf
    +1cOr0IaCXUwv+Hy0OHtHNfA1VDp4Mr8UDJDMmJy1wtYPcDJS+kZzU6n72npUXC/
    Y8YUviLoo14wXDAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEw
    DAYDVR0TAQH/BAIwADAnBgNVHREEIDAeggRyb2NrhwR/AAABhxAAAAAAAAAAAAAA
    AAAAAAABMAoGCCqGSM49BAMDA2gAMGUCMQDF8KVqAAkVHbueqj7f7tiWuZrUVaeu
    /leRfuDE3ITh3shdjKzTs/iBuNKQRiDQkQMCMCxxDhc8tKyxO8Wu2QjHZau0pkdu
    +itHtwllzJhepMrdcgyv3K11vvZhWxfj7GRwEw==
    -----END CERTIFICATE-----
  certificate_fingerprint: 15e49aa6abb7a084a2be3818019ccab2743d99de7e5164b02f0a9e4e05c6c681
  driver: lxc | qemu
  driver_version: 6.0.6 | 10.2.2
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.17.0-23-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: KDE neon
  os_version: "24.04"
  project: default
  server: incus
  server_clustered: false
  server_event_mode: full-mesh
  server_name: rock
  server_pid: 1737
  server_version: "6.23"
  storage: btrfs | lvm
  storage_version: 6.6.3 | 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.50.0
  storage_supported_drivers:
    - name: btrfs
      version: 6.6.3
      remote: false
    - name: dir
      version: "1"
      remote: false
    - name: lvm
      version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.50.0
      remote: false
    - name: truenas
      version: 0.7.7
      remote: true

Instance details

No response

Instance log

No response

Current behavior

(I see Incus 7.0 has just been released, but it's not in the Zabbly repos yet...)

I was trying to launch an instance of a big OCI image:
incus launch nvcr:nvidia/tensorflow:25.02-tf2-py3 --console --ephemeral --storage pool3

It kept failing. I couldn't understand why. I freed up some space on disk. Still failed. I then increased the root disk size; same problem.

Apparently it was failing during the unpacking phase, because the tensorflow image is ~9GiB, but the default volume size is 10GiB.

I fixed the issue with incus storage set pool3 volume.size=50GiB

Log:

Failed creating instance from image: Unpack failed: Failed to run: tar --anchored --wildcards --exclude=dev/* --exclude=/dev/* --exclude=./dev/* --exclude=rootfs/dev/* --exclude=/rootfs/dev/* --exclude=./rootfs/dev/* --restrict --force-local -C /var/lib/incus/storage-pools/pool3/images/c70a6ebfc7a6f8b10771e0e1ceac76bbe40ce125bc052baf52266f8a997542c8/rootfs --numeric-owner --xattrs-include=* -zxf -: exit status 2 (tar: ./usr/local/lib/python3.12/dist-packages/nvidia/dali/.libs/libcvcuda-1823efd9.so.0: Wrote only 9216 of 10240 bytes
tar: ./usr/local/lib/python3.12/dist-packages/nvidia/dali/.libs/libaws-crt-cpp-c845276f.so: Cannot write: No space left on device
tar: ./usr/local/lib/python3.12/dist-packages/nvidia/dali/.libs/libavutil-edf718f0.so.59: Cannot write: No...
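
To illustrate the kind of message I have in mind, here is a minimal Python sketch (not Incus code; Incus itself is written in Go). It scans the tar output for the "No space left on device" marker and names the volume taken from the `-C` target path. The function name `explain_unpack_failure` is hypothetical, and the path layout `/storage-pools/<pool>/<type>/<name>` is only assumed from the log above.

```python
import re
from typing import Optional

# Marker string that tar prints when a write fails with ENOSPC (seen in the log above).
ENOSPC_MARKER = "No space left on device"


def explain_unpack_failure(stderr: str, target_dir: str) -> Optional[str]:
    """Return a clearer message if the tar output shows the target ran out of space.

    Hypothetical helper: returns None when the failure is not space-related.
    """
    if ENOSPC_MARKER not in stderr:
        return None

    # Assumed layout, taken from the log: .../storage-pools/<pool>/<type>/<name>/...
    match = re.search(r"/storage-pools/([^/]+)/([^/]+)/([^/]+)", target_dir)
    if match:
        pool, vol_type, name = match.groups()
        where = f"the '{vol_type}' volume {name[:12]} in pool '{pool}'"
    else:
        where = f"the volume mounted at {target_dir}"

    return (
        f"Unpack failed: {where} ran out of space. "
        f"The pool's volume.size may be too small for this image."
    )
```

With the log above, this would report the `images` volume in `pool3` instead of the raw tar output.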

Expected behavior

Incus could provide a better error message. The problem is not that I don't have disk space. It isn't even that the root disk size of the container is too small. The problem is that the volume used to unpack the rootfs, before creating the root disk, is too small.

I would expect Incus to report which device ran out of space. Is it the intermediate volume used to unpack the image? Is it the root disk of the container? Is it the pool?

Also, I think that currently, unpacking an image bigger than 10GiB in a pool where volume.size=10GiB would fail every time with the same cryptic error message from tar.

Changing the default volume.size in the pool does solve the problem. I just think the error message could be improved to guide the user on how to fix this.
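
Another option would be a check before unpacking. The Python sketch below is an illustration only, not how Incus works: it compares a known unpacked size against the pool's `volume.size` and fails early with a message that names the setting. The names `parse_size` and `check_image_fits` are hypothetical, and the parser only handles the binary suffixes used in this report (Incus accepts more unit formats).

```python
# Binary unit suffixes only; this is a simplification for the sketch.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size string such as '10GiB' to bytes (plain digits mean bytes)."""
    # Try longer suffixes first so 'GiB' is not mistaken for 'B'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * UNITS[suffix])
    return int(text)


def check_image_fits(unpacked_bytes: int, volume_size: str, pool: str) -> None:
    """Raise a descriptive error if the unpacked image exceeds the volume size."""
    limit = parse_size(volume_size)
    if unpacked_bytes > limit:
        raise ValueError(
            f"Image needs {unpacked_bytes} bytes unpacked, but volumes in pool "
            f"'{pool}' default to volume.size={volume_size} ({limit} bytes). "
            f"Increase it, e.g.: incus storage set {pool} volume.size=<larger size>"
        )


if __name__ == "__main__":
    try:
        # 12GiB unpacked into a 10GiB volume: rejected before tar ever runs.
        check_image_fits(12 * 1024**3, "10GiB", "pool3")
    except ValueError as err:
        print(err)
```

This assumes the unpacked size is known up front, which may not be the case for every OCI image; when it is not, the error could still be improved after the fact.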

Steps to reproduce

  • Have a pool where the default volume size is <= 10GiB
  • incus launch nvcr:nvidia/tensorflow:25.02-tf2-py3 --console --ephemeral --storage pool
  • See unpacking fail with cryptic tar error message
