Description
Required information
- Distribution: Gentoo
- Distribution version: --
- The output of "incus info":
config: {}
api_extensions:
[…]
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: root
auth_user_method: unix
environment:
addresses: []
architectures:
- x86_64
- i686
certificate: |
-----BEGIN CERTIFICATE-----
MIIB+zCCAYGgAwIBAgIRAICBOYWaBvVjqdDlc8YUw/swCgYIKoZIzj0EAwMwMDEZ
MBcGA1UEChMQTGludXggQ29udGFpbmVyczETMBEGA1UEAwwKcm9vdEBwbHV0bzAe
Fw0yNDAxMTIxNTUzNTNaFw0zNDAxMDkxNTUzNTNaMDAxGTAXBgNVBAoTEExpbnV4
IENvbnRhaW5lcnMxEzARBgNVBAMMCnJvb3RAcGx1dG8wdjAQBgcqhkjOPQIBBgUr
gQQAIgNiAASHXIMZRZz07Fvdh1ehgSKNc3yk/fYbTPMGbOF3VyeDWVWEsjhHQFeW
rJ9EPYNjogBUba3NKVaEmMAz+nKDM4TcwwLytIUAFLFyIqaRnuxFHF6RXU/g7jGV
RzwqYxSzCcujXzBdMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcD
ATAMBgNVHRMBAf8EAjAAMCgGA1UdEQQhMB+CBXBsdXRvhwR/AAABhxAAAAAAAAAA
AAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUCMQD9KzSklLuzuifRoJvKuwCPmn9r
UyfCac0hEA5Kj57/hgJ0y/+n3ODpln3JuAdn95MCMBJ8b5/W6x6sxC2e+vso+lfD
LNTgFQ6Ha6FfWEKBTGPb0KPr3XICf5IMIWb6kI3OiA==
-----END CERTIFICATE-----
certificate_fingerprint: a75220cf363eb0c0ffd6427f5d9d88510d5d608d335514b76548b47e37bb4e69
driver: lxc | qemu
driver_version: 6.0.2 | 8.2.3
firewall: xtables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
idmapped_mounts: "true"
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
uevent_injection: "true"
unpriv_binfmt: "false"
unpriv_fscaps: "true"
kernel_version: 6.6.62-gentoo-dist
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Gentoo
os_version: "2.17"
project: default
server: incus
server_clustered: false
server_event_mode: full-mesh
server_name: pluto
server_pid: 1857954
server_version: "6.7"
storage: zfs
storage_version: 2.2.5-r0-gentoo
storage_supported_drivers:
- name: zfs
version: 2.2.5-r0-gentoo
remote: false
- name: dir
version: "1"
remote: false
Issue description
Here is another one on the same setup as in #1483
While running incus-benchmark launch --count 200 --parallel 11
(with --parallel
>= 11 on that 12 thread machine), some instances fail with this errors:
(sysadm_r)@pluto ~ # incus-benchmark launch --count 200 --parallel 11
Test environment:
Server backend: incus
Server version: 6.7
Kernel: Linux
Kernel tecture: x86_64
Kernel version: 6.6.62-gentoo-dist
Storage backend: zfs
Storage version: 2.2.5-r0-gentoo
Container backend: lxc | qemu
Container version: 6.0.2 | 8.2.3
Test variables:
Container count: 200
Container mode: unprivileged
Startup mode: normal startup
Image: images:ubuntu/22.04
Batches: 18
Batch size: 11
Remainder: 2
[Dec 13 09:37:00.962] Found image in local store: 3bf7665e786f9baf780a5774fae5b43f5a29c79d76af1474ff6e8f212fcfa25d
[Dec 13 09:37:00.962] Batch processing start
[Dec 13 09:37:58.761] Processed 11 containers in 57.798s (0.190/s)
[Dec 13 09:38:54.180] Processed 22 containers in 113.217s (0.194/s)
[Dec 13 09:40:51.015] Processed 44 containers in 230.053s (0.191/s)
[Dec 13 09:41:25.084] Failed to launch container 'benchmark-052': Failed creating instance from image: Failed to activate volume: Failed to locate zvol for "zdata/incus/containers/benchmark-052": context deadline exceeded
[Dec 13 09:41:25.099] Failed to launch container 'benchmark-055': Failed creating instance from image: Failed to activate volume: Failed to locate zvol for "zdata/incus/containers/benchmark-055": context deadline exceeded
[Dec 13 09:44:18.373] Processed 88 containers in 437.411s (0.201/s)
[Dec 13 09:44:53.504] Failed to launch container 'benchmark-093': Failed creating instance from image: Failed to activate volume: Failed to locate zvol for "zdata/incus/containers/benchmark-093": context deadline exceeded
[Dec 13 09:45:43.469] Failed to launch container 'benchmark-107': Failed creating instance from image: Failed to mount "/dev/zvol/zdata/incus/containers/benchmark-107" on "/var/lib/incus/storage-pools/default/containers/benchmark-107" using "ext4": no such file or directory
[Dec 13 09:51:44.892] Processed 176 containers in 883.929s (0.199/s)
[Dec 13 09:53:45.504] Batch processing completed in 1004.542s
(sysadm_r)@pluto ~ #
indeed 4 instances missing:
(sysadm_r)@pluto ~ # incus list|grep benchmark| wc -l
196
(sysadm_r)@pluto ~ #
The ZFS storage itself is fine and healthy. The mentioned filesets are missing in the pool. If I increase --parallel
further, I get more failing instances (both, "context deadline exceeded" and "no such file or directory")
Steps to reproduce
- Have zfs storage backend (here 2x nvme as system and zfs special device and 2x spinning rust for data)
- run above incus-benchmark command with
--parallel
matching (alomost) system cpu threads
Information to attach
I also see following errors, but also for instances that did not fail in the end:
Dez 13 09:43:25 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-086: Can't open blockdev
Dez 13 09:43:25 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-086: Can't lookup blockdev
Dez 13 09:43:26 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-086: Can't lookup blockdev
Dez 13 09:43:26 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-086: Can't lookup blockdev
Dez 13 09:43:27 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-086: Can't lookup blockdev
Dez 13 09:45:08 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't open blockdev
Dez 13 09:45:09 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:09 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:10 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:10 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:11 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:12 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:12 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:13 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:13 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:14 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:14 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:15 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:15 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:16 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:16 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:17 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:17 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:18 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:18 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-107: Can't lookup blockdev
Dez 13 09:45:36 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-110: Can't open blockdev
Dez 13 09:52:19 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-183: Can't open blockdev
Dez 13 09:52:34 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-195: Can't open blockdev
Dez 13 09:52:34 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-198: Can't open blockdev
Dez 13 09:52:34 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-195: Can't lookup blockdev
Dez 13 09:52:35 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-198: Can't lookup blockdev
Dez 13 09:52:35 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-195: Can't lookup blockdev
Dez 13 09:52:35 pluto kernel: /dev/zvol/zdata/incus/containers/benchmark-198: Can't lookup blockdev
To me this looks like load issues and incusd not waiting long enough for devices to appear? Because it eventually gave up on 107 (does not exist), but not for example on 086 (exists and is running)
Here is some of the the output of incus monitor --pretty
, but from another run.
benchmark-055 failed with "no such file or directory":
(sysadm_r)@pluto ~ # grep benchmark-055 incus-monitor.log
time="2024-12-13T10:35:54+01:00" level=info msg="ID: 7e31f71c-9194-40da-aaa8-eca2192195c2, Class: task, Description: Creating instance" CreatedAt="2024-12-13 10:35:54.267475495 +0100 CET" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/benchmark-055]]" Status=Pending StatusCode=Pending UpdatedAt="2024-12-13 10:35:54.267475495 +0100 CET"
time="2024-12-13T10:35:54+01:00" level=info msg="ID: 7e31f71c-9194-40da-aaa8-eca2192195c2, Class: task, Description: Creating instance" CreatedAt="2024-12-13 10:35:54.267475495 +0100 CET" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/benchmark-055]]" Status=Running StatusCode=Running UpdatedAt="2024-12-13 10:35:54.267475495 +0100 CET"
time="2024-12-13T10:35:54+01:00" level=debug msg="Instance operation lock created" action=create instance=benchmark-055 project=default reusable=false
time="2024-12-13T10:35:54+01:00" level=info msg="Creating instance" ephemeral=false instance=benchmark-055 instanceType=container project=default
time="2024-12-13T10:35:54+01:00" level=debug msg="Adding device" device=eth0 instance=benchmark-055 instanceType=container project=default type=nic
time="2024-12-13T10:35:54+01:00" level=info msg="Action: instance-created, Source: /1.0/instances/benchmark-055, Requestor: unix/root (@)" location=none storage-pool=default type=container
time="2024-12-13T10:35:54+01:00" level=info msg="Created instance" ephemeral=false instance=benchmark-055 instanceType=container project=default
time="2024-12-13T10:35:54+01:00" level=debug msg="Adding device" device=root instance=benchmark-055 instanceType=container project=default type=disk
time="2024-12-13T10:35:54+01:00" level=debug msg="CreateInstanceFromImage started" driver=zfs instance=benchmark-055 pool=default project=default
time="2024-12-13T10:35:56+01:00" level=debug msg="Activated ZFS volume" dev=zdata/incus/containers/benchmark-055 driver=zfs pool=default volName=benchmark-055
time="2024-12-13T10:36:14+01:00" level=debug msg="Deactivated ZFS volume" dev=zdata/incus/containers/benchmark-055 driver=zfs pool=default volName=benchmark-055
time="2024-12-13T10:36:15+01:00" level=debug msg="Removing device" device=root instance=benchmark-055 instanceType=container project=default type=disk
time="2024-12-13T10:36:15+01:00" level=debug msg="CreateInstanceFromImage finished" driver=zfs instance=benchmark-055 pool=default project=default
time="2024-12-13T10:36:15+01:00" level=debug msg="Instance operation lock finished" action=create err="<nil>" instance=benchmark-055 project=default reusable=false
time="2024-12-13T10:36:15+01:00" level=debug msg="Removing device" device=eth0 instance=benchmark-055 instanceType=container project=default type=nic
time="2024-12-13T10:36:15+01:00" level=info msg="ID: 7e31f71c-9194-40da-aaa8-eca2192195c2, Class: task, Description: Creating instance" CreatedAt="2024-12-13 10:35:54.267475495 +0100 CET" Err="Failed creating instance from image: Failed to mount \"/dev/zvol/zdata/incus/containers/benchmark-055\" on \"/var/lib/incus/storage-pools/default/containers/benchmark-055\" using \"ext4\": no such file or directory" Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/benchmark-055]]" Status=Failure StatusCode=Failure UpdatedAt="2024-12-13 10:35:54.267475495 +0100 CET"
time="2024-12-13T10:36:15+01:00" level=debug msg="Failure for operation" class=task description="Creating instance" err="Failed creating instance from image: Failed to mount \"/dev/zvol/zdata/incus/containers/benchmark-055\" on \"/var/lib/incus/storage-pools/default/containers/benchmark-055\" using \"ext4\": no such file or directory" operation=7e31f71c-9194-40da-aaa8-eca2192195c2 project=default
(sysadm_r)@pluto ~ #
benchmark-082 failed with context deadline exceeded
(sysadm_r)@pluto ~ # grep benchmark-082 incus-monitor .log
time="2024-12-13T10:38:30+01:00" level=info msg="ID: 69bfdd5e-d71c-4503-a18d-56fac3a0f59c, Class: task, Description: Creating instance" CreatedAt="2024-12-13 10:38:30.739073919 +0100 CET" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/benchmark-082]]" Status=Pending StatusCode=Pending UpdatedAt="2024-12-13 10:38:30.739073919 +0100 CET"
time="2024-12-13T10:38:30+01:00" level=info msg="ID: 69bfdd5e-d71c-4503-a18d-56fac3a0f59c, Class: task, Description: Creating instance" CreatedAt="2024-12-13 10:38:30.739073919 +0100 CET" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/benchmark-082]]" Status=Running StatusCode=Running UpdatedAt="2024-12-13 10:38:30.739073919 +0100 CET"
time="2024-12-13T10:38:30+01:00" level=debug msg="Instance operation lock created" action=create instance=benchmark-082 project=default reusable=false
time="2024-12-13T10:38:30+01:00" level=info msg="Creating instance" ephemeral=false instance=benchmark-082 instanceType=container project=default
time="2024-12-13T10:38:31+01:00" level=debug msg="Adding device" device=eth0 instance=benchmark-082 instanceType=container project=default type=nic
time="2024-12-13T10:38:31+01:00" level=debug msg="Adding device" device=root instance=benchmark-082 instanceType=container project=default type=disk
time="2024-12-13T10:38:31+01:00" level=info msg="Action: instance-created, Source: /1.0/instances/benchmark-082, Requestor: unix/root (@)" location=none storage-pool=default type=container
time="2024-12-13T10:38:31+01:00" level=info msg="Created instance" ephemeral=false instance=benchmark-082 instanceType=container project=default
time="2024-12-13T10:38:31+01:00" level=debug msg="CreateInstanceFromImage started" driver=zfs instance=benchmark-082 pool=default project=default
time="2024-12-13T10:39:11+01:00" level=debug msg="Removing device" device=eth0 instance=benchmark-082 instanceType=container project=default type=nic
time="2024-12-13T10:39:11+01:00" level=debug msg="CreateInstanceFromImage finished" driver=zfs instance=benchmark-082 pool=default project=default
time="2024-12-13T10:39:11+01:00" level=debug msg="Instance operation lock finished" action=create err="<nil>" instance=benchmark-082 project=default reusable=false
time="2024-12-13T10:39:11+01:00" level=debug msg="Removing device" device=root instance=benchmark-082 instanceType=container project=default type=disk
time="2024-12-13T10:39:11+01:00" level=info msg="ID: 69bfdd5e-d71c-4503-a18d-56fac3a0f59c, Class: task, Description: Creating instance" CreatedAt="2024-12-13 10:38:30.739073919 +0100 CET" Err="Failed creating instance from image: Failed to activate volume: Failed to locate zvol for \"zdata/incus/containers/benchmark-082\": context deadline exceeded" Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/benchmark-082]]" Status=Failure StatusCode=Failure UpdatedAt="2024-12-13 10:38:30.739073919 +0100 CET"
time="2024-12-13T10:39:11+01:00" level=debug msg="Failure for operation" class=task description="Creating instance" err="Failed creating instance from image: Failed to activate volume: Failed to locate zvol for \"zdata/incus/containers/benchmark-082\": context deadline exceeded" operation=69bfdd5e-d71c-4503-a18d-56fac3a0f59c project=default
(sysadm_r)@pluto ~ #