Description
Hi!
Apologies in advance if this is not the right repository for this issue.
I am testing the Elemental operator on VMs. They are not managed by Rancher in any way, so I suppose they can be treated as physical hosts.
The initial provisioning works well.
After several OS image updates (about 4), the upgrade Pod starts failing with the error: 'btrfs' command output: ERROR: cannot create subvolume: Read-only file system.
It is reproducible in the sense that, once it starts, it happens on all three machines in my test cluster.
This is the Update Group (ManagedOSImage) resource I used to update the machines. For each update, I changed the osImage field:
apiVersion: elemental.cattle.io/v1beta1
kind: ManagedOSImage
metadata:
  name: gotoleap
  namespace: fleet-default
spec:
  clusterTargets:
    - clusterName: elemental-test-2
  drain:
    deleteLocalData: true
    force: true
    ignoreDaemonSets: true
    skipWaitForDeleteTimeout: 60
  osImage: example.com/elemental/base-os/leap:master-f96725e1
status:
  conditions:
    - lastTransitionTime: '2025-07-18T11:52:27Z'
      message: ''
      reason: FleetBundleCreateSuccess
      status: 'True'
      type: FleetBundleCreation
    - lastTransitionTime: '2025-08-18T15:16:44Z'
      message: ''
      reason: UnknownReason
      status: 'True'
      type: Ready
Before the update, the machines were running openSUSE Leap 15.6 with kernel: Linux 6.4.0-150600.23.53-default #1 SMP PREEMPT_DYNAMIC Wed Jun 4 05:37:40 UTC 2025 (2d991ff) x86_64 x86_64 x86_64 GNU/Linux.
Elemental version: 2.2.1
Elemental Operator version: 106.0.1+up1.6.9
Elemental Toolkit version: 2.2.5
Error message:
Defaulted container "upgrade" out of: upgrade, drain (init)
+ HOST_DIR=/host
+ RELEASE_FILE=/etc/os-release
+ CONF_FILE=/run/data/cloud-config
+ LOCK_TIMEOUT=600
+ LOCK_FILE=/host/run/elemental/upgrade.lock
+ mkdir -p /host/run/elemental
+ flock -w 600 200
++ nsenter -i -m -t 1 -- systemctl is-system-running
+ SYSSTATUS=running
+ isEqualVersion
+ diff /etc/os-release /host/etc/os-release
+ return 1
+ '[' '' '!=' true ']'
+ '[' '' '!=' true ']'
+ isHigherVersion
++ . /etc/os-release
+++ NAME='openSUSE Leap'
+++ VERSION=15.6
+++ ID=opensuse-leap
+++ ID_LIKE='suse opensuse'
+++ VERSION_ID=15.6
+++ PRETTY_NAME='openSUSE Leap 15.6'
+++ ANSI_COLOR='0;32'
+++ CPE_NAME=cpe:/o:opensuse:leap:15.6
+++ BUG_REPORT_URL=https://bugs.opensuse.org
+++ HOME_URL=https://www.opensuse.org/
+++ DOCUMENTATION_URL=https://en.opensuse.org/Portal:Leap
+++ LOGO=distributor-logo-Leap
+++ TIMESTAMP=20250818145852
+++ GRUB_ENTRY_NAME='Elemental Dev'
++ printf '%s\n' ''
+ local img_ver=
++ . /etc/os-release
+++ NAME='openSUSE Leap'
+++ VERSION=15.6
+++ ID=opensuse-leap
+++ ID_LIKE='suse opensuse'
+++ VERSION_ID=15.6
+++ PRETTY_NAME='openSUSE Leap 15.6'
+++ ANSI_COLOR='0;32'
+++ CPE_NAME=cpe:/o:opensuse:leap:15.6
+++ BUG_REPORT_URL=https://bugs.opensuse.org
+++ HOME_URL=https://www.opensuse.org/
+++ DOCUMENTATION_URL=https://en.opensuse.org/Portal:Leap
+++ LOGO=distributor-logo-Leap
+++ TIMESTAMP=20250818145852
+++ GRUB_ENTRY_NAME='Elemental Dev'
++ printf '%s\n' ''
+ local img_repo=
++ . /host/etc/os-release
+++ NAME='openSUSE Leap'
+++ VERSION=15.6
+++ ID=opensuse-leap
+++ ID_LIKE='suse opensuse'
+++ VERSION_ID=15.6
+++ PRETTY_NAME='openSUSE Leap 15.6'
+++ ANSI_COLOR='0;32'
+++ CPE_NAME=cpe:/o:opensuse:leap:15.6
+++ BUG_REPORT_URL=https://bugs.opensuse.org
+++ HOME_URL=https://www.opensuse.org/
+++ DOCUMENTATION_URL=https://en.opensuse.org/Portal:Leap
+++ LOGO=distributor-logo-Leap
+++ TIMESTAMP=20250819144550
+++ GRUB_ENTRY_NAME='Elemental OpenSUSE'
++ printf '%s\n' ''
+ local host_ver=
++ . /host/etc/os-release
+++ NAME='openSUSE Leap'
+++ VERSION=15.6
+++ ID=opensuse-leap
+++ ID_LIKE='suse opensuse'
+++ VERSION_ID=15.6
+++ PRETTY_NAME='openSUSE Leap 15.6'
+++ ANSI_COLOR='0;32'
+++ CPE_NAME=cpe:/o:opensuse:leap:15.6
+++ BUG_REPORT_URL=https://bugs.opensuse.org
+++ HOME_URL=https://www.opensuse.org/
+++ DOCUMENTATION_URL=https://en.opensuse.org/Portal:Leap
+++ LOGO=distributor-logo-Leap
+++ TIMESTAMP=20250819144550
+++ GRUB_ENTRY_NAME='Elemental OpenSUSE'
++ printf '%s\n' ''
+ local host_repo=
+ local higher_ver
+ '[' '' '!=' '' ']'
+ '[' -z '' ']'
+ return 0
+ config
+ '[' '!' -s /run/data/cloud-config ']'
+ '[' -e /host/oem/90_operator.yaml ']'
+ return 0
+ mount --rbind /host/dev /dev
+ mount --rbind /host/run /run
+ '[' '' = true ']'
+ '[' '' = true ']'
+ elemental --debug upgrade --bootloader --system dir:/
DEBU[2025-08-20T06:13:03Z] Starting elemental version on commit
INFO[2025-08-20T06:13:03Z] Reading configuration from '/etc/elemental'
DEBU[2025-08-20T06:13:03Z] Full config loaded: &types.RunConfig{
  Reboot: false,
  PowerOff: false,
  EjectCD: false,
  Snapshotter: types.SnapshotterConfig{
    Type: "btrfs",
    MaxSnaps: 4,
    Config: &types.BtrfsConfig{},
  },
  Config: types.Config{
    Logger: &types.logrusWrapper{ // p0
      Logger: &logrus.Logger{
        Out: &os.File{},
        Hooks: logrus.LevelHooks{},
        Formatter: &logrus.TextFormatter{
          ForceColors: true,
          DisableColors: false,
          ForceQuote: false,
          DisableQuote: false,
          EnvironmentOverrideColors: false,
          DisableTimestamp: false,
          FullTimestamp: true,
          TimestampFormat: "",
          DisableSorting: false,
          SortingFunc: ,
          DisableLevelTruncation: false,
          PadLevelText: false,
          QuoteEmptyFields: false,
          FieldMap: logrus.FieldMap(nil),
          CallerPrettyfier: ,
        },
        ReportCaller: false,
        Level: 5,
        ExitFunc: os.Exit,
        BufferPool: nil,
      },
    },
    Fs: &vfs.osfs{}, // p1
    Mounter: &mount.Mounter{},
    Runner: &types.RealRunner{ // p2
      Logger: p0,
    },
    Syscall: &types.RealSyscall{},
    CloudInitRunner: &cloudinit.YipCloudInitRunner{},
    ImageExtractor: types.OCIImageExtractor{},
    Client: &http.Client{},
    Platform: &types.Platform{
      OS: "linux",
      Arch: "x86_64",
      GolangArch: "amd64",
    },
    Cosign: false,
    Verify: false,
    TLSVerify: true,
    CosignPubKey: "",
    LocalImage: false,
    Arch: "",
    SquashFsCompressionConfig: []string{
      "-no-compression",
    },
    SquashFsNoCompression: true,
    CloudInitPaths: []string{
      "/system/oem",
      "/oem/",
      "/usr/local/cloud-config/",
    },
    Strict: false,
  },
}
DEBU[2025-08-20T06:13:03Z] Loaded upgrade UpgradeSpec: &types.UpgradeSpec{
  RecoveryUpgrade: false,
  System: &types.ImageSource{},
  RecoverySystem: types.Image{
    File: "/run/elemental/recovery/boot-transition/recovery.img",
    Label: "",
    Size: 0,
    FS: "squashfs",
    Source: &types.ImageSource{},
    MountPoint: "/run/elemental/transition",
    LoopDevice: "",
  },
  GrubDefEntry: "",
  BootloaderUpgrade: true,
  SnapshotLabels: types.KeyValuePair(nil),
  Partitions: types.ElementalPartitions{
    BIOS: nil,
    Boot: &types.Partition{
      Name: "efi",
      FilesystemLabel: "COS_GRUB",
      Size: 64,
      FS: "vfat",
      Flags: nil,
      MountPoint: "/host/run/elemental/efi",
      Path: "/dev/sda1",
      Disk: "/dev/sda",
    },
    OEM: &types.Partition{
      Name: "oem",
      FilesystemLabel: "COS_OEM",
      Size: 64,
      FS: "ext4",
      Flags: nil,
      MountPoint: "/host/oem",
      Path: "/dev/sda2",
      Disk: "/dev/sda",
    },
    Recovery: &types.Partition{
      Name: "recovery",
      FilesystemLabel: "COS_RECOVERY",
      Size: 4096,
      FS: "ext4",
      Flags: nil,
      MountPoint: "/run/elemental/recovery",
      Path: "/dev/sda3",
      Disk: "/dev/sda",
    },
    State: &types.Partition{
      Name: "state",
      FilesystemLabel: "COS_STATE",
      Size: 8192,
      FS: "btrfs",
      Flags: nil,
      MountPoint: "/host",
      Path: "/dev/sda4",
      Disk: "/dev/sda",
    },
    Persistent: &types.Partition{
      Name: "persistent",
      FilesystemLabel: "COS_PERSISTENT",
      Size: 53118,
      FS: "ext4",
      Flags: nil,
      MountPoint: "/host/etc/iscsi",
      Path: "/dev/sda5",
      Disk: "/dev/sda",
    },
  },
  State: &types.InstallState{
    Date: "2025-08-19T14:54:16Z",
    Partitions: map[string]*types.PartitionState{
      "efi": &types.PartitionState{
        FSLabel: "COS_GRUB",
        RecoveryImage: nil,
        Snapshots: map[int]*types.SystemState(nil), // p0
      },
      "oem": &types.PartitionState{
        FSLabel: "COS_OEM",
        RecoveryImage: nil,
        Snapshots: p0,
      },
      "persistent": &types.PartitionState{
        FSLabel: "COS_PERSISTENT",
        RecoveryImage: nil,
        Snapshots: p0,
      },
      "recovery": &types.PartitionState{
        FSLabel: "COS_RECOVERY",
        RecoveryImage: &types.SystemState{
          Source: &types.ImageSource{},
          Digest: "",
          Active: false,
          Label: "",
          FS: "squashfs",
          Labels: map[string]string(nil), // p1
          Date: "",
          FromAction: "",
        },
        Snapshots: p0,
      },
      "state": &types.PartitionState{
        FSLabel: "COS_STATE",
        RecoveryImage: nil,
        Snapshots: map[int]*types.SystemState{
          1: &types.SystemState{
            Source: &types.ImageSource{},
            Digest: "",
            Active: false,
            Label: "",
            FS: "",
            Labels: p1,
            Date: "",
            FromAction: "",
          },
          2: &types.SystemState{
            Source: &types.ImageSource{},
            Digest: "",
            Active: false,
            Label: "",
            FS: "",
            Labels: p1,
            Date: "2025-08-18T18:13:29Z",
            FromAction: "upgrade",
          },
          3: &types.SystemState{
            Source: &types.ImageSource{},
            Digest: "",
            Active: false,
            Label: "",
            FS: "",
            Labels: p1,
            Date: "2025-08-19T13:29:35Z",
            FromAction: "upgrade",
          },
          4: &types.SystemState{
            Source: &types.ImageSource{},
            Digest: "",
            Active: false,
            Label: "",
            FS: "",
            Labels: p1,
            Date: "2025-08-19T14:21:04Z",
            FromAction: "upgrade",
          },
          5: &types.SystemState{
            Source: &types.ImageSource{},
            Digest: "",
            Active: true,
            Label: "",
            FS: "",
            Labels: p1,
            Date: "2025-08-19T14:54:16Z",
            FromAction: "upgrade",
          },
        },
      },
    },
    Snapshotter: types.SnapshotterConfig{
      Type: "btrfs",
      MaxSnaps: 4,
      Config: &types.BtrfsConfig{},
    },
  },
}
INFO[2025-08-20T06:13:03Z] Upgrade called
DEBU[2025-08-20T06:13:03Z] Running cmd: 'findmnt -fno OPTIONS /host/run/elemental/efi'
DEBU[2025-08-20T06:13:03Z] Mounting partition COS_GRUB
DEBU[2025-08-20T06:13:03Z] Mounting partition COS_RECOVERY
DEBU[2025-08-20T06:13:03Z] Running cmd: 'findmnt -fno OPTIONS /host/etc/iscsi'
DEBU[2025-08-20T06:13:03Z] Already RW mounted: persistent at /host/etc/iscsi
INFO[2025-08-20T06:13:03Z] Initiate btrfs snapshotter at /host
DEBU[2025-08-20T06:13:03Z] Checking if essential subvolumes are already created
DEBU[2025-08-20T06:13:03Z] Running cmd: 'btrfs subvolume list --sort=path /host'
DEBU[2025-08-20T06:13:03Z] Looking for subvolume ids 257 and 258 in subvolume list: [{@ 256} {@/.snapshots 257} {@/.snapshots/2/snapshot 259} {@/.snapshots/3/snapshot 260} {@/.snapshots/4/snapshot 261} {@/.snapshots/5/snapshot 262}]
DEBU[2025-08-20T06:13:03Z] Running initial btrfs configuration
DEBU[2025-08-20T06:13:03Z] Enabling btrfs quota
DEBU[2025-08-20T06:13:03Z] Running cmd: 'btrfs quota enable /host'
DEBU[2025-08-20T06:13:03Z] Creating essential subvolumes
DEBU[2025-08-20T06:13:03Z] Creating subvolume: /host/@
DEBU[2025-08-20T06:13:03Z] Running cmd: 'btrfs subvolume create /host/@'
DEBU[2025-08-20T06:13:03Z] 'btrfs' command reported an error: exit status 1
DEBU[2025-08-20T06:13:03Z] 'btrfs' command output: ERROR: cannot create subvolume: Read-only file system
Create subvolume '/host/@'
ERRO[2025-08-20T06:13:03Z] failed creating subvolume /host/@: ERROR: cannot create subvolume: Read-only file system
Create subvolume '/host/@'
ERRO[2025-08-20T06:13:03Z] failed initializing snapshotter
DEBU[2025-08-20T06:13:03Z] Unmounting partition COS_RECOVERY
DEBU[2025-08-20T06:13:03Z] Mounting partition COS_GRUB
ERRO[2025-08-20T06:13:03Z] upgrade command failed: 1 error occurred:
* exit status 1
Mounts on the host (I removed all tmpfs lines and the mounts of running containers):
w-v-suse-micro-0:~ # mount | grep -v /run/k3s/containerd/io.containerd.runtime.v2.task/
/dev/sda4 on / type btrfs (ro,relatime,space_cache=v2,subvolid=262,subvol=/@/.snapshots/5/snapshot)
/dev/sda2 on /oem type ext4 (rw,relatime)
overlay on /etc type overlay (rw,relatime,lowerdir=/sysroot/etc,upperdir=/run/elemental/overlay/etc.overlay/upper,workdir=/run/elemental/overlay/etc.overlay/work)
overlay on /var type overlay (rw,relatime,lowerdir=/sysroot/var,upperdir=/run/elemental/overlay/var.overlay/upper,workdir=/run/elemental/overlay/var.overlay/work)
overlay on /srv type overlay (rw,relatime,lowerdir=/sysroot/srv,upperdir=/run/elemental/overlay/srv.overlay/upper,workdir=/run/elemental/overlay/srv.overlay/work)
/dev/sda5 on /home type ext4 (rw,relatime)
/dev/sda5 on /root type ext4 (rw,relatime)
/dev/sda5 on /opt type ext4 (rw,relatime)
/dev/sda5 on /usr/libexec type ext4 (rw,relatime)
/dev/sda5 on /var/log type ext4 (rw,relatime)
/dev/sda5 on /etc/iscsi type ext4 (rw,relatime)
/dev/sda5 on /etc/ssh type ext4 (rw,relatime)
/dev/sda5 on /etc/rancher type ext4 (rw,relatime)
/dev/sda5 on /etc/systemd type ext4 (rw,relatime)
/dev/sda5 on /usr/local type ext4 (rw,relatime)
/dev/sda5 on /etc/cni type ext4 (rw,relatime)
/dev/sda5 on /var/lib/elemental type ext4 (rw,relatime)
/dev/sda5 on /var/lib/rancher type ext4 (rw,relatime)
/dev/sda5 on /var/lib/kubelet type ext4 (rw,relatime)
/dev/sda5 on /var/lib/NetworkManager type ext4 (rw,relatime)
/dev/sda5 on /var/lib/cni type ext4 (rw,relatime)
/dev/sda5 on /var/lib/calico type ext4 (rw,relatime)
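The mount output shows the state partition (/dev/sda4, subvolume @/.snapshots/5/snapshot) mounted on / with the ro option. Since the upgrade spec uses /host as the state partition mount point, this is presumably why btrfs subvolume create /host/@ fails with "Read-only file system". A minimal Go sketch that parses lines in the mount(8) format shown above and reports whether a mount point is read-only; mountOptions and isReadOnly are hypothetical helpers, and the parser assumes device names and mount points contain no spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// mountOptions scans lines of the form "DEV on MOUNTPOINT type FSTYPE (OPTS)"
// and returns the options of the last entry for mountPoint, since a later
// mount on the same path shadows earlier ones.
// Assumption: no spaces in device names or mount points.
func mountOptions(output, mountPoint string) ([]string, bool) {
	var opts []string
	found := false
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		// Expected fields: dev, "on", mountpoint, "type", fstype, "(opts)".
		if len(fields) < 6 || fields[1] != "on" || fields[3] != "type" {
			continue
		}
		if fields[2] != mountPoint {
			continue
		}
		raw := strings.TrimSuffix(strings.TrimPrefix(fields[5], "("), ")")
		opts = strings.Split(raw, ",")
		found = true
	}
	return opts, found
}

// isReadOnly reports whether the option list contains "ro".
func isReadOnly(opts []string) bool {
	for _, o := range opts {
		if o == "ro" {
			return true
		}
	}
	return false
}

func main() {
	// Two lines copied from the mount output above.
	out := `/dev/sda4 on / type btrfs (ro,relatime,space_cache=v2,subvolid=262,subvol=/@/.snapshots/5/snapshot)
/dev/sda2 on /oem type ext4 (rw,relatime)`
	for _, mp := range []string{"/", "/oem"} {
		opts, ok := mountOptions(out, mp)
		fmt.Printf("%s found=%v read-only=%v\n", mp, ok, isReadOnly(opts))
	}
}
```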
I also noticed that a freshly deployed system has a subvolume with ID 258 (@/.snapshots/1/snapshot), which is gone after 4 upgrades:
After upgrades:
btrfs subvolume list /
ID 256 gen 86 top level 5 path @
ID 257 gen 91 top level 256 path @/.snapshots
ID 259 gen 34 top level 257 path @/.snapshots/2/snapshot
ID 260 gen 52 top level 257 path @/.snapshots/3/snapshot
ID 261 gen 73 top level 257 path @/.snapshots/4/snapshot
ID 262 gen 82 top level 257 path @/.snapshots/5/snapshot
New system:
ID 256 gen 99 top level 5 path @
ID 257 gen 101 top level 256 path @/.snapshots
ID 258 gen 17 top level 257 path @/.snapshots/1/snapshot
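The debug line "Looking for subvolume ids 257 and 258 in subvolume list" suggests the initialization check is keyed on fixed subvolume IDs. A minimal Go sketch of such an ID-based check, applied to the two listings above; it is a hypothetical reimplementation for illustration, not the toolkit's code, and parseSubvolList and hasIDs are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSubvolList maps subvolume ID -> path from `btrfs subvolume list`
// lines such as "ID 257 gen 91 top level 256 path @/.snapshots".
func parseSubvolList(output string) map[int]string {
	subvols := make(map[int]string)
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != "ID" {
			continue
		}
		id, err := strconv.Atoi(fields[1])
		if err != nil {
			continue
		}
		// The path is everything after the " path " token.
		if _, path, ok := strings.Cut(line, " path "); ok {
			subvols[id] = path
		}
	}
	return subvols
}

// hasIDs reports whether every requested subvolume ID is present.
func hasIDs(subvols map[int]string, ids ...int) bool {
	for _, id := range ids {
		if _, ok := subvols[id]; !ok {
			return false
		}
	}
	return true
}

func main() {
	afterUpgrades := `ID 256 gen 86 top level 5 path @
ID 257 gen 91 top level 256 path @/.snapshots
ID 259 gen 34 top level 257 path @/.snapshots/2/snapshot
ID 260 gen 52 top level 257 path @/.snapshots/3/snapshot
ID 261 gen 73 top level 257 path @/.snapshots/4/snapshot
ID 262 gen 82 top level 257 path @/.snapshots/5/snapshot`
	newSystem := `ID 256 gen 99 top level 5 path @
ID 257 gen 101 top level 256 path @/.snapshots
ID 258 gen 17 top level 257 path @/.snapshots/1/snapshot`

	fmt.Println("after upgrades:", hasIDs(parseSubvolList(afterUpgrades), 257, 258))
	fmt.Println("new system:", hasIDs(parseSubvolList(newSystem), 257, 258))
}
```

The check succeeds for the new system and fails after the upgrades, even though @ and @/.snapshots still exist in both listings.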
In version 2.2.5, there is a function that checks the snapshot ID: https://github.com/rancher/elemental-toolkit/blob/v2.2.5/pkg/snapshotter/btrfs.go#L38 and https://github.com/rancher/elemental-toolkit/blob/v2.2.5/pkg/snapshotter/btrfs.go#L526
It is called in this context during an upgrade: https://github.com/rancher/elemental-toolkit/blob/v2.2.5/pkg/snapshotter/btrfs.go#L147
Based on the logs, I suspect that the missing subvolume 258 makes isInitiated return false, so setBtrfsForFirstTime (https://github.com/rancher/elemental-toolkit/blob/v2.2.5/pkg/snapshotter/btrfs.go#L160) is called and fails when it tries to create /host/@ on the read-only filesystem.