Flake - Node Setup should successfully start a failed mount unit with a corrupted filesystem when it's overwritten with a clean one [Serial] #2550

Open
@zimnx

Description

Link to the job that flaked.

https://prow.scylla-operator.scylladb.com/view/gs/scylla-operator-prow/pr-logs/pull/scylladb_scylla-operator/2546/pull-scylla-operator-master-e2e-gke-serial/1899784532085706752

Snippet of what failed.

 Node Setup should successfully start a failed mount unit with a corrupted filesystem when it's overwritten with a clean one [Serial]
github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig/nodeconfig_disksetup.go:527
  STEP: Creating a new namespace @ 03/12/25 12:05:14.236
  Mar 12 12:05:14.254: INFO: Created namespace "e2e-test-nodesetup-gflq6-0-f9gm5".
  STEP: Waiting for service account token Secret "e2e-user-token" in namespace "e2e-test-nodesetup-gflq6-0-f9gm5". @ 03/12/25 12:05:14.561
I0312 12:05:14.571804       1 cache/reflector.go:376] Caches populated for *v1.Secret from k8s.io/[email protected]/tools/cache/reflector.go:251
  STEP: Waiting for default ServiceAccount in namespace "e2e-test-nodesetup-gflq6-0-f9gm5". @ 03/12/25 12:05:14.594
I0312 12:05:14.621606       1 cache/reflector.go:376] Caches populated for *v1.ServiceAccount from k8s.io/[email protected]/tools/cache/reflector.go:251
  STEP: Waiting for kube-root-ca.crt in namespace "e2e-test-nodesetup-gflq6-0-f9gm5". @ 03/12/25 12:05:14.628
I0312 12:05:14.647839       1 cache/reflector.go:376] Caches populated for *v1.ConfigMap from k8s.io/[email protected]/tools/cache/reflector.go:251
  STEP: Verifying there is at least one scylla node @ 03/12/25 12:05:14.653
  Mar 12 12:05:14.664: INFO: There are 1 scylla nodes
  STEP: Snapshotting object scylla.scylladb.com/v1alpha1, Resource=nodeconfigs "cluster" @ 03/12/25 12:05:14.665
  STEP: Deleting object scylla.scylladb.com/v1alpha1, Resource=nodeconfigs "cluster". @ 03/12/25 12:05:14.671
I0312 12:05:14.671605       1 framework/cleanup.go:102] "No existing object found" GVR="scylla.scylladb.com/v1alpha1, Resource=nodeconfigs" Instance="cluster"
  STEP: Creating NodeConfig @ 03/12/25 12:05:14.677
  STEP: Creating a client Pod @ 03/12/25 12:05:14.694
  STEP: Waiting for client Pod to be in a running state @ 03/12/25 12:05:14.773
I0312 12:05:14.788969       1 cache/reflector.go:376] Caches populated for *v1.Pod from k8s.io/[email protected]/tools/cache/reflector.go:251
  STEP: Waiting for NodeConfig to roll out @ 03/12/25 12:05:16.319
I0312 12:05:16.336977       1 cache/reflector.go:376] Caches populated for *v1alpha1.NodeConfig from k8s.io/[email protected]/tools/cache/reflector.go:251
  Mar 12 12:05:23.719: INFO: NodeConfig "cluster" (RV=16273) is rolled out
  STEP: Verifying the filesystem's integrity @ 03/12/25 12:05:23.719
  STEP: Getting the filesystem's block size @ 03/12/25 12:05:23.822
  STEP: Corrupting the filesystem @ 03/12/25 12:05:23.915
  STEP: Verifying that the filesystem is corrupted @ 03/12/25 12:05:24.054
  STEP: Patching NodeConfig's mount configuration with a mount over a corrupted filesystem @ 03/12/25 12:05:24.141
  STEP: Waiting for NodeConfig to be in a degraded state @ 03/12/25 12:05:24.172
I0312 12:05:24.183545       1 cache/reflector.go:376] Caches populated for *v1alpha1.NodeConfig from k8s.io/[email protected]/tools/cache/reflector.go:251
  STEP: Overwriting the corrupted filesystem @ 03/12/25 12:05:26.191
  STEP: Zeroing XFS log @ 03/12/25 12:05:26.465
  [FAILED] in [It] - github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig/nodeconfig_disksetup.go:644 @ 03/12/25 12:05:26.973
  STEP: Collecting events from namespace "e2e-test-nodesetup-gflq6-0-f9gm5". @ 03/12/25 12:05:26.973
  STEP: Found 4 events. @ 03/12/25 12:05:26.982
  Mar 12 12:05:26.982: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for client: { } Scheduled: Successfully assigned e2e-test-nodesetup-gflq6-0-f9gm5/client to gke-so-98f3a2e2-e8c2-470e-a8-scylladb-09b46dbf-7wn9
  Mar 12 12:05:26.982: INFO: At 2025-03-12 12:05:15 +0000 UTC - event for client: {kubelet gke-so-98f3a2e2-e8c2-470e-a8-scylladb-09b46dbf-7wn9} Pulled: Container image "quay.io/scylladb/scylla-operator-images:node-setup-v0.0.3@sha256:c6b3de240cc5c884d5c617485bae35c51572cdfd39b6431d2e1f759c7d7feea1" already present on machine
  Mar 12 12:05:26.982: INFO: At 2025-03-12 12:05:15 +0000 UTC - event for client: {kubelet gke-so-98f3a2e2-e8c2-470e-a8-scylladb-09b46dbf-7wn9} Created: Created container: client
  Mar 12 12:05:26.982: INFO: At 2025-03-12 12:05:15 +0000 UTC - event for client: {kubelet gke-so-98f3a2e2-e8c2-470e-a8-scylladb-09b46dbf-7wn9} Started: Started container client
  STEP: Collecting dumps from namespace "e2e-test-nodesetup-gflq6-0-f9gm5". @ 03/12/25 12:05:26.982
  STEP: Collecting global scylla.scylladb.com/v1alpha1, Resource=nodeconfigs "cluster" for namespace "e2e-test-nodesetup-gflq6-0-f9gm5". @ 03/12/25 12:05:29.954
  STEP: Destroying namespace "e2e-test-nodesetup-gflq6-0-f9gm5". @ 03/12/25 12:05:29.968
  STEP: Waiting for namespace "e2e-test-nodesetup-gflq6-0-f9gm5" to be removed. @ 03/12/25 12:05:29.983
I0312 12:05:29.993359       1 cache/reflector.go:376] Caches populated for *unstructured.Unstructured from k8s.io/[email protected]/tools/cache/reflector.go:251
I0312 12:05:40.222858       1 framework/cleanup.go:73] "Namespace removed." Namespace="e2e-test-nodesetup-gflq6-0-f9gm5"
  STEP: Deleting object scylla.scylladb.com/v1alpha1, Resource=nodeconfigs "cluster". @ 03/12/25 12:05:40.222
  STEP: Waiting for object scylla.scylladb.com/v1alpha1, Resource=nodeconfigs "cluster" to be removed. @ 03/12/25 12:05:40.245
I0312 12:05:40.258189       1 cache/reflector.go:376] Caches populated for *unstructured.Unstructured from k8s.io/[email protected]/tools/cache/reflector.go:251
  STEP: Object scylla.scylladb.com/v1alpha1, Resource=nodeconfigs "cluster" has been removed. @ 03/12/25 12:05:41.494
• [FAILED] [27.258 seconds]
Node Setup [It] should successfully start a failed mount unit with a corrupted filesystem when it's overwritten with a clean one [Serial]
github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig/nodeconfig_disksetup.go:527
  [FAILED] %!(EXTRA string=xfs_repair: cannot open /host/dev/loops/e2e-test-nodesetup-gflq6-0-f9gm5: Device or resource busy
  )
  Unexpected error:
      <exec.CodeExitError>: 
      command terminated with exit code 1
      {
          Err: <*errors.errorString | 0xc000f9eab0>{
              s: "command terminated with exit code 1",
          },
          Code: 1,
      }
  occurred
  In [It] at: github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig/nodeconfig_disksetup.go:644 @ 03/12/25 12:05:26.973
  Full Stack Trace
    github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig.init.func1.10({0x7f93029be058, 0xc000e9b5f0})
    	github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig/nodeconfig_disksetup.go:644 +0x1e29 

Metadata

Assignees

No one assigned

    Labels

    kind/flake: Categorizes issue or PR as related to a flaky test.
    lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
    needs-priority: Indicates a PR lacks a `priority/foo` label and requires one.
