Possibly spurious CKSUM errors in 2.4 #18266

@stuartthebruce

Description

System information

Type                  Version/Name
Distribution Name     Rocky Linux
Distribution Version  8.10
Kernel Version        4.18.0-553.105.1.el8_10
Architecture          x86_64
OpenZFS Version       2.4.1-1

Describe the problem you're observing

Possibly spurious CKSUM errors reported by zpool status while a scrub runs under heavy write load.

Describe how to reproduce the problem

The following CKSUM errors have been slowly increasing over the last day while running a scrub after upgrading from 2.4.0 to 2.4.1.

[root@zfs9 ~]# zpool status
  pool: scratch
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub in progress since Thu Feb 26 11:01:09 2026
	110T / 832T scanned at 1.10G/s, 110T / 832T issued at 1.10G/s
	1.55M repaired, 13.28% done, 7 days 18:12:46 to go
config:

	NAME                                                                                                           STATE     READ WRITE CKSUM
	scratch                                                                                                        ONLINE       0     0     0
	  raidz2-0                                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2bb0e4ee0                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2ab4c0224                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2ab4adc68                                                                                     ONLINE       0     0     5  (repairing)
	    wwn-0x5000cca2bb0ec380                                                                                     ONLINE       0     0     5  (repairing)
	    wwn-0x5000cca2b6673968                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2b6696458                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2bb1da3e0                                                                                     ONLINE       0     0     4  (repairing)
	    wwn-0x5000cca2bb1d5a44                                                                                     ONLINE       0     0     4  (repairing)
	    wwn-0x5000cca2bb0e978c                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2ab4698ac                                                                                     ONLINE       0     0     2  (repairing)
	  raidz2-1                                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2b64fb26c                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2b4bebddc                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2bb1c5b24                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2ab463550                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2bb100b8c                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2bb10d340                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2bb1dcdf0                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2ab4a0278                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2b65fc49c                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2ab478664                                                                                     ONLINE       0     0     1  (repairing)
	  raidz2-2                                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2bb1d2e54                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2bb100e80                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2bb1d5a40                                                                                     ONLINE       0     0     4  (repairing)
	    wwn-0x5000cca2b66a586c                                                                                     ONLINE       0     0     4  (repairing)
	    wwn-0x5000cca2bb100c44                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2bb1d2364                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2aa0cac48                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2b6686f9c                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2bb1d9fd0                                                                                     ONLINE       0     0     3  (repairing)
	    wwn-0x5000cca2ab4669ec                                                                                     ONLINE       0     0     3  (repairing)
	  raidz2-3                                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2bb1bdf7c                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2bb1c59d8                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2bb10ed70                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2b6696420                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2b65fd028                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2ab4bf990                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2bb1d3224                                                                                     ONLINE       0     0     4  (repairing)
	    wwn-0x5000cca2b663287c                                                                                     ONLINE       0     0     4  (repairing)
	    wwn-0x5000cca2bb1d4b24                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2b6603740                                                                                     ONLINE       0     0     0
	  raidz2-4                                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2ab4bf520                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2b9b97fe0                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2b668b1b0                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2bb1bdfcc                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2b6524d38                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2bb0ff814                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2bb1cfa38                                                                                     ONLINE       0     0     5  (repairing)
	    wwn-0x5000cca2ab4bbd70                                                                                     ONLINE       0     0     5  (repairing)
	    wwn-0x5000cca2bb0f0120                                                                                     ONLINE       0     0     3  (repairing)
	    wwn-0x5000cca2b66a0468                                                                                     ONLINE       0     0     3  (repairing)
	  raidz2-5                                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2ab4b3548                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2ab4667a0                                                                                     ONLINE       0     0     1  (repairing)
	    wwn-0x5000cca2ab498628                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2ab4b54bc                                                                                     ONLINE       0     0     0
	    wwn-0x5000cca2bb10ec78                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2b91f5308                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2bb102c94                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2bb0ebac4                                                                                     ONLINE       0     0     2  (repairing)
	    wwn-0x5000cca2bb1d1d0c                                                                                     ONLINE       0     0     3  (repairing)
	    wwn-0x5000cca2b6618c34                                                                                     ONLINE       0     0     3  (repairing)
	special	
	  mirror-6                                                                                                     ONLINE       0     0     0
	    nvme-nvme.8086-50484b45393133323030424833373541474e-494e54454c20535344504532314b3337354741-00000001-part2  ONLINE       0     0     0
	    nvme-nvme.8086-50484b45393133323031314633373541474e-494e54454c20535344504532314b3337354741-00000001-part2  ONLINE       0     0     0
	  mirror-7                                                                                                     ONLINE       0     0     0
	    nvme-nvme.8086-50484d32383133333030325734383042474e-494e54454c2053534450453231443438304741-00000001        ONLINE       0     0     0
	    nvme-nvme.8086-50484d32383133333030373934383042474e-494e54454c2053534450453231443438304741-00000001        ONLINE       0     0     0
	  mirror-8                                                                                                     ONLINE       0     0     0
	    nvme-nvme.8086-50484b45393133323032385833373541474e-494e54454c20535344504532314b3337354741-00000001-part2  ONLINE       0     0     0
	    nvme-nvme.8086-50484b45393133323032325433373541474e-494e54454c20535344504532314b3337354741-00000001-part2  ONLINE       0     0     0
	  mirror-9                                                                                                     ONLINE       0     0     0
	    nvme-nvme.8086-50484d32383133333030345434383042474e-494e54454c2053534450453231443438304741-00000001        ONLINE       0     0     0
	    nvme-nvme.8086-50484d32383133343030304234383042474e-494e54454c2053534450453231443438304741-00000001        ONLINE       0     0     0
	logs	
	  mirror-12                                                                                                    ONLINE       0     0     0
	    nvme-INTEL_SSDPE21K375GA_PHKE913200BH375AGN-part1                                                          ONLINE       0     0     0
	    nvme-INTEL_SSDPE21K375GA_PHKE9132011F375AGN-part1                                                          ONLINE       0     0     0
	  mirror-13                                                                                                    ONLINE       0     0     0
	    nvme-INTEL_SSDPE21K375GA_PHKE9132028X375AGN-part1                                                          ONLINE       0     0     0
	    nvme-INTEL_SSDPE21K375GA_PHKE9132022T375AGN-part1                                                          ONLINE       0     0     0
	cache
	  nvme-INTEL_SSDPE2KE076T8_BTLN8440010F7P6DGN                                                                  ONLINE       0     0     0
	  nvme-INTEL_SSDPE2KE076T8_BTLN836208DF7P6DGN                                                                  ONLINE       0     0     0

errors: No known data errors
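
To make output like the above easier to compare between runs, the non-zero CKSUM counters can be grouped by top-level vdev. The following is only a rough sketch: the function name `cksum_by_vdev` and the regular expressions are illustrative assumptions, not part of OpenZFS, and it assumes the counters are plain integers (as they are here, or as printed by `zpool status -p`).

```python
import re

# A device row: name, state, READ, WRITE, CKSUM (any trailing note is ignored).
# Assumption: counters are plain integers, not abbreviated values such as "1.2K".
ROW = re.compile(
    r"^\s+(\S+)\s+(?:ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)
# Names of grouping vdevs such as raidz2-0 or mirror-6.
GROUP = re.compile(r"^(?:raidz[123]?|mirror|draid\S*)-\d+$")
# Bare section headings that appear in the config listing.
SECTIONS = {"special", "logs", "cache", "spares", "dedup"}


def cksum_by_vdev(status_text):
    """Return {parent vdev or section: {device: CKSUM}} for devices with CKSUM > 0."""
    result = {}
    parent = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped in SECTIONS:
            # Devices listed directly under a section (e.g. cache) use it as parent.
            parent = stripped
            continue
        m = ROW.match(line)
        if m is None:
            continue
        name, cksum = m.group(1), int(m.group(4))
        if GROUP.match(name):
            parent = name
        elif parent is not None and cksum > 0:
            # The pool's own row is skipped because no parent has been seen yet.
            result.setdefault(parent, {})[name] = cksum
    return result
```

Feeding it the text of `zpool status` (for example captured with `subprocess.run(["zpool", "status", "-p"], capture_output=True, text=True).stdout`) yields one dictionary per raidz2 vdev, which can then be summed or logged.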

This is while the pool is under fairly heavy write pressure from a large number of knfsd threads:

[root@zfs9 ~]# zpool iostat 5
              capacity     operations     bandwidth 
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
scratch      832T   151T  11.7K  22.6K  1.23G  2.14G
scratch      832T   151T  5.93K  10.6K   180M  1.27G
scratch      832T   151T  7.53K  28.5K   265M  3.51G
scratch      832T   151T  5.22K  20.8K   244M  2.29G
scratch      832T   151T  6.97K  23.1K   241M  2.60G
[root@zfs9 ~]# top

top - 15:33:18 up 1 day,  4:35,  3 users,  load average: 143.59, 188.68, 298.46
Tasks: 3247 total,   1 running, 3244 sleeping,   0 stopped,   2 zombie
%Cpu(s):  0.0 us,  3.8 sy,  0.0 ni, 90.8 id,  4.7 wa,  0.2 hi,  0.6 si,  0.0 st
MiB Mem : 385085.6 total, 119232.3 free, 212570.1 used,  53283.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 169439.2 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND           
  23961 root      20   0       0      0      0 S   9.5   0.0 112:02.80 nfsd              
  23965 root      20   0       0      0      0 D   9.2   0.0 307:30.55 nfsd              
  23958 root      20   0       0      0      0 D   8.2   0.0  58:19.69 nfsd              
  23959 root      20   0       0      0      0 D   7.6   0.0  71:07.57 nfsd              
  23957 root      20   0       0      0      0 D   6.6   0.0  48:47.01 nfsd              
  23960 root      20   0       0      0      0 D   6.6   0.0  88:22.35 nfsd              
  23956 root      20   0       0      0      0 D   6.2   0.0  41:23.96 nfsd              
  23964 root      20   0       0      0      0 D   5.6   0.0 238:27.75 nfsd              
  23962 root      20   0       0      0      0 D   4.9   0.0 143:44.90 nfsd              
  23955 root      20   0       0      0      0 D   4.6   0.0  35:45.72 nfsd              
  23951 root      20   0       0      0      0 D   4.3   0.0  23:11.53 nfsd              
  23952 root      20   0       0      0      0 D   4.3   0.0  25:28.01 nfsd              
  23953 root      20   0       0      0      0 D   4.3   0.0  28:04.53 nfsd              
  23963 root      20   0       0      0      0 D   3.9   0.0 186:48.25 nfsd              
  23950 root      20   0       0      0      0 D   3.6   0.0  21:25.47 nfsd              
  23949 root      20   0       0      0      0 D   3.0   0.0  20:00.93 nfsd              
  23954 root      20   0       0      0      0 D   3.0   0.0  31:35.67 nfsd              
   6761 root       0 -20       0      0      0 S   2.6   0.0  41:47.83 z_null_int        
  23945 root      20   0       0      0      0 D   2.0   0.0  15:53.21 nfsd              
  23946 root      20   0       0      0      0 D   2.0   0.0  16:54.09 nfsd              
  23948 root      20   0       0      0      0 D   2.0   0.0  18:53.44 nfsd              
  23938 root      20   0       0      0      0 D   1.6   0.0  12:09.68 nfsd              
  23941 root      20   0       0      0      0 D   1.6   0.0  13:31.43 nfsd              
  23942 root      20   0       0      0      0 D   1.6   0.0  14:02.40 nfsd              
  23943 root      20   0       0      0      0 D   1.6   0.0  14:37.25 nfsd              
  23944 root      20   0       0      0      0 D   1.6   0.0  15:15.52 nfsd              
  23947 root      20   0       0      0      0 D   1.6   0.0  17:44.36 nfsd              
 960368 root      20   0   57956   8036   3672 R   1.6   0.0   0:00.15 top               
   3376 root      39  19       0      0      0 S   1.3   0.0  55:36.67 dbuf_evict        
   3509 root      20   0       0      0      0 S   1.3   0.0  22:00.74 l2arc_feed        
  23907 root      20   0       0      0      0 D   1.3   0.0   6:18.30 nfsd              
  23921 root      20   0       0      0      0 D   1.3   0.0   7:56.17 nfsd           
...

The raidz2 vdevs consist of HDDs in an external JBOD with multipath SAS connections, and no low-level hardware issues have been reported during this scrub. Furthermore, this pool passed several scrubs under 2.4.0 (and earlier versions), and the only CKSUM errors seen then were correlated with READ and/or WRITE errors from a failing HDD (or a dodgy SAS connection).

I will let the scrub finish. However, if the write traffic drops significantly, I will run zpool clear to reset the counters and see whether CKSUM errors continue to show up when scrub reads are the predominant load. The hypothesis is that 2.4.1 might have introduced a race condition when verifying raidz2 HDD data under heavy write load.
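
One way to check whether new CKSUM errors keep appearing after a zpool clear is to save the zpool status text at two points in time and compare the counters. The sketch below is only illustrative: the helper names `cksum_counts` and `cksum_increase` are made up for this example, and it assumes integer counters (for example from `zpool status -p`).

```python
import re

# A config row: name, an upper-case state word, then READ, WRITE, CKSUM as integers.
ROW = re.compile(r"^\s+(\S+)\s+[A-Z]+\s+(\d+)\s+(\d+)\s+(\d+)")


def cksum_counts(status_text):
    """Map every row name in the config listing to its CKSUM counter."""
    counts = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            counts[m.group(1)] = int(m.group(4))
    return counts


def cksum_increase(before_text, after_text):
    """Return {name: delta} for rows whose CKSUM counter grew between snapshots."""
    old = cksum_counts(before_text)
    new = cksum_counts(after_text)
    return {
        name: count - old.get(name, 0)
        for name, count in new.items()
        if count > old.get(name, 0)
    }
```

Taking one snapshot right after the clear and another some hours later, alongside the zpool iostat write bandwidth for the same interval, would show whether the counters grow only while the write load is high.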

I am recording this now, before I learn more and/or downgrade to 2.4.0, to see whether this hypothesis makes sense and whether anyone else is seeing anything similar in 2.4.1.

Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)
