bcachefs device remove of empty device leads to multiple thousands of percent complete in metadata #1023

@f0ff886f

My setup:

bcachefs fs usage /mnt/big_raid/ -h
Filesystem: 1d9598e2-e6e8-4b93-9042-6be52775de5f
Size:                          47.7T
Used:                          25.9T
Online reserved:                   0

Data by durability desired and amount degraded:
          undegraded
2x:            25.9T

Device label                   Device      State          Size      Used  Use%
hdd.hdd1 (device 0):           sda         rw            25.4T     12.9T   50%
hdd.hdd2 (device 1):           sdb         rw            25.4T     12.9T   50%
nvme.nvme0 (device 2):         nvme1n1     rw             953G     7.45G   00%

I added the nvme0 device with:

bcachefs device add --label=nvme.nvme0 /mnt/big_raid /dev/nvme1n1

I then tried to remove the nvme device right away (I wanted to experiment with writethrough vs writeback performance, but hadn't written any data yet):

bcachefs device remove /dev/nvme1n1 /mnt/big_raid

That command is stuck (it won't respond to SIGINT), and checking the kernel logs I see that the metadata evacuation progress is far above 100%:

dmesg | rg bcachefs
[    4.480319] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): Using encoding defined by superblock: utf8-12.1.0
[    4.481972] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): recovering from clean shutdown, journal seq 1781811
[    4.487363] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): Using encoding defined by superblock: utf8-12.1.0
[    4.488530] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): recovering from clean shutdown, journal seq 274935
[    4.662278] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): accounting_read...
[    4.700701] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): accounting_read...
[    4.715699] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): alloc_read... done (0 seconds)
[    4.733125] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): snapshots_read... done (0 seconds)
[    4.737934] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): alloc_read... done (0 seconds)
[    4.752952] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): snapshots_read... done (0 seconds)
[    4.864204] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): going read-write
[    4.865697] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): journal_replay... done (0 seconds)
[    4.868179] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): check_snapshots... done (0 seconds)
[    4.868936] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): resume_logged_ops... done (0 seconds)
[    4.869706] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): delete_dead_inodes... done (0 seconds)
[    4.870466] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): btree_bitmap_gc...
[    4.985624] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): going read-write
[    4.986345] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): sda has 120G btree buckets and 1.25T marked in bitmap
[    4.986844] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): journal_replay... done (0 seconds)
[    4.987798] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): check_snapshots... done (0 seconds)
[    4.988289] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): resume_logged_ops... done (0 seconds)
[    4.988765] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): delete_dead_inodes... done (0 seconds)
[    5.030215] bcachefs (714a7cc1-7fe7-4a72-b834-9f24ac26686d): mi_btree_bitmap sectors 272G -> 272G
[  354.261719] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): initializing freespace
[  638.729017] bcachefs (nvme1n1): evacuating
[  648.731233] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 2%, done 1627/73831 nodes, at extents:2147484487:6230272:U32_MAX
[  658.730741] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 4%, done 3429/73831 nodes, at extents:2147488910:375808:U32_MAX
[  668.731618] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 7%, done 5235/73831 nodes, at extents:2147489283:10497512:U32_MAX
[  678.732625] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 9%, done 7137/73831 nodes, at extents:576460752303425605:12567040:U32_MAX
[  688.736635] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 12%, done 9038/73831 nodes, at extents:576460752303425791:18599288:U32_MAX
[  698.742695] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 14%, done 10944/73831 nodes, at extents:576460752303425964:3570048:U32_MAX
[  708.743588] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 17%, done 12846/73831 nodes, at extents:576460752303427420:5544:U32_MAX
[  718.743457] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 20%, done 14786/73831 nodes, at extents:576460752303434210:2314824:U32_MAX
[  728.742948] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 22%, done 16731/73831 nodes, at extents:576460752303434578:9631232:U32_MAX
[  738.742953] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 25%, done 18566/73831 nodes, at extents:576460752303434933:212680:U32_MAX
[  748.742300] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 27%, done 20656/73831 nodes, at extents:1152921504606849948:20110208:U32_MAX
[  758.744858] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 30%, done 22717/73831 nodes, at extents:1152921504606850189:12819072:U32_MAX
[  768.744519] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 33%, done 24652/73831 nodes, at extents:1152921504606850542:9441792:U32_MAX
[  778.752404] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 36%, done 26679/73831 nodes, at extents:1152921504606858822:4630400:U32_MAX
[  788.752911] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 38%, done 28528/73831 nodes, at extents:1152921504606859215:9618816:U32_MAX
[  798.753378] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 41%, done 30408/73831 nodes, at extents:1152921504606859586:7144296:U32_MAX
[  808.767230] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 43%, done 32263/73831 nodes, at extents:1152921504606859916:905520:U32_MAX
[  818.770045] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 46%, done 34218/73831 nodes, at extents:1729382256910270542:13878656:U32_MAX
[  828.769320] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 49%, done 36320/73831 nodes, at extents:2305843009213694899:127992:U32_MAX
[  838.770942] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 51%, done 38257/73831 nodes, at extents:3458764513820541863:8665984:U32_MAX
[  848.770081] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 54%, done 40386/73831 nodes, at extents:3458764513820542070:20972800:U32_MAX
[  858.769300] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 57%, done 42273/73831 nodes, at extents:3458764513820546452:1373184:U32_MAX
[  868.768946] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 59%, done 44140/73831 nodes, at extents:3458764513820546813:2441856:U32_MAX
[  878.776635] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 62%, done 46239/73831 nodes, at extents:4035225266123967289:11709824:U32_MAX
[  888.776230] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 65%, done 48346/73831 nodes, at extents:4035225266123967473:977920:U32_MAX
[  898.779421] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 68%, done 50231/73831 nodes, at extents:4035225266123967726:988928:U32_MAX
[  908.783846] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 70%, done 52133/73831 nodes, at extents:4035225266123974249:3202560:U32_MAX
[  918.786087] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 73%, done 54049/73831 nodes, at extents:4035225266123974643:4366248:U32_MAX
[  928.786130] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 75%, done 55874/73831 nodes, at extents:4035225266123974971:8881536:U32_MAX
[  938.787092] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 78%, done 57852/73831 nodes, at extents:4611686018427391487:4593656:U32_MAX
[  948.786523] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 80%, done 59784/73831 nodes, at extents:4611686018427391729:2018560:U32_MAX
[  958.788997] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 83%, done 61684/73831 nodes, at extents:4611686018427391929:21917312:U32_MAX
[  968.788688] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 86%, done 63525/73831 nodes, at extents:4611686018427398554:465792:U32_MAX
[  978.787885] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 88%, done 65399/73831 nodes, at extents:4611686018427399026:1241984:U32_MAX
[  988.794300] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 91%, done 67290/73831 nodes, at extents:4611686018427399456:879488:U32_MAX
[  998.794610] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 93%, done 69159/73831 nodes, at extents:4611686018427399755:2926592:U32_MAX
[ 1008.794572] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 96%, done 71026/73831 nodes, at extents:4611686018427400008:846336:U32_MAX
[ 1018.800695] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 98%, done 73083/73831 nodes, at extents:6341068275337658399:4321408:U32_MAX
[ 1032.229083] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 1152%, done 2052/178 nodes, at extents:2147484540:4466304:U32_MAX
[ 1042.230666] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 2264%, done 4030/178 nodes, at extents:2147489042:2231168:U32_MAX
[ 1052.235897] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 3376%, done 6010/178 nodes, at extents:2147489400:2964608:U32_MAX
[ 1062.238329] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 4542%, done 8086/178 nodes, at extents:576460752303425687:9957264:U32_MAX
[ 1072.240549] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 5666%, done 10086/178 nodes, at extents:576460752303425872:18143616:U32_MAX
[ 1082.245585] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 6784%, done 12076/178 nodes, at extents:576460752303426051:30096256:U32_MAX
[ 1092.253210] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 7903%, done 14069/178 nodes, at extents:576460752303434015:5909896:U32_MAX
[ 1102.256742] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 9023%, done 16061/178 nodes, at extents:576460752303434483:1587968:U32_MAX
[ 1112.259316] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 10119%, done 18013/178 nodes, at extents:576460752303434848:413184:U32_MAX
[ 1122.259864] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 11244%, done 20015/178 nodes, at extents:1152921504606849839:8740992:U32_MAX
[ 1132.268941] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 12385%, done 22046/178 nodes, at extents:1152921504606850117:3038784:U32_MAX
[ 1142.270647] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 13295%, done 23666/178 nodes, at extents:1152921504606850446:18966784:U32_MAX

The metadata drop seems to have stopped at 13295%, but the command is still hung.

I'm on Arch Linux with bcachefs-dkms 1.33.3-1 and bcachefs-tools 1.33.3-1, kernel 6.18.2-arch2-1.
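
For anyone triaging from the log alone, here is a minimal Python sketch (not part of bcachefs-tools; the function names are made up for illustration) that parses the progress lines above and flags the ones where the "done" count exceeds the stated total. It also recomputes the percentage as done * 100 / total rounded down, which matches the reported figures in the lines I checked, so the huge percentages look like a wrong total (178) rather than garbage output. That is only an inference from the numbers, not something confirmed from the kernel source.

```python
import re

# Matches the progress lines from the kernel log, e.g.
# "dropping metadata 1152%, done 2052/178 nodes, at extents:..."
PROGRESS_RE = re.compile(
    r"dropping (?P<kind>user data|metadata) (?P<pct>\d+)%, "
    r"done (?P<done>\d+)/(?P<total>\d+) nodes"
)


def parse_progress(line):
    """Return (kind, reported_pct, done, total), or None if the line does not match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return m["kind"], int(m["pct"]), int(m["done"]), int(m["total"])


def overshoots(line):
    """True when a progress line reports more nodes done than the stated total."""
    parsed = parse_progress(line)
    return parsed is not None and parsed[2] > parsed[3]


# Three lines copied verbatim from the dmesg output in this report.
SAMPLE = [
    "[ 1018.800695] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping user data 98%, done 73083/73831 nodes, at extents:6341068275337658399:4321408:U32_MAX",
    "[ 1032.229083] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 1152%, done 2052/178 nodes, at extents:2147484540:4466304:U32_MAX",
    "[ 1142.270647] bcachefs (1d9598e2-e6e8-4b93-9042-6be52775de5f): dropping metadata 13295%, done 23666/178 nodes, at extents:1152921504606850446:18966784:U32_MAX",
]

for line in SAMPLE:
    kind, reported, done, total = parse_progress(line)
    recomputed = done * 100 // total  # integer percentage, rounded down
    print(kind, reported, recomputed, "OVERSHOOT" if overshoots(line) else "ok")
```

To run it against a live system, the same functions could be fed the output of `dmesg` line by line instead of `SAMPLE`.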

Now fs usage shows:

bcachefs fs usage /mnt/big_raid -h
Filesystem: 1d9598e2-e6e8-4b93-9042-6be52775de5f
Size:                          47.7T
Used:                          25.9T
Online reserved:                   0

Data by durability desired and amount degraded:
          undegraded
2x:            25.9T

Device label                   Device      State          Size      Used  Use%
hdd.hdd1 (device 0):           sda         rw            25.4T     12.9T   50%
hdd.hdd2 (device 1):           sdb         rw            25.4T     12.9T   50%
nvme.nvme0 (device 2):         nvme1n1     evacuating     953G     7.45G   00%

show-super also seems to have a bug: the "Last device name" and "Last device model" fields for my first HDD (device 0) show the NVMe's values, while the NVMe itself (device 2) shows no name or model at all:

bcachefs show-super /dev/sda
External UUID:                             1d9598e2-e6e8-4b93-9042-6be52775de5f
Internal UUID:                             0a47704d-6347-4039-ab67-d3e01b11192c
Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index:                              0
Label:                                     (none)
Version:                                   reconcile (1.33)
Incompatible features allowed:             reconcile (1.33)
Incompatible features in use:              reconcile (1.33)
Version upgrade complete:                  reconcile (1.33)
Oldest version on disk:                    inode_has_case_insensitive (1.28)
Created:                                   Fri Dec  5 12:37:31 2025
Sequence number:                           389
Time of last write:                        Tue Dec 30 10:22:33 2025
Superblock size:                           7.25k/1.00M
Clean:                                     0
Devices:                                   3
Sections:                                  replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade,recovery_passes,extent_type_u64s
Features:                                  journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes,incompat_version_field
Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                              4.00k
  btree_node_size:                         256k
  errors:                                  continue [fix_safe] panic ro
  write_error_timeout:                     30
  metadata_replicas:                       2
  data_replicas:                           2
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0k
  metadata_checksum:                       none [crc32c] crc64 xxhash
  data_checksum:                           none [crc32c] crc64 xxhash
  checksum_err_retry_nr:                   3
  compression:                             none
  background_compression:                  none
  str_hash:                                crc32c crc64 [siphash]
  metadata_target:                         none
  foreground_target:                       none
  background_target:                       none
  promote_target:                          none
  erasure_code:                            0
  casefold:                                0
  inodes_32bit:                            0
  shard_inode_numbers_bits:                4
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0
  root_reserve_percent:                    0
  wide_macs:                               0
  promote_whole_extents:                   1
  acl:                                     1
  usrquota:                                0
  grpquota:                                0
  prjquota:                                0
  degraded:                                [ask] yes very no
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  writeback_timeout:                       0
  journal_transaction_names:               1
  allocator_stuck_timeout:                 30
  version_upgrade:                         compatible [incompatible] none
  nocow:                                   0
  reconcile_on_ac_only:                    0

errors (size 8):

ext (size 96):
Recovery passes required:                  btree_bitmap_gc
Errors to silently fix:
Btrees with missing data:
Device 0:                                  /dev/sda        ST28000NM000C-3W
  Label:                                   hdd.hdd1
  UUID:                                    f60feec4-1662-4f3d-87c4-61403aa522a0
  Size:                                    25.4T
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             2.00M
  First bucket:                            0
  Buckets:                                 13351936
  Last mount:                              Tue Dec 30 10:11:55 2025
  Last superblock write:                   389
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user
  Rotational:                              1
  Btree allocated bitmap blocksize:        512M
  Btree allocated bitmap:                  0000000000000000000000000000001000000000000000000001000000010011
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1
  Resize on mount:                         0
  Last device name:                        nvme1n1
  Last device model:                       ADATA SX8200PNP
Device 1:                                  /dev/sdb        ST28000NM000C-3W
  Label:                                   hdd.hdd2
  UUID:                                    37cd679f-f970-4477-8ff2-1cf9687032f7
  Size:                                    25.4T
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             2.00M
  First bucket:                            0
  Buckets:                                 13351936
  Last mount:                              Tue Dec 30 10:11:55 2025
  Last superblock write:                   389
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user
  Rotational:                              1
  Btree allocated bitmap blocksize:        512M
  Btree allocated bitmap:                  0000000000000000000000000000001000000000000000000001000000010011
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1
  Resize on mount:                         0
  Last device name:                        sdb
  Last device model:                       ST28000NM000C-3W
Device 2:                                  /dev/nvme1n1    ADATA SX8200PNP
  Label:                                   nvme.nvme0
  UUID:                                    01c0ad7a-3655-43ed-88cf-4f822e7bab84
  Size:                                    953G
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             1.00M
  First bucket:                            0
  Buckets:                                 976762
  Last mount:                              Tue Dec 30 10:17:45 2025
  Last superblock write:                   389
  State:                                   evacuating
  Data allowed:                            journal,btree,user
  Has data:                                (none)
  Rotational:                              0
  Btree allocated bitmap blocksize:        64.0k
  Btree allocated bitmap:                  0000000000000000100000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1
  Resize on mount:                         0
  Last device name:
  Last device model:
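
The name mismatch can be checked mechanically. The following is a rough Python sketch, assuming the show-super text layout shown above ("Device N: /dev/<name> ..." header lines followed by an indented "Last device name:" field). The function name is hypothetical and this is not an existing bcachefs-tools feature.

```python
import re

# "Device 0:    /dev/sda    ST28000NM000C-3W" -> index 0, current name "sda"
DEVICE_RE = re.compile(r"^Device (\d+):\s+/dev/(\S+)")
# "  Last device name:    nvme1n1" (the value may be empty)
LAST_NAME_RE = re.compile(r"^\s+Last device name:\s*(\S*)")


def last_name_mismatches(show_super_text):
    """Return (index, current_name, last_name) for devices whose recorded
    'Last device name' differs from the name in the device header line."""
    mismatches = []
    current = None
    for line in show_super_text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            current = (int(m.group(1)), m.group(2))
            continue
        m = LAST_NAME_RE.match(line)
        if m and current is not None and m.group(1) != current[1]:
            mismatches.append((current[0], current[1], m.group(1)))
    return mismatches


# Excerpt of the show-super output above, reduced to the relevant lines.
SAMPLE = """\
Device 0:                                  /dev/sda        ST28000NM000C-3W
  Last device name:                        nvme1n1
Device 1:                                  /dev/sdb        ST28000NM000C-3W
  Last device name:                        sdb
Device 2:                                  /dev/nvme1n1    ADATA SX8200PNP
  Last device name:
"""

for index, current_name, last_name in last_name_mismatches(SAMPLE):
    print(f"device {index}: header says {current_name!r}, last name is {last_name!r}")
```

On the excerpt it reports device 0 (recorded as nvme1n1) and device 2 (empty name), and leaves device 1 alone.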

reconcile status:

bcachefs reconcile status /mnt/big_raid
Scan pending:                  0
                                        data    metadata
  replicas:                                0           0
  checksum:                                0           0
  erasure_code:                            0           0
  compression:                             0           0
  target:                                  0           0
  high_priority:                           0           0
  pending:                                 0           0

fs top is showing a high rate of error_throw events:

All counters have a corresponding tracepoint; for more info on any given event, try e.g.
  perf trace -e bcachefs:data_update_pred

                                                    1s         total         mount
sync_fs                                          0/sec             0             4
data_read                                       0B/sec            0B      5910528B
data_read_bounce                                 0/sec             0            14
reconcile_btree                                 0B/sec            0B       524288B
bucket_discard_worker                            0/sec             0            91
bucket_discard                                   0/sec             0            36
bucket_alloc                                     0/sec             0          7647
bucket_alloc_fail                                0/sec             0            15
btree_cache_scan                                 0/sec             0           435
btree_cache_reap                               861/sec         17176        809963
btree_cache_cannibalize                        861/sec         17176        754485
btree_cache_cannibalize_lock                   862/sec         17177        755663
btree_cache_cannibalize_unlock                   0/sec             0          1176
btree_node_write                                 0/sec             0          1481
btree_node_read                                861/sec         17176        851515
btree_node_compact                               0/sec             0            46
btree_node_merge_attempt                         0/sec             0            78
btree_node_split                                 0/sec             0             5
btree_node_rewrite                               0/sec             0             2
btree_node_alloc                                 0/sec             0            58
btree_node_free                                  0/sec             0           108
btree_node_set_root                              0/sec             0             5
btree_key_cache_fill                             0/sec             0         36459
btree_path_relock_fail                         431/sec          8588        376709
btree_reserve_get_fail                           0/sec             0             2
journal_reclaim_finish                          10/sec           192         21394
journal_reclaim_start                           10/sec           192         21394
journal_write                                    0/sec             0           216
trans_restart_btree_node_split                   0/sec             0             4
trans_restart_mem_realloced                      0/sec             0            29
trans_restart_memory_allocation_failure          0/sec             0          1096
trans_restart_relock                             0/sec             0             7
trans_restart_relock_path                      862/sec         17177        926208
trans_traverse_all                               0/sec             0         98111
transaction_commit                               0/sec             0         10729
write_super                                      0/sec             0            51
write_buffer_flush                               0/sec             0            31
write_buffer_flush_sync                          0/sec             0            30
accounting_key_to_wb_slowpath                    0/sec             0           110
error_throw                                   1724/sec         34354       1513222
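
To make the busy counters easier to spot, here is a small Python sketch that pulls out only the counters with a nonzero per-second rate from fs top output in the layout above. The function name is an illustrative assumption, not part of any tool. In the capture above, the error_throw rate (1724/sec) is roughly twice the btree_node_read rate (861/sec); whether the two are causally related is not something this output can confirm.

```python
import re

# "<name>   <rate>[B]/sec   <total>[B]   <since mount>[B]"
COUNTER_RE = re.compile(r"^(\w+)\s+(\d+)B?/sec\s+(\d+)B?\s+(\d+)B?$")


def active_counters(fs_top_text):
    """Return [(name, rate_per_sec)] for counters with a nonzero 1s rate,
    sorted from highest to lowest rate."""
    rates = {}
    for line in fs_top_text.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m and int(m.group(2)) > 0:
            rates[m.group(1)] = int(m.group(2))
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# A few rows copied from the fs top output above.
SAMPLE = """\
                                                    1s         total         mount
sync_fs                                          0/sec             0             4
data_read                                       0B/sec            0B      5910528B
btree_node_read                                861/sec         17176        851515
btree_path_relock_fail                         431/sec          8588        376709
journal_reclaim_start                           10/sec           192         21394
trans_restart_relock_path                      862/sec         17177        926208
error_throw                                   1724/sec         34354       1513222
"""

for name, rate in active_counters(SAMPLE):
    print(f"{name:32} {rate}/sec")
```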
