Describe the bug
I encountered a crash when running FEMU in black-box FDP mode under heavy write / GC pressure. The qemu-system-x86_64 process crashed with SIGSEGV in the FDP GC path.
The crash happens in select_victim_ru() when FDP GC falls back to selecting a victim RU from full_ru_list and removes it with QTAILQ_REMOVE().
I compared my local ftl.c with the upstream version. The only meaningful local changes are that I disabled several FDP_TRACE logs to reduce log size. The FDP GC logic around fdp_advance_ru_pointer(), do_gc_fdp_style(), select_victim_ru(), and full_ru_list handling appears unchanged.
Environment
- Host OS: Ubuntu 22.04
- Kernel version: 6.8.0-107-generic
- FEMU version/commit: c966d34
- FEMU mode: BlackBox SSD with FDP enabled
- FDP configuration:
  fdp=on
  fdp.nruh=8
  fdp.nrg=1
  fdp.nru=256
- Device size: 12288 MB
- Guest OS/image: Ubuntu 24.04 qcow2 image
To Reproduce
Steps to reproduce the behavior:
- Use the upstream run-blackbox-fdp.sh script from commit c966d341a13795ef917702756c6fd727aeb2bbef.
- Start FEMU with the following command:
  stdbuf -oL -eL ./run-blackbox-fdp.sh 2>&1 | tee ~/femu-fdp-$(date +%F-%H%M%S).log
- The script starts FEMU in black-box SSD mode with FDP enabled. The effective QEMU command line shown in the coredump includes the following key options:
fdp=on
fdp.nruh=8
fdp.nrg=1
fdp.nru=256
devsz_mb=12288
femu_mode=1
secsz=512
secs_per_pg=8
pgs_per_blk=256
blks_per_pl=256
pls_per_lun=1
luns_per_ch=8
nchs=8
gc_thres_pcent=50
gc_thres_pcent_high=75
- Run a heavy write workload inside the guest so that the FDP device reaches high GC pressure / RU exhaustion.
- The host-side qemu-system-x86_64 process crashes with SIGSEGV.
Expected behavior
FEMU should not crash when the FDP device is under high write or GC pressure.
Even if there are no free RUs available, the FDP GC path should handle the situation gracefully, for example by returning an error, stalling/retrying the write path, or reporting device-full / no-free-RU conditions, rather than causing a segmentation fault in the QEMU process.
Error logs
The coredump shows that qemu-system-x86_64 crashed with SIGSEGV:
sudo coredumpctl list | tail -n 50
TIME PID UID GID SIG COREFILE EXE SIZE
Mon 2026-05-04 19:30:06 CST 1904205 0 0 SIGSEGV truncated /home/dell/femu-work/FEMU/build-femu/qemu-system-x86_64 17.3M
The GDB backtrace points to select_victim_ru():
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055c5ed859ee1 in select_victim_ru (
force=false,
ruhid=<error reading variable: Cannot access memory at address 0x7326a9cfc6e0>,
rgid=<error reading variable: Cannot access memory at address 0x7326a9cfc728>,
ssd=0x55c60e0ebce0
) at ../hw/femu/bbssd/ftl.c:1639
1639 QTAILQ_REMOVE(&rm->full_ru_list, cand, entry);
(gdb) bt
#0 0x000055c5ed859ee1 in select_victim_ru (
force=false,
ruhid=<error reading variable: Cannot access memory at address 0x7326a9cfc6e0>,
rgid=<error reading variable: Cannot access memory at address 0x7326a9cfc728>,
ssd=0x55c60e0ebce0
) at ../hw/femu/bbssd/ftl.c:1639
#1 do_gc_fdp_style (
ssd=0x55c60e0ebce0,
rgid=<error reading variable: Cannot access memory at address 0x7326a9cfc728>,
ruhid=<optimized out>,
force=<optimized out>
) at ../hw/femu/bbssd/ftl.c:1818
Backtrace stopped: Cannot access memory at address 0x7326a9cfc798
The relevant code path is:
if (!victim_ru) {
    FemuReclaimUnit *cand;
    QTAILQ_FOREACH(cand, &rm->full_ru_list, entry) {
        bool is_active = false;
        for (uint16_t ri = 0; ri < (uint16_t)ssd->nruhs; ri++) {
            if (ssd->ruhs[ri].curr_ru == cand ||
                ssd->ruhs[ri].gc_ru == cand) {
                is_active = true;
                break;
            }
        }
        if (!is_active) {
            victim_ru = cand;
            QTAILQ_REMOVE(&rm->full_ru_list, cand, entry);
            rm->full_ru_cnt--;
            break;
        }
    }
}