FDP GC crash in select_victim_ru() when removing RU from full_ru_list #186

@HXuanlin

Description

Describe the bug
I encountered a crash when running FEMU in black-box FDP mode under heavy write / GC pressure. The qemu-system-x86_64 process crashed with SIGSEGV in the FDP GC path.

The crash happens in select_victim_ru() when FDP GC falls back to selecting a victim RU from full_ru_list and removes it with QTAILQ_REMOVE().

I compared my local ftl.c with the upstream version. The only meaningful local change is that I disabled several FDP_TRACE logs to reduce log size. The FDP GC logic (fdp_advance_ru_pointer(), do_gc_fdp_style(), select_victim_ru(), and the full_ru_list handling) appears unchanged.

Environment

  • Host OS: Ubuntu 22.04
  • Kernel version: 6.8.0-107-generic
  • FEMU version/commit: c966d34
  • FEMU mode: BlackBox SSD with FDP enabled
  • FDP configuration:
    • fdp=on
    • fdp.nruh=8
    • fdp.nrg=1
    • fdp.nru=256
  • Device size: 12288 MB
  • Guest OS/image: Ubuntu 24.04 qcow2 image

To Reproduce
Steps to reproduce the behavior:

  1. Use the upstream run-blackbox-fdp.sh script from commit c966d341a13795ef917702756c6fd727aeb2bbef.

  2. Start FEMU with the following command:

    stdbuf -oL -eL ./run-blackbox-fdp.sh 2>&1 | tee ~/femu-fdp-$(date +%F-%H%M%S).log
  3. The script starts FEMU in black-box SSD mode with FDP enabled. The effective QEMU command line shown in the coredump includes the following key options:

fdp=on
fdp.nruh=8
fdp.nrg=1
fdp.nru=256

devsz_mb=12288
femu_mode=1
secsz=512
secs_per_pg=8
pgs_per_blk=256
blks_per_pl=256
pls_per_lun=1
luns_per_ch=8
nchs=8

gc_thres_pcent=50
gc_thres_pcent_high=75
  4. Run a heavy write workload inside the guest so that the FDP device reaches high GC pressure / RU exhaustion.

  5. The host-side qemu-system-x86_64 process crashes with SIGSEGV.

Expected behavior
FEMU should not crash when the FDP device is under high write or GC pressure.

Even if there are no free RUs available, the FDP GC path should handle the situation gracefully, for example by returning an error, stalling/retrying the write path, or reporting device-full / no-free-RU conditions, rather than causing a segmentation fault in the QEMU process.
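To illustrate the "return an error, stall/retry" option: a minimal sketch of what a non-crashing GC round could look like. All names here (gc_one_round, ssd_state, select_victim) are hypothetical stand-ins, not FEMU's actual types or conventions; the point is only that a NULL victim should map to a retryable error instead of a dereference.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for FEMU's structures (the real ones differ). */
struct ru { int id; };
struct ssd_state {
    struct ru *(*select_victim)(struct ssd_state *);
};

/* Sketch: instead of dereferencing a missing victim, propagate an
 * error code so the write path can stall and retry, or report a
 * device-full / no-free-RU condition. */
static int gc_one_round(struct ssd_state *ssd)
{
    struct ru *victim = ssd->select_victim(ssd);
    if (!victim) {
        return -EBUSY;   /* no reclaimable RU right now: caller retries */
    }
    /* ... migrate valid pages out of 'victim', then recycle it ... */
    return 0;
}
```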

Error logs
The coredump shows that qemu-system-x86_64 crashed with SIGSEGV:

sudo coredumpctl list | tail -n 50

TIME                            PID UID GID SIG     COREFILE  EXE                                                      SIZE
Mon 2026-05-04 19:30:06 CST 1904205   0   0 SIGSEGV truncated /home/dell/femu-work/FEMU/build-femu/qemu-system-x86_64 17.3M

The GDB backtrace points to select_victim_ru():

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055c5ed859ee1 in select_victim_ru (
    force=false,
    ruhid=<error reading variable: Cannot access memory at address 0x7326a9cfc6e0>,
    rgid=<error reading variable: Cannot access memory at address 0x7326a9cfc728>,
    ssd=0x55c60e0ebce0
) at ../hw/femu/bbssd/ftl.c:1639

1639                    QTAILQ_REMOVE(&rm->full_ru_list, cand, entry);

(gdb) bt
#0  0x000055c5ed859ee1 in select_victim_ru (
    force=false,
    ruhid=<error reading variable: Cannot access memory at address 0x7326a9cfc6e0>,
    rgid=<error reading variable: Cannot access memory at address 0x7326a9cfc728>,
    ssd=0x55c60e0ebce0
) at ../hw/femu/bbssd/ftl.c:1639

#1  do_gc_fdp_style (
    ssd=0x55c60e0ebce0,
    rgid=<error reading variable: Cannot access memory at address 0x7326a9cfc728>,
    ruhid=<optimized out>,
    force=<optimized out>
) at ../hw/femu/bbssd/ftl.c:1818

Backtrace stopped: Cannot access memory at address 0x7326a9cfc798

The relevant code path is:

if (!victim_ru) {
    FemuReclaimUnit *cand;
    QTAILQ_FOREACH(cand, &rm->full_ru_list, entry) {
        bool is_active = false;
        for (uint16_t ri = 0; ri < (uint16_t)ssd->nruhs; ri++) {
            if (ssd->ruhs[ri].curr_ru == cand ||
                ssd->ruhs[ri].gc_ru == cand) {
                is_active = true;
                break;
            }
        }
        if (!is_active) {
            victim_ru = cand;
            QTAILQ_REMOVE(&rm->full_ru_list, cand, entry);
            rm->full_ru_cnt--;
            break;
        }
    }
}
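Note that the remove-while-iterating in the snippet above is legal on its own, since the loop breaks immediately after QTAILQ_REMOVE; so one possibility is that the list itself is already corrupted when the loop runs (e.g. an RU removed or freed elsewhere while still linked into full_ru_list). A defensive pattern is to track list membership explicitly and assert on it before removing. The sketch below uses the BSD <sys/queue.h> TAILQ macros, which QEMU's QTAILQ macros mirror (QEMU also provides QTAILQ_FOREACH_SAFE); the ru_t type, the on_full_list flag, and pick_victim are hypothetical, not FEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* glibc's <sys/queue.h> lacks the _SAFE variant; this is the standard
 * portable definition (it caches the next node before the body runs). */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)                \
    for ((var) = TAILQ_FIRST(head);                               \
         (var) && ((tvar) = TAILQ_NEXT((var), field), 1);         \
         (var) = (tvar))
#endif

/* Hypothetical stand-in for FemuReclaimUnit. */
typedef struct ru {
    int id;
    bool on_full_list;            /* guards against double removal */
    TAILQ_ENTRY(ru) entry;
} ru_t;

TAILQ_HEAD(ru_list, ru);

/* Defensive victim selection: skip active RUs, assert that a candidate
 * really is a list member before unlinking it, and clear the membership
 * flag on removal so a stale pointer cannot be removed twice. */
static ru_t *pick_victim(struct ru_list *full, bool (*is_active)(ru_t *))
{
    ru_t *cand, *next;
    TAILQ_FOREACH_SAFE(cand, full, entry, next) {
        if (is_active && is_active(cand)) {
            continue;
        }
        assert(cand->on_full_list);   /* would catch corruption early */
        TAILQ_REMOVE(full, cand, entry);
        cand->on_full_list = false;
        return cand;
    }
    return NULL;                      /* no victim: caller must handle */
}
```

Running FEMU with such assertions (or with QTAILQ poisoning of removed entries) could turn the delayed SIGSEGV into an immediate abort at the point where the list first goes bad.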
