Low performance (high effective IO latency) with io_scheduler and no/short polling or --overprovisioned #3202

@travisdowns

Description


Summary

When using --overprovisioned or --idle-poll-time-us=1 together with a non-default io-properties file appropriate for the host, single-shard I/O performance drops to ~2.7K IOPS, compared to ~12K IOPS with the baseline configuration (no --overprovisioned): a roughly 4-5x reduction. The issue appears to be caused by the reactor sleeping longer than "necessary" while waiting for token bucket capacity: the per-grab amount is large, and the sleep covers the full grab rather than just the tokens needed for the next I/O.

Environment

  • Seastar version: Current master (commit bbd0001)
  • Test workload: Random 4KB reads (or writes, behavior is the same), iodepth=1
  • Storage: NVMe SSD on XFS
  • Disk capabilities (measured by iotune):
    • Random read: 268,337 IOPS

io-properties Configuration

disks:
  - mountpoint: /mnt/xfs
    read_iops: 268337
    read_bandwidth: 1259085440
    write_iops: 134175
    write_bandwidth: 604742528

Test Results

Baseline (no --overprovisioned)

$ build/release/apps/io_tester/io_tester --io-properties-file ~/io-props.yaml \
    --conf ~/io2.yaml --storage /mnt/xfs/io_tester --duration=5 -c1
Job highprio -> sched class highprio
    IOPS: 12126.1523

Result: ~12,126 IOPS

With --overprovisioned

$ build/release/apps/io_tester/io_tester --io-properties-file ~/io-props.yaml \
    --conf ~/io2.yaml --storage /mnt/xfs/io_tester --duration=5 -c1 --overprovisioned
Job highprio -> sched class highprio
    IOPS: 2676.50513

Result: ~2,677 IOPS (about 4.5x slower)

With --idle-poll-time-us=1 (same behavior)

$ build/release/apps/io_tester/io_tester --io-properties-file ~/io-props.yaml \
    --conf ~/io2.yaml --storage /mnt/xfs/io_tester --duration=5 -c1 --idle-poll-time-us=1
Job highprio -> sched class highprio
    IOPS: 2670.78101

Result: ~2,671 IOPS (same degradation as --overprovisioned, confirming the issue is caused by reduced polling frequency)

Without io-properties (control)

$ build/release/apps/io_tester/io_tester \
    --conf ~/io2.yaml --storage /mnt/xfs/io_tester --duration=5 -c1 --overprovisioned
Job highprio -> sched class highprio
    IOPS: 12289.8516

Result: ~12,290 IOPS (similar to baseline, no degradation without io-properties)

Root Cause Analysis

The issue appears to be caused by two factors:

Issue 1: Sleep Time Based on Full Deficiency, Not I/O Need

The sleep duration is calculated based on replenishing the full token deficiency (the gap between pending reservation and bucket head), not just the tokens needed for the next I/O.

From ioinfo -c1 --directory /mnt/xfs:

fair_queue:
  capacities:
    4096:
      read: 117101    # tokens needed for 4KB read
  per_tick_grab_threshold: 12582912
  token_bucket:
    rate: 16777216    # tokens per millisecond

For a 4KB read:

  • Tokens needed: 117,101
  • Time to replenish tokens for one I/O: ~7 us

But the token bucket reserves in large chunks:

  • per_tick_grab_threshold: 12,582,912 tokens
  • Time to replenish full reservation: ~750 us

This means the system may sleep considerably longer than needed to dispatch a single I/O.
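As a sanity check, the two durations above follow directly from the token_bucket rate reported by ioinfo (illustrative arithmetic only, not Seastar code):

#include <cstdio>

int main() {
    // Numbers from `ioinfo -c1 --directory /mnt/xfs` above.
    const double tokens_per_ms = 16777216.0;   // token_bucket rate
    const double read_4k_tokens = 117101.0;    // capacity of one 4KB read
    const double grab_threshold = 12582912.0;  // per_tick_grab_threshold

    // Time to replenish enough tokens for one 4KB read: ~7 us.
    std::printf("one 4KB read: %.1f us\n", read_4k_tokens / tokens_per_ms * 1000.0);
    // Time to replenish a full per-tick grab (what the sleep is based on): ~750 us.
    std::printf("full grab:    %.1f us\n", grab_threshold / tokens_per_ms * 1000.0);
}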

Issue 2: Tokens Only Added During Polling

Token bucket replenishment is not passive — tokens are only added when maybe_replenish_capacity() is called, which only happens from poll_io_queue().

  1. With --overprovisioned, max_poll_time=0us causes the reactor to sleep immediately when idle
  2. While sleeping, no polling occurs, so no tokens are added to the bucket
  3. The reactor sleeps for the full calculated deficiency time
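The pattern can be sketched as follows (hypothetical and heavily simplified; these are not the actual Seastar classes or signatures):

#include <algorithm>
#include <chrono>

// Simplified "replenish only when polled" token bucket, for illustration only.
struct lazy_token_bucket {
    using clock = std::chrono::steady_clock;
    double tokens_per_us;    // refill rate
    double limit;            // bucket capacity
    double tokens = 0;       // current level
    clock::time_point last = clock::now();

    // Nothing refills the bucket automatically: this must be called explicitly,
    // which in Seastar happens from the io-queue poller.
    void maybe_replenish() {
        auto now = clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(now - last).count();
        tokens = std::min(limit, tokens + us * tokens_per_us);
        last = now;
    }

    bool try_grab(double amount) {
        if (tokens < amount) {
            return false;    // caller decides how long to sleep
        }
        tokens -= amount;
        return true;
    }
};

The important property is that a failed try_grab() does not schedule any future refill: until something calls maybe_replenish() again, the level stays put no matter how much wall-clock time passes, which is exactly the situation while the reactor sleeps.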

The Flow

  1. I/O completes at time T=0
  2. io-queue checks for next dispatch — not enough tokens available
  3. next_pending_aio() calculates delay based on the full deficiency, which is typically close to per_tick_grab_threshold (12.5M tokens, ~750us) since that's the reservation size used in grab_capacity()
  4. Timer armed for T+delay, reactor goes to sleep
  5. Reactor sleeps for full delay (even though only ~7us worth of tokens are needed for the next I/O)
  6. Timer fires, reactor wakes
  7. poll_io_queue() called, which calls maybe_replenish_capacity()
  8. Tokens are now added (for the full elapsed time)
  9. Next I/O dispatches

The result is that I/O dispatch frequency is limited by the sleep duration rather than the actual token replenishment time needed.
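A rough sketch of the delay computed in step 3 (assumed and simplified; the real logic lives in next_pending_aio()/grab_capacity() and is not reproduced here):

#include <chrono>

// Simplified version of the step-3 delay: wait for the *full* deficiency.
// Because grab_capacity() reserves per_tick_grab_threshold-sized chunks,
// the deficiency is typically ~12.5M tokens rather than the ~117K tokens
// actually needed for the next 4KB read.
std::chrono::microseconds delay_until_replenished(double deficiency_tokens,
                                                  double tokens_per_us /* ~16777 */) {
    return std::chrono::microseconds(
        static_cast<long>(deficiency_tokens / tokens_per_us + 0.5));
}
// deficiency_tokens ~= 12,582,912  ->  ~750 us of sleep per dispatched I/O.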

Why Baseline Works

With baseline configuration:

  • max_poll_time=200us means the reactor actively polls before sleeping
  • poll_io_queue() is called frequently
  • maybe_replenish_capacity() runs every few microseconds
  • Tokens appear in the bucket almost as soon as time passes

With --overprovisioned:

  • max_poll_time=0us means no active polling
  • Tokens only appear when the reactor wakes from sleep
  • Sleep duration becomes the limiting factor for throughput
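In rough pseudocode (assumed structure and names, not the actual Seastar reactor loop), the only difference between the two cases above is the value of max_poll_time:

#include <chrono>

bool poll_io_queue();               // stand-in: one io-queue poll, calls maybe_replenish_capacity()
void sleep_until_timer_or_event();  // stand-in: block until the wakeup timer fires

void idle_handling(std::chrono::microseconds max_poll_time) {
    auto idle_since = std::chrono::steady_clock::now();
    for (;;) {
        if (poll_io_queue()) {                     // work done, tokens replenished
            idle_since = std::chrono::steady_clock::now();
            continue;
        }
        // Baseline: max_poll_time = 200us, so we keep polling (and replenishing)
        // for a while before giving up.
        // --overprovisioned: max_poll_time = 0us, so we sleep immediately and no
        // tokens are added until the timer from "The Flow" above fires.
        if (std::chrono::steady_clock::now() - idle_since >= max_poll_time) {
            sleep_until_timer_or_event();
            idle_since = std::chrono::steady_clock::now();
        }
    }
}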

Evidence from Logging

Baseline replenishment pattern:

io_throttler::maybe_replenish: elapsed=0us, extra=10335, REPLENISHING
io_throttler::maybe_replenish: elapsed=0us, extra=7650, REPLENISHING
io_throttler::maybe_replenish: elapsed=1us, extra=26055, REPLENISHING

Tokens replenished every 0-1 microseconds.

Overprovisioned replenishment pattern:

io_throttler::maybe_replenish: elapsed=543us, extra=9124541, REPLENISHING
io_throttler::maybe_replenish: elapsed=431us, extra=7236634, REPLENISHING
io_throttler::maybe_replenish: elapsed=589us, extra=9894866, REPLENISHING

Tokens are only replenished every 400-600 microseconds (i.e., when the reactor wakes).

Proposed Solutions

Some of the solutions proposed for the multi-shard io-properties issue (see #3201) may also apply here, particularly those related to passive token replenishment or changes to how sleep duration is calculated.
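For illustration, the "sleep only for what the head request needs" direction could look roughly like this; it is a sketch of the idea only, not a patch against the actual io_queue code:

#include <algorithm>
#include <chrono>

// Sketch: sleep just long enough to cover the request at the head of the queue,
// given what is already in the bucket, instead of the full per-tick grab.
std::chrono::microseconds sleep_for_head_request(double head_request_tokens,  // e.g. 117101
                                                 double tokens_available,
                                                 double tokens_per_us) {       // e.g. ~16777
    double missing = std::max(0.0, head_request_tokens - tokens_available);
    return std::chrono::microseconds(
        static_cast<long>(missing / tokens_per_us + 0.5));  // ~7 us for a 4KB read
}

Combined with passive (or wakeup-time) replenishment, this would bound the idle sleep by the cost of the next I/O rather than by the full reservation.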

Impact

This issue affects workloads that:

  • Use --overprovisioned mode (common in containerized/virtualized environments)
  • Have io-properties configured for disk throttling
  • Have low iodepth where token bucket throttling dominates

The performance reduction may make --overprovisioned less suitable for use with io-properties in latency-sensitive scenarios.

Reproducibility

100% reproducible with the test configurations provided.

Steps to Reproduce

  1. Create io-properties file (~/io-props.yaml):
disks:
  - mountpoint: /mnt/xfs
    read_iops: 268337
    read_bandwidth: 1259085440
    write_iops: 134175
    write_bandwidth: 604742528
  2. Create test config (~/io2.yaml):
- name: highprio
  shards: [0]
  type: randread
  shard_info:
    parallelism: 1
    reqsize: 4kB
    shares: 1000
    think_time: 0
  3. Build Seastar:
./configure.py --mode=release
ninja -C build/release apps/io_tester/io_tester
  4. Run baseline test:
build/release/apps/io_tester/io_tester \
    --io-properties-file ~/io-props.yaml \
    --conf ~/io2.yaml \
    --storage /mnt/xfs/io_tester \
    --duration=5 -c1
  5. Run overprovisioned test:
build/release/apps/io_tester/io_tester \
    --io-properties-file ~/io-props.yaml \
    --conf ~/io2.yaml \
    --storage /mnt/xfs/io_tester \
    --duration=5 -c1 --overprovisioned
  6. Compare results: overprovisioned should show noticeably lower IOPS.

Related Issues

This issue is related to, but distinct from, the multi-shard io-properties issue (#3201). Both stem from the token bucket design, specifically the grab granularity, but they have different root causes:

  • Multi-shard issue: Token loss when ready_tokens are discarded on empty queue
  • Overprovisioned issue: Tokens not replenished without active polling
