Problem
Long audio recordings (>30s) with natural speech pauses produce degenerate output: repeated commas, truncated transcription, or hallucinated text. This is closely related to #134.
Root Cause
In non-batched mode, faster-whisper concatenates all VAD speech segments into a single audio blob and processes it in 30-second windows (`WhisperModel.transcribe()`, line `audio = np.concatenate(audio_chunks, axis=0)`). With `condition_on_previous_text=True` (the faster-whisper default), degenerate output from one window is fed as context into the next, causing a cascade failure that corrupts all subsequent transcription.
This is a known faster-whisper issue (see SYSTRAN/faster-whisper#843), and it is unlikely to be fixed upstream soon. Speaches can mitigate this today by exposing the condition_on_previous_text parameter that faster-whisper already supports but Speaches does not pass through.
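The cascade mechanism can be illustrated with a toy model of windowed decoding (a sketch only: `decode_window`, `WINDOW_S`, and the degeneration rule are invented for illustration and are not faster-whisper internals):

```python
# Toy model of 30s-window decoding with prompt conditioning.
# All names here are illustrative; this is NOT faster-whisper code.

WINDOW_S = 30

def decode_window(window_idx: int, prompt: str) -> str:
    """Pretend decoder: window 1 degenerates on its own, and any window
    fed a degenerate (all-comma) prompt also degenerates (context poisoning)."""
    degenerate = (window_idx == 1) or (prompt != "" and prompt.strip(", ") == "")
    return ",,,," if degenerate else f"text of window {window_idx}."

def transcribe_windows(n_windows: int, condition_on_previous_text: bool) -> list[str]:
    prompt = ""
    out = []
    for i in range(n_windows):
        text = decode_window(i, prompt if condition_on_previous_text else "")
        out.append(text)
        prompt = text  # fed into the next window when conditioning is on
    return out

print(transcribe_windows(3, condition_on_previous_text=True))   # windows 1 AND 2 degenerate
print(transcribe_windows(3, condition_on_previous_text=False))  # only window 1 degenerates; window 2 recovers
```

With conditioning on, one bad window poisons every window after it; with conditioning off, the damage stays local.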
The problem is especially pronounced with:
- Technical speech containing numbers and domain terms (e.g., "14,4 kV")
- Recordings longer than 30 seconds with natural pauses
- German and other non-English languages
Reproduction
Using model TheTobyB/whisper-large-v3-turbo-german-ct2 with German audio:
- Record ~70 seconds of technical speech with natural pauses between sentences
- Send to the API:
```sh
curl -X POST "http://localhost:8000/v1/audio/transcriptions" \
  -F "file=@recording.wav" \
  -F "language=de" \
  -F "model=TheTobyB/whisper-large-v3-turbo-german-ct2"
```
- Transcription cuts off after ~30s, ending with degenerate output like ",,"
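For scripted reproduction, the same request can be sent with only the Python standard library (a sketch equivalent to the curl call above; the endpoint path and form fields come from that call, everything else is generic multipart assembly):

```python
# Reproduction sketch: POST a WAV to the speaches OpenAI-compatible STT
# endpoint using only the stdlib (no requests/openai dependency).
import io
import urllib.request
import uuid

API = "http://localhost:8000/v1/audio/transcriptions"

def build_multipart(fields: dict, file_field: str,
                    filename: str, file_bytes: bytes):
    """Assemble a multipart/form-data body by hand; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\nContent-Disposition: form-data; "
                  f'name="{name}"\r\n\r\n{value}\r\n'.encode())
    buf.write(f"--{boundary}\r\nContent-Disposition: form-data; "
              f'name="{file_field}"; filename="{filename}"\r\n'
              "Content-Type: audio/wav\r\n\r\n".encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

def transcribe_file(path: str) -> str:
    with open(path, "rb") as f:
        audio = f.read()
    body, ctype = build_multipart(
        {"language": "de", "model": "TheTobyB/whisper-large-v3-turbo-german-ct2"},
        "file", path, audio)
    req = urllib.request.Request(API, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

if __name__ == "__main__":
    print(transcribe_file("recording.wav"))
```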
Logs show VAD keeps nearly the entire audio as one segment, and Whisper processes it in 3 x 30s windows:
```
Processing audio with duration 01:10.490
VAD filter removed 00:02.352 of audio
VAD filter kept the following audio segments: [00:02.352 -> 01:10.490]
Processing segment at 00:00.000
Processing segment at 00:30.000
Processing segment at 01:00.000
```
Proposed Solution
Expose `condition_on_previous_text` as a configurable parameter, similar to how #457 made `vad_filter` configurable:
- Add `condition_on_previous_text` to config or as a `Form()` parameter on the STT endpoints
- Pass it through to `whisper_model.transcribe(..., condition_on_previous_text=...)`
Setting it to False makes each 30s window independent, so degeneration in one window stays isolated and subsequent windows recover. This is the minimal, tested fix.
Tradeoff: Slight loss of cross-window context (a sentence spanning a 30s boundary loses prior context). In practice this is negligible compared to losing the entire second half of a transcription.
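A minimal sketch of the pass-through (the wiring is hypothetical and not speaches' actual code; `condition_on_previous_text` is the real faster-whisper parameter, everything else is illustrative, with a stand-in model so the shape of the fix is visible without loading weights):

```python
# Sketch of the one-line pass-through. `condition_on_previous_text` is a real
# faster_whisper.WhisperModel.transcribe() parameter; the endpoint wiring and
# FakeWhisperModel are illustrative stand-ins.

class FakeWhisperModel:
    """Stand-in for faster_whisper.WhisperModel that records its kwargs."""
    def transcribe(self, audio, **kwargs):
        self.last_kwargs = kwargs
        return [], None  # faster-whisper returns (segments, info)

def handle_transcription(model, audio, language: str,
                         condition_on_previous_text: bool = True):
    # The fix: forward the form/config value instead of silently
    # inheriting faster-whisper's default of True.
    return model.transcribe(
        audio,
        language=language,
        condition_on_previous_text=condition_on_previous_text,
    )

model = FakeWhisperModel()
handle_transcription(model, b"...", language="de",
                     condition_on_previous_text=False)
print(model.last_kwargs["condition_on_previous_text"])  # False
```

Defaulting the new parameter to True preserves current behavior, so the change is backward compatible.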
Workarounds
Until this is configurable, two workarounds exist:
| Approach | How | Tradeoff |
|---|---|---|
| `condition_on_previous_text=False` | Volume-mount patched `stt.py` | Minimal accuracy loss at window boundaries |
| `WHISPER__USE_BATCHED_MODE=true` | Config change, no code patch | More word errors on technical vocabulary (each VAD segment loses all prior context) |
Both produce complete transcriptions. `condition_on_previous_text=False` delivers better word accuracy on technical content.
Environment
- speaches `latest-cpu` (digest `sha256:21e3df06d842...`)
- Model: `TheTobyB/whisper-large-v3-turbo-german-ct2`
- Non-batched mode (default)