
fix(model): replace INVALID_TOKEN sentinel with real first-token decode in prefill for whisper models #593

Open

rebel-eunji wants to merge 6 commits into dev from fix/whisper

Conversation

@rebel-eunji
Collaborator

🚀 Summary of Changes

Problem

When a Whisper model is served through `vllm serve` and a client POSTs to `/v1/audio/transcriptions`, vLLM raises the following error:

torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors:
call_method masked_fill_(*(FakeTensor(..., size=(1, 51966)),
                           FakeTensor(..., size=(1, 51866), dtype=torch.bool), -inf), **{}):
got RuntimeError('Attempting to broadcast a dimension of length 51866 at -1!
                  Mismatching argument at index 1 had torch.Size([1, 51866]);
                  but expected shape should be broadcastable to [1, 51966]')

from user code:
  File ".../vllm_rbln/v1/sample/rbln_sampler.py", line 275, in forward
    logits = self.apply_logits_processors(...)
  File ".../vllm/v1/sample/sampler.py", line 288, in apply_logits_processors
    logits.masked_fill_(sampling_metadata.allowed_token_ids_mask, float("-inf"))
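
For reference, the underlying shape mismatch can be reproduced in plain eager PyTorch (sizes taken from the traceback; in the PR it surfaces inside torch.compile's fake-tensor tracing):

```python
import torch

# Sampler logits with 51966 columns vs. an allowed_token_ids_mask built for the
# 51866-token Whisper vocabulary: masked_fill_ cannot broadcast the mask over
# the last dimension and raises a RuntimeError.
logits = torch.zeros(1, 51966)
mask = torch.zeros(1, 51866, dtype=torch.bool)
logits.masked_fill_(mask, float("-inf"))  # RuntimeError: mask not broadcastable
```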

Solution

Replace the INVALID_TOKEN sentinel trick in the Whisper prefill path: prefill now runs both the encoder and the first decoder step, seeded with decoder_start_token_id, so real first-token logits are returned to the sampler instead of a placeholder (see the sketch below).
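
A minimal sketch of the intended prefill flow, with illustrative function and attribute names (not the actual plugin code):

```python
import torch

def prefill(self, input_features: torch.Tensor) -> torch.Tensor:
    # Run the encoder once over the audio features for this request.
    encoder_states = self.encoder(input_features)

    # Previously a dummy INVALID_TOKEN was returned at this point. Instead,
    # run the first decoder step from decoder_start_token_id so the sampler
    # receives real logits for the first generated token.
    start_ids = torch.tensor([[self.config.decoder_start_token_id]])
    logits = self.decoder(start_ids, encoder_hidden_states=encoder_states)
    return logits
```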


📌 Related Issues / Tickets

  • Resolves #
  • Related to #

✅ Type of Change

  • 🚀 Release (release)
  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

  1. Run ...
  2. Verify output: ...
  3. Edge case tested: ...
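
As an illustrative smoke test (model name, port, and audio file are placeholders, not part of this PR), serving a Whisper model and POSTing to the transcription endpoint should now return a transcription instead of the broadcast error above:

```python
import requests

# Assumes a server started with something like `vllm serve openai/whisper-large-v3`.
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": ("sample.wav", f, "audio/wav")},
        data={"model": "openai/whisper-large-v3"},
    )
print(resp.status_code, resp.json())
```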

📸 Screenshots / Logs (if applicable)


📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

💬 Notes


@rebel-eunji rebel-eunji changed the title fix: run both encoder and decoder in prefill step fix(model): replace INVALID_TOKEN sentinel with real first-token decode in prefill for whisper models May 7, 2026
@rebel-eunji rebel-eunji self-assigned this May 7, 2026
@rebel-eunji rebel-eunji added bug Something isn't working optimum Optimum based implementation labels May 7, 2026
@rebel-eunji rebel-eunji changed the title fix(model): replace INVALID_TOKEN sentinel with real first-token decode in prefill for whisper models fix(model): replace INVALID_TOKEN sentinel with real first-token decode in prefill for whisper models (WIP) May 7, 2026
@rebel-eunji rebel-eunji marked this pull request as draft May 7, 2026 16:14
@codecov

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@rebel-eunji rebel-eunji changed the title fix(model): replace INVALID_TOKEN sentinel with real first-token decode in prefill for whisper models (WIP) fix(model): replace INVALID_TOKEN sentinel with real first-token decode in prefill for whisper models May 8, 2026
@rebel-eunji rebel-eunji requested a review from rebel-jonghewk May 8, 2026 01:50
@rebel-eunji rebel-eunji marked this pull request as ready for review May 8, 2026 01:54
