fix(lora): sync with vLLM 0.18.0 and update LoRA tests#504

Open
junstar92 wants to merge 24 commits into dev-0.18 from fix-lora-test
Conversation


@junstar92 junstar92 commented Apr 2, 2026

🚀 Summary of Changes

This PR focuses on LoRA test coverage and validation for the RBLN torch-compile path.

  • Updated vllm_rbln/lora/layer.py to realign the RBLN LoRA embedding path with the current upstream implementation while preserving the RBLN-specific input-shape handling.
  • Added a basic LoRA E2E smoke test at tests/torch_compile/e2e/v1/lora/test_basic_lora.py using a single SQL prompt.
  • Updated the experimental LoRA example at examples/experimental/run_lora_test.py to use:
    • meta-llama/Llama-3.2-3B-Instruct
    • jeeejeee/llama32-3b-text2sql-spider
  • Applied the same runtime configuration in tests via monkey patching:
    • VLLM_RBLN_ENFORCE_MODEL_FP32=1
    • VLLM_RBLN_ENABLE_WARM_UP=0
    • VLLM_RBLN_USE_VLLM_MODEL=1
    • VLLM_DISABLE_COMPILE_CACHE=0
  • Relaxed TorchDynamo recompile limits in the LoRA unit tests to reduce excessive recompilation and remove fallbacks during test runs.
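The runtime configuration applied via monkey patching could look roughly like the sketch below. The environment-variable names come from this PR description; the helper function itself and its name are illustrative, not the actual test code. Relaxing the TorchDynamo recompile limit, also mentioned above, is shown as a comment since it needs a live `torch` install.

```python
# Illustrative sketch of the runtime configuration the LoRA tests apply.
# The env var names are from the PR description; apply_lora_test_env is
# a hypothetical helper, not the actual test fixture.
import os

RBLN_LORA_TEST_ENV = {
    "VLLM_RBLN_ENFORCE_MODEL_FP32": "1",
    "VLLM_RBLN_ENABLE_WARM_UP": "0",
    "VLLM_RBLN_USE_VLLM_MODEL": "1",
    "VLLM_DISABLE_COMPILE_CACHE": "0",
}

def apply_lora_test_env(monkeypatch=None):
    """Set the RBLN LoRA test environment.

    When pytest's monkeypatch fixture is passed, the changes are undone
    automatically after the test; otherwise os.environ is set directly.
    """
    for key, value in RBLN_LORA_TEST_ENV.items():
        if monkeypatch is not None:
            monkeypatch.setenv(key, value)
        else:
            os.environ[key] = value
    return dict(RBLN_LORA_TEST_ENV)

# Relaxing TorchDynamo's recompile limit (as the unit tests do) would be
# along the lines of:
#   torch._dynamo.config.cache_size_limit = 64
```

In a pytest module this would typically be wired up as an autouse fixture so every LoRA test runs under the same configuration.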

📌 Related Issues / Tickets

  • Resolves #
  • Related to #

✅ Type of Change

  • 🚀 Release (release)
  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

  1. Run unit tests: pytest tests/torch_compile/unit/v1/lora
  2. Run the new E2E smoke test: pytest tests/torch_compile/e2e/v1/lora/test_basic_lora.py
  3. Run the example script: python examples/experimental/run_lora_test.py

Verify output:

  • The basic E2E test should produce non-empty outputs for both base and LoRA generation, and the outputs should differ.
  • The example script should run with the updated model/LoRA pair.
  • Known local unit-test result on this branch: 25 failed, 111 passed.
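The expected-output condition for the E2E smoke test (non-empty base and LoRA generations that differ from each other) can be expressed as a small checker. This is a hedged sketch of the assertion logic only, not the actual test in `test_basic_lora.py`; the function name is hypothetical.

```python
# Hypothetical checker for the basic LoRA E2E expectation: both the base
# and the LoRA generations must be non-empty, and applying the LoRA
# adapter must actually change the output.
def check_lora_outputs(base_text: str, lora_text: str) -> None:
    assert base_text.strip(), "base generation is empty"
    assert lora_text.strip(), "LoRA generation is empty"
    assert base_text != lora_text, "LoRA output identical to base output"
```

In the real test, `base_text` and `lora_text` would come from two `llm.generate(...)` calls on the same SQL prompt, one without and one with the LoRA adapter.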

Known failures:

  • test_embeddings in test_layers.py fails.
    • The same computation passes in eager mode.
    • The failure appears only after torch.compile(..., backend="rbln").
    • This points to a compile/lowering issue rather than a pure PyTorch math issue.
  • test_lora_functions.py fails locally with:
    • RuntimeError: RBLNRuntimeError: RBLN_DEVICES environment variable changed at runtime. Initial value: , Current value: 0
    • This does not appear to be a test-logic issue.
    • It may be environment-specific and only reproducible in my environment.
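The eager-vs-compiled discrepancy described above can be isolated with a small parity check: run the layer once in eager mode as the reference, once through `torch.compile`, and compare. This is a generic debugging sketch, not the failing test itself; the backend string and tolerances are assumptions.

```python
# Generic parity check between an eager module and its torch.compile'd
# version. backend="rbln" reproduces the reported failure mode; any
# registered backend can be substituted. Tolerances are assumptions.
import torch

def compare_eager_vs_compiled(layer, inputs, backend="rbln",
                              rtol=1e-5, atol=1e-5):
    expected = layer(*inputs)                        # eager reference
    compiled = torch.compile(layer, backend=backend)
    actual = compiled(*inputs)                       # compiled path
    torch.testing.assert_close(actual, expected, rtol=rtol, atol=atol)
```

Narrowing the `layer` argument down from the full embedding path to individual ops is one way to locate where the compiled output first diverges.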

📸 Screenshots / Logs (if applicable)


📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

💬 Notes

  • The example still shows broken or corrupted output in compiled mode.
  • The same example behaves normally in eager mode, so this also appears to be a compile-specific issue.
  • The exact failure point in the compiled path is still unclear and has been difficult to debug from our side.
  • Review and investigation from the Rebellions compiler/runtime side is likely required.

junstar92 and others added 15 commits March 24, 2026 04:26
…appings through execute model pipeline for ngram specdec
* fix bug related to sampler warm-up for pp

Signed-off-by: wonsub kim <subang0@rebellions.ai>

* apply formatting and add FIXME comment

---------

Signed-off-by: wonsub kim <subang0@rebellions.ai>
Co-authored-by: wonsub kim <subang0@rebellions.ai>
…499)

Problem:
When a new prefill request is scheduled, the RBLN scheduler kicks out
all running decode requests and restores the full token budget. However,
num_new_tokens was clipped using the already-reduced token_budget before
the kick-out, causing the first prefill chunk to be short by the number
of tokens consumed by decode requests (e.g. 127 instead of 128).

This off-by-one misaligned all subsequent chunk positions, eventually
triggering a device runtime abort (SYS_TASK_ABORTED).

Solution:
Use prefill_token_budget instead of token_budget when computing
num_new_tokens for new prefill requests.
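The fix described in this commit message amounts to clipping against the restored prefill budget rather than the already-reduced one. The sketch below is reconstructed from the message; the function and parameter names are assumptions, not the actual vLLM-RBLN scheduler code.

```python
# Illustrative sketch of the scheduler fix (names are assumptions based
# on the commit message, not the real scheduler internals).
def clip_prefill_tokens(num_new_tokens: int,
                        prefill_token_budget: int) -> int:
    # Before the fix, the already-reduced token_budget (full budget minus
    # tokens consumed by running decode requests) was used here, so the
    # first prefill chunk could come out short (e.g. 127 instead of 128),
    # misaligning every subsequent chunk position and eventually
    # triggering SYS_TASK_ABORTED on the device.
    return min(num_new_tokens, prefill_token_budget)
```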
…h tests (#491)

Co-authored-by: Huijong JEONG <huijong.jeong@squeezebits.com>
@junstar92 junstar92 self-assigned this Apr 2, 2026
@junstar92 junstar92 added the torch.compile torch.compile based implementation label Apr 2, 2026

rebel-jinhwan commented Apr 2, 2026

rebel-jindol21 and others added 2 commits April 2, 2026 15:27
Signed-off-by: Jinseok Lee <jindol21@rebellions.ai>
Co-authored-by: Jaehwang Jung <jaehwang.jung@rebellions.ai>

@rebel-jiwoopark rebel-jiwoopark left a comment


lgtm

@rebel-jiwoopark

@rebel-jinhwan Could you check if any additional review is needed?

@rebel-jiwoopark rebel-jiwoopark self-requested a review April 8, 2026 07:30
@rebel-jiwoopark rebel-jiwoopark force-pushed the dev-0.18 branch 2 times, most recently from 1c6f65a to 0c81ab6 Compare April 9, 2026 12:16