other: log KV cache layout, warm-up phases, rbln backend invocations#589

Merged
rebel-jaehwang merged 4 commits into dev from logging on May 8, 2026
Conversation

Contributor

@rebel-jaehwang rebel-jaehwang commented May 7, 2026

🚀 Summary of Changes

To improve observability, log:

  • a summary of the KV cache shape and size
  • warm-up phases
  • information about each invocation of the torch.compile rbln backend (call stack, input shapes, and whether the compilation happens outside the warm-up phase)

📌 Related Issues / Tickets

https://github.com/rebellions-sw/vllm-rbln-internal/issues/63

✅ Type of Change

  • ❓ Other: logging and observability improvements

Example:

$ VLLM_RBLN_USE_VLLM_MODEL=1 VLLM_RBLN_DECODE_BATCH_BUCKET_STRATEGY=manual VLLM_RBLN_DECODE_BATCH_BUCKET_MANUAL_BUCKETS=1,8 python examples/experimental/offline_inference_basic.py
...
[rbln_model_runner.py:4417] KV cache: num_blocks=370, num_groups=1, num_tensors=16, total=11.562 GiB
[rbln_model_runner.py:4434] KV cache: 16 layer(s) shape/dtype
...
[rbln_model_runner.py:1924] Warm-up: prefill (seq_len=128)
...
[torch_compile_backend.py:83] rbln_backend [warm-up] rbln_model_runner.py:2924(execute_model) <- rbln_model_runner.py:2105(_execute_dummy_requests) <- rbln_model_runner.py:1925(_warm_up_model_inner): inputs=[(1, 128):torch.int64, (1, 128):torch.int64, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (1, 40):torch.int16, (40,):torch.int32, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (1,):torch.int32]
...
[torch_compile_backend.py:91] rbln_backend done: xx s
[rbln_model_runner.py:1977] Warm-up: decode (batch_bucket=8, query_len=1)
[torch_compile_backend.py:83] rbln_backend [warm-up] rbln_model_runner.py:2924(execute_model) <- rbln_model_runner.py:2105(_execute_dummy_requests) <- rbln_model_runner.py:1990(_warm_up_model_inner): inputs=[(8, 1):torch.int64, (8, 1):torch.int64, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (8, 40):torch.int16, (8, 40):torch.int32, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16]
...
[torch_compile_backend.py:91] rbln_backend done: xx s
[rbln_model_runner.py:1977] Warm-up: decode (batch_bucket=8, query_len=1)
[torch_compile_backend.py:83] rbln_backend [warm-up] rbln_model_runner.py:2924(execute_model) <- rbln_model_runner.py:2105(_execute_dummy_requests) <- rbln_model_runner.py:1990(_warm_up_model_inner): inputs=[(8, 1):torch.int64, (8, 1):torch.int64, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (8, 40):torch.int16, (8, 40):torch.int32, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16, (2, 370, 8, 1, 1024, 64):torch.bfloat16]
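As a sanity check, the `total=11.562 GiB` in the KV cache summary is consistent with the tensor shapes in the backend log: 16 bfloat16 tensors of shape `(2, 370, 8, 1, 1024, 64)`. A quick back-of-the-envelope calculation (the axis meanings are my reading of the shape, not stated in the PR):

```python
import math

# One KV-cache tensor per layer, bfloat16 (2 bytes per element).
shape = (2, 370, 8, 1, 1024, 64)   # shape taken from the log lines above
num_tensors = 16                   # "num_tensors=16" in the KV cache summary
bytes_per_elem = 2                 # bfloat16

per_tensor_bytes = math.prod(shape) * bytes_per_elem
total_gib = per_tensor_bytes * num_tensors / 2**30
print(f"total={total_gib:.3f} GiB")  # → total=11.562 GiB, matching the log
```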

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 65.15152% with 23 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| vllm_rbln/v1/worker/rbln_model_runner.py | 14.28% | 18 Missing ⚠️ |
| vllm_rbln/torch_compile_backend.py | 88.09% | 2 Missing and 3 partials ⚠️ |

📢 Thoughts on this report? Let us know!

@rebel-jaehwang rebel-jaehwang merged commit 698bddd into dev May 8, 2026
17 checks passed
@rebel-jaehwang rebel-jaehwang deleted the logging branch May 8, 2026 01:29