feat(core): add pooling model initial support for V1 engine #152

pei0033 · 2025-11-10T04:55:16Z

🚀 Summary of Changes

Implemented the _pool() method in RBLNModelRunner for pooling model inference, based on the GPUModelRunner implementation
Adjusted the warmup logic to handle pooling model initialization
Added example scripts for the Qwen3 embedding and reranker models

📌 Related Issues / Tickets

Resolves #147 (Support Pooling model)

✅ Type of Change

✨ Feature (feature)
🧠 Model support (model)
🧬 Core engine changes (core)
🛠 Bug fix (bug-fix)
⚙️ Performance improvement (perf)
🔁 Refactor or code cleanup (refactor)
📄 Documentation (docs)
❓ Other (other): please describe

🧪 How to Test

For Qwen3 Embedding

Run the embedding example: RBLN_PROFILER=0 RBLN_KERNEL_MODE=triton VLLM_RBLN_USE_VLLM_MODEL=1 VLLM_USE_V1=1 python examples/experimental/qwen3_embedding.py
Verify that embedding vectors are generated for input texts
Expected output: Similarity scores between queries and documents.
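To make the expected output concrete, here is a minimal sketch of how query-document similarity scores can be derived from embedding vectors. It assumes cosine similarity is the metric and uses toy 3-dimensional vectors in place of real model output; the helper names are hypothetical and are not taken from examples/experimental/qwen3_embedding.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(query_vecs, doc_vecs):
    """Rows correspond to queries, columns to documents."""
    return [[cosine_similarity(q, d) for d in doc_vecs] for q in query_vecs]


# Toy vectors standing in for real embeddings (hypothetical data).
queries = [[1.0, 0.0, 0.0]]
documents = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
scores = similarity_matrix(queries, documents)
```

With real embeddings, a higher value in a row means the document is closer to that query in embedding space.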

For Qwen3 Reranker

Run the reranker example: RBLN_PROFILER=0 RBLN_KERNEL_MODE=triton VLLM_RBLN_USE_VLLM_MODEL=1 VLLM_USE_V1=1 python examples/experimental/qwen3_reranker.py
Verify that relevance scores are computed for query-document pairs
Expected output: Score list showing document relevance.
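As a rough illustration of what a relevance score can mean here, the sketch below assumes the score is the probability of a "yes" answer from a two-way softmax over "no" and "yes" logits, which is how the Qwen3-Reranker model card describes scoring. Whether examples/experimental/qwen3_reranker.py computes it exactly this way is an assumption; the function names and logit values are hypothetical.

```python
import math


def yes_probability(logit_no, logit_yes):
    """Two-way softmax over (no, yes) logits; returns P(yes)."""
    m = max(logit_no, logit_yes)  # subtract the max for numerical stability
    exp_no = math.exp(logit_no - m)
    exp_yes = math.exp(logit_yes - m)
    return exp_yes / (exp_no + exp_yes)


def rank_documents(documents, scores):
    """Pair each document with its score and sort by descending relevance."""
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical (no, yes) logits for three query-document pairs.
toy_logits = [(2.0, -1.0), (0.0, 0.0), (-1.0, 3.0)]
toy_docs = ["doc A", "doc B", "doc C"]
toy_scores = [yes_probability(n, y) for n, y in toy_logits]
ranking = rank_documents(toy_docs, toy_scores)
```

Each score falls strictly between 0 and 1, so the score list can be read directly as a relevance ordering.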

📋 Checklist

PR title follows Conventional Commits format
This PR is linked to an existing issue
The test method is described, and the expected result is clearly stated
Relevant documentation has been updated (if applicable)

💬 Notes

The implementation follows the same pattern as the pooling model support in upstream vLLM's V1 engine.
Warmup logic has been adjusted to properly initialize pooling models.
This is an initial implementation; additional model types (e.g., BERT-based models) may be added in future PRs.
The pooling function currently runs on CPU without RBLN compilation. Future optimization may include compiling this operation for better performance.
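For readers unfamiliar with the pooling step, the following is a simplified sketch of what a CPU-side pooling pass can look like. It assumes last-token pooling followed by L2 normalization, and that hidden states for all requests in a batch are laid out as one flat list of per-token vectors. It is not the _pool() implementation from this PR; all names and data are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def last_token_pool(hidden_states, num_tokens_per_request):
    """Pick the hidden state of the final token of each request.

    hidden_states: flat list of per-token vectors for the whole batch.
    num_tokens_per_request: token count of each request, in batch order.
    """
    pooled = []
    offset = 0
    for n in num_tokens_per_request:
        offset += n
        pooled.append(hidden_states[offset - 1])
    return pooled


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


# Two requests with 2 and 3 tokens, hidden size 2 (hypothetical data).
flat = [[1.0, 0.0], [3.0, 4.0], [0.0, 2.0], [5.0, 5.0], [0.0, 1.0]]
lengths = [2, 3]
pooled = [l2_normalize(v) for v in last_token_pool(flat, lengths)]
```

Because the step is only an index lookup plus a normalization per request, running it on CPU is a reasonable starting point before compiling it for the device.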

pei0033 added 2 commits November 6, 2025 09:28

feat: initial implementation pooling model

fe53971

add qwen3 reranker and embedding model examples

ec5bd4c

pei0033 requested review from huijjj, rebel-jaehwang and rebel-jiwoopark November 10, 2025 04:55

pei0033 self-assigned this Nov 10, 2025

pei0033 added the torch.compile (torch.compile based implementation) label Nov 10, 2025
