@pei0033 commented Nov 10, 2025

🚀 Summary of Changes

  • Implemented the _pool() method in RBLNModelRunner for pooling-model inference, based on the GPUModelRunner implementation
  • Adjusted the warmup logic to handle pooling-model initialization
  • Added example scripts for the Qwen3 embedding and reranker models
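Some background on what pooling does: in the vLLM V1 engine, the model runner processes the scheduled tokens of all requests in a batch as one flattened sequence, so after the forward pass the pooler has to reduce each request's slice of hidden states to a single vector. The sketch below illustrates that reduction in plain Python, assuming last-token pooling with L2 normalization (the strategy Qwen3-Embedding uses). The function `last_token_pool` and its parameters are hypothetical names for illustration only; this is not the signature or code of RBLNModelRunner._pool().

```python
import math
from typing import List, Sequence


def last_token_pool(
    hidden_states: Sequence[Sequence[float]],
    num_tokens_per_request: Sequence[int],
    normalize: bool = True,
) -> List[List[float]]:
    """Reduce a flattened batch of hidden states to one vector per request.

    hidden_states: one row per scheduled token, with all requests concatenated.
    num_tokens_per_request: how many consecutive rows belong to each request.
    (Hypothetical helper for illustration, not the RBLN implementation.)
    """
    if any(n <= 0 for n in num_tokens_per_request):
        raise ValueError("every request must contribute at least one token")
    if sum(num_tokens_per_request) != len(hidden_states):
        raise ValueError("token counts do not match the number of hidden-state rows")

    pooled: List[List[float]] = []
    offset = 0
    for n in num_tokens_per_request:
        # Last-token pooling: the final row of each request's slice is its embedding.
        vec = list(hidden_states[offset + n - 1])
        if normalize:
            # L2-normalize so that later dot products equal cosine similarity.
            norm = math.sqrt(sum(x * x for x in vec))
            if norm > 0:
                vec = [x / norm for x in vec]
        pooled.append(vec)
        offset += n
    return pooled
```

For example, rows [[1.0, 0.0], [3.0, 4.0], [0.0, 2.0]] with token counts [2, 1] pool to [[0.6, 0.8], [0.0, 1.0]]. Plain lists keep the sketch dependency-free; an actual runner would operate on torch tensors, and per the notes below the pooling step here currently runs on CPU.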

📌 Related Issues / Tickets


✅ Type of Change

  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (bug-fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

For Qwen3 Embedding

  1. Run the embedding example: RBLN_PROFILER=0 RBLN_KERNEL_MODE=triton VLLM_RBLN_USE_VLLM_MODEL=1 VLLM_USE_V1=1 python examples/experimental/qwen3_embedding.py
  2. Verify that embedding vectors are generated for the input texts
  3. Expected output: similarity scores between the queries and the documents.
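The similarity scores expected from the embedding example are usually computed as dot products between normalized query and document embeddings, which is how the Qwen3-Embedding model card scores its examples. Whether examples/experimental/qwen3_embedding.py does exactly this is an assumption, and `similarity_matrix` is a hypothetical helper; the sketch only shows the arithmetic behind such a score matrix.

```python
from typing import List, Sequence


def similarity_matrix(
    queries: Sequence[Sequence[float]],
    documents: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return scores[i][j] = dot(queries[i], documents[j]).

    For L2-normalized embeddings this dot product is the cosine similarity.
    (Hypothetical helper for illustration.)
    """
    scores: List[List[float]] = []
    for query in queries:
        row = []
        for doc in documents:
            # zip() would silently truncate, so check dimensions explicitly.
            if len(query) != len(doc):
                raise ValueError("query and document embeddings differ in dimension")
            row.append(sum(q * d for q, d in zip(query, doc)))
        scores.append(row)
    return scores
```

For instance, similarity_matrix([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]) returns [[1.0, 0.0]]: the first document points in the same direction as the query and the second is orthogonal to it.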

For Qwen3 Reranker

  1. Run the reranker example: RBLN_PROFILER=0 RBLN_KERNEL_MODE=triton VLLM_RBLN_USE_VLLM_MODEL=1 VLLM_USE_V1=1 python examples/experimental/qwen3_reranker.py
  2. Verify that relevance scores are computed for the query-document pairs
  3. Expected output: a list of scores indicating each document's relevance to the query.
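Qwen3-Reranker is a causal LM that judges relevance by answering "yes" or "no"; its model card scores a query-document pair as the probability of "yes" after a softmax restricted to the "yes" and "no" next-token logits at the last position. Assuming the reranker example follows that convention (this PR does not state it), the score reduces to a sigmoid of the logit difference, since softmax([no, yes])[1] equals sigmoid(yes - no). `yes_no_relevance` is a hypothetical helper for illustration.

```python
import math


def yes_no_relevance(logit_yes: float, logit_no: float) -> float:
    """Probability of "yes" from a two-way softmax over the yes/no logits.

    Computed as sigmoid(logit_yes - logit_no), branching on the sign of the
    difference so math.exp never overflows for large logit gaps.
    (Hypothetical helper for illustration.)
    """
    diff = logit_yes - logit_no
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)
```

Equal logits give a score of exactly 0.5, and a positive gap (the model prefers "yes") pushes the score toward 1.0, so sorting documents by this value ranks them by relevance.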

📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

💬 Notes

  • The implementation follows the same pattern as the pooling-model support in upstream vLLM's V1 engine
  • Warmup logic has been adjusted to properly initialize pooling models
  • This is an initial implementation; additional model types (e.g., BERT-based models) may be added in future PRs.
  • The pooling function currently runs on CPU without RBLN compilation. Future optimization may include compiling this operation for better performance.

@pei0033 self-assigned this Nov 10, 2025
@pei0033 added the torch.compile label (torch.compile based implementation) Nov 10, 2025