remove duplicate should_stop_processing check
feat: Refactor the fetching request logic (NVIDIA#5786)
QiJune pushed 1 commit to main • 7381f1d…ee45e0c • 2 hours ago
[TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (N…
QiJune pushed 13 commits to main • 88076ee…7381f1d • 3 hours ago
polish deprecation policy
Merge branch 'main' into deprecation
[fix] Fix can_use_alltoall in fused_moe_wide_ep.py (NVIDIA#6173)
QiJune pushed 4 commits to main • a433eba…88076ee • 22 hours ago
Merge branch 'feat/1.0_doc_dev' into model-feature
[TRTLLM-6091][docs] Update docs/trtllm sampler 1.0 (NVIDIA#5833)
enh: Lift expectation of single image per sample in Gemma3 VLM (NVIDI…
QiJune pushed 28 commits to main • c0e4165…a433eba • yesterday
add more log in FmhaDispatcher
QiJune pushed 6 commits to main • ae28b3a…c0e4165 • 3 days ago
feat: Add support for benchmarking individual gemms in MOE benchmark (N…
QiJune pushed 16 commits to main • e821c68…ae28b3a • 4 days ago
CI: update multi gpu test trigger file list (NVIDIA#6131)
QiJune pushed 2 commits to main • d4d21a1…e821c68 • 4 days ago
update multi gpu trigger file list
QiJune pushed 6 commits to main • 2d2b8ba…d4d21a1 • 4 days ago
implement a safe chunked broadcast
add more error message for broadcasting new requests
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (NVIDI…
QiJune pushed 83 commits to main • ce39409…2d2b8ba • 5 days ago
Merge branch 'release/0.21' into release-notes