Pull requests: mistralai/mistral-inference
fix: pass explicit device to attention mask creation in cache
#267 · opened Mar 2, 2026 by abdelhadi703

fix: move length_tensor to CUDA before NCCL broadcast in distributed inference
#266 · opened Mar 2, 2026 by abdelhadi703

fix(readme): correct broken vLLM link in deploy section
#264 · opened Mar 2, 2026 by abdelhadi703

Fix a broken Dockerfile and reduce the final image size using a multi-stage build
#263 · opened Feb 21, 2026 by framsouza

Fix NCCL broadcast error on CPU tensors in distributed inference
#257 · opened Oct 1, 2025 by Pratham-Nayak1

feat(model-service): add OpenAI-compatible wrapper (+ pm2 + env example) and update ignores
#254 · opened Aug 27, 2025 by MCVelasquez45

Optimize main.py for inference efficiency and GPU throughput (torch.compile, memory tuning, warp alignment)
#253 · opened Aug 3, 2025 by abdullatifcodes

Fix: Proper JSON chunk handling in streaming response (OpenRouter API)
#248 · opened Jul 4, 2025 by ktdjiren

[fix] Correctly pass mask in TransformerBlock.forward in transformer_layers.py
#218 · opened Sep 18, 2024 by MarcSzafraniec

Fix device error when using cuda device other than cuda:0
#216 · opened Aug 28, 2024 by cornzz

fix(README.md): correct verb agreement in model support statement
#166 · opened May 30, 2024 by CharlesCNorton

Add CPU support to one_file_ref.py (the one file implementation)
#129 · opened Feb 22, 2024 by kikirizki