Distributed inference in mistral.rs

mistral.rs supports distributed inference with a few strategies.

What backend is best?

  • For CUDA-only systems: NCCL
  • Anything else: Ring backend

The Ring backend is also heterogeneous! This means that you can use it on any set of multiple devices connected over TCP. For example, you can connect 2 Metal systems, or 2 Metal systems and 1 CPU system.
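The selection rule above can be sketched as a small Rust function. This is only an illustration of the guidance: `DeviceKind`, `Backend`, and `recommend_backend` are hypothetical names invented for this example and are not part of the mistral.rs API.

```rust
/// Hypothetical device categories, used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceKind {
    Cuda,
    Metal,
    Cpu,
}

/// Hypothetical labels for the two backends named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Nccl,
    Ring,
}

/// Encodes the rule of thumb: NCCL when every device is CUDA,
/// the Ring backend for anything else (including mixed setups).
pub fn recommend_backend(devices: &[DeviceKind]) -> Backend {
    // An empty device list is not "CUDA-only", so it falls through to Ring.
    let cuda_only = !devices.is_empty()
        && devices.iter().all(|d| *d == DeviceKind::Cuda);

    if cuda_only {
        Backend::Nccl
    } else {
        Backend::Ring
    }
}
```

Under these assumptions, two CUDA devices map to `Backend::Nccl`, while two Metal systems, or two Metal systems plus one CPU system, map to `Backend::Ring`.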