Commit 94e4aaf

Merge branch 'EricLBuehler:master' into master
2 parents b3db13c + 264d71b commit 94e4aaf

403 files changed: +44280 -6263 lines changed


.github/workflows/build_cuda_all.yaml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ jobs:
   build-and-push-image:
     strategy:
       matrix:
-        compute_capability: [75, 80, 86, 89, 90]
+        compute_capability: [80, 86, 89, 90]
       fail-fast: false
     runs-on: ubuntu-latest

.github/workflows/docs.yml

Lines changed: 20 additions & 17 deletions
@@ -1,5 +1,5 @@
 name: docs
-#https://dev.to/deciduously/prepare-your-rust-api-docs-for-github-pages-2n5i
+
 on:
   push:
     branches: ["master"]
@@ -18,28 +18,30 @@ concurrency:
 jobs:
   deploy:
     runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        rust: [stable]
     steps:
       - name: Checkout
         uses: actions/checkout@v4
-      - uses: actions-rs/toolchain@v1
+
+      - name: Setup Rust
+        uses: actions-rs/toolchain@v1
         with:
           profile: minimal
-          toolchain: ${{ matrix.rust }}
+          toolchain: stable
           override: true
+
+      - name: Setup mdbook
+        uses: peaceiris/actions-mdbook@v2
+        with:
+          mdbook-version: 'latest'
+
       - name: Setup Pages
         uses: actions/configure-pages@v5
-      - uses: actions-rs/cargo@v1
-        with:
-          command: doc
-          args: --no-deps
-      - name: Build docs
-        run: |
-          rm -rf ./docs
-          echo "<meta http-equiv=\"refresh\" content=\"0; url=mistralrs\">" > target/doc/index.html
-          cp -r target/doc ./docs
+
+      # Build mdbook (main documentation)
+      - name: Build mdbook
+        run: mdbook build docs
+
+      # Build Python docs
       - name: Build Python docs
         run: |
           python3 -m venv myenv
@@ -48,8 +50,9 @@ jobs:
           cd mistralrs-pyo3
           maturin develop
           cd ..
-          pdoc mistralrs -o ./docs/pyo3
+          pdoc mistralrs -o ./docs/book/pyo3
+
       - name: Deploy
         uses: JamesIves/github-pages-deploy-action@v4
         with:
-          folder: ./docs
+          folder: ./docs/book
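
Reviewer note: the two documentation build steps in this workflow can be tried locally before pushing. The sketch below is an approximation and not part of the commit. `run` and `build_docs` are hypothetical helper names, and the script assumes that `mdbook` and `pdoc` are on `PATH` and that the `mistralrs` Python package is already importable (the workflow installs it with `maturin develop`). With `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Sketch only: mirrors the "Build mdbook" and "Build Python docs" steps above.
# Assumption: mdbook, pdoc, and an importable mistralrs package are available.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# build_docs (hypothetical name): build the book first, then write the
# Python API docs into a subdirectory of the book output.
build_docs() {
  run mdbook build docs
  run pdoc mistralrs -o ./docs/book/pyo3
}

# Usage: DRY_RUN=1 build_docs   # prints the two commands without running them
```

The order follows the workflow: `pdoc` writes into `docs/book/pyo3`, which sits inside the directory that `mdbook build docs` produces and that the Deploy step publishes.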

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -5,4 +5,7 @@
 .DS_Store
 .idea
 mistral.rs/
-mistralrs-web-chat/cache
+mistralrs-web-chat/cache
+
+# mdbook output
+docs/book/

.typos.toml

Lines changed: 2 additions & 1 deletion
@@ -3,7 +3,8 @@ extend-exclude = [
 ".git/",
 "calibration_data/",
 "examples/server/phi3_duckduckgo_mistral.rs.ipynb",
-"mistralrs-web-chat/static/"
+"mistralrs-web-chat/static/",
+"mistralrs-cli/static/"
 ]
 ignore-hidden = false

AGENTS.md

Lines changed: 14 additions & 12 deletions
@@ -11,10 +11,10 @@ This file provides instructions for AI agents to understand the layout of the `m
 - `/mistralrs-quant/` : Quantization support (ISQ, GGUF, GPTQ, AWQ, FP8, HQQ, etc.)
 - `/mistralrs-paged-attn/`: PagedAttention implementation
 - `/mistralrs-pyo3/` : Python bindings (PyO3)
-- `/mistralrs-server/` : CLI & OpenAI-compatible HTTP server (subcommands: run/vision-plain, diffusion, speech)
+- `/mistralrs-cli/` : Unified CLI binary (commands: run, serve, bench, from-config)
 - `/mistralrs-server-core/`: Shared server core logic
-- `/mistralrs-web-chat/` : Web chat application (static assets & backend integration)
-- `/mistralrs-bench/` : Benchmarking tools
+- `/mistralrs-web-chat/` : (Deprecated) Use `mistralrs serve --ui` instead
+- `/mistralrs-bench/` : (Deprecated) Use `mistralrs bench` instead
 - `/docs/` : Markdown documentation for models, features, and guides
 - `/examples/` : Usage examples (Rust, Python, server samples, notebooks)
 - `/chat_templates/` : Chat formatting templates (JSON/Jinja)
@@ -26,17 +26,17 @@ Mistral.rs supports multiple model types and advanced features via dedicated cra

 - **Text Inference**
   - Crate: `mistralrs-core` (low-level ops), `mistralrs` (API wrapper)
-  - CLI: `run` / `plain` subcommand in `mistralrs-server`
+  - CLI: `mistralrs run -m <model>` or `mistralrs serve -m <model>` (auto-detects model type)
   - Docs: `docs/SAMPLING.md`, `docs/TOOL_CALLING.md`
 - **Vision Models**
   - Crate: `mistralrs-vision`
-  - CLI: `vision-plain` subcommand
+  - CLI: `mistralrs run -m <model>` (auto-detects vision models)
   - Docs: `docs/VISION_MODELS.md`, `docs/IMAGEGEN_MODELS.md`, `docs/IMATRIX.md`
 - **Diffusion Models**
-  - CLI: `diffusion` subcommand
+  - CLI: `mistralrs run -m <model>` (auto-detects diffusion models)
   - Docs: `docs/FLUX.md`
 - **Speech Models**
-  - CLI: `speech` subcommand
+  - CLI: `mistralrs run -m <model>` (auto-detects speech models)
   - Docs: `docs/DIA.md`
 - **Quantization & ISQ**
   - Crate: `mistralrs-quant`
@@ -58,10 +58,10 @@ Mistral.rs supports multiple model types and advanced features via dedicated cra
 ```bash
 cargo build --workspace --release --features "<features>"
 ```
-4. Or build/install only the server binary:
+4. Or build/install only the CLI binary:
 ```bash
-cargo build --release --package mistralrs-server --features "<features>"
-cargo install --path mistralrs-server --features "<features>"
+cargo build --release --package mistralrs-cli --features "<features>"
+cargo install --path mistralrs-cli --features "<features>"
 ```

 ## Models
@@ -116,9 +116,11 @@ Avoid returning TODOs.
 ```bash
 python3 examples/python/<script>.py
 ```
-- Run server/CLI:
+- Run CLI:
 ```bash
-./target/release/mistralrs-server -i <mode> -m <model> [options]
+mistralrs run -m <model> # Interactive mode
+mistralrs serve -p 1234 -m <model> # Server mode
+mistralrs bench -m <model> # Benchmarking
 ```

 ## CI Parity
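
Reviewer note: the AGENTS.md changes above replace the per-model-type subcommands and two separate crates with commands of a single `mistralrs` binary. The lookup below restates that mapping as a small shell function. It is only a sketch for readers migrating scripts: `new_command_for` is a hypothetical helper name that is not part of the repository, and it covers only the names mentioned in this diff.

```shell
#!/bin/sh
# Sketch only: old subcommand or crate name -> replacement named in AGENTS.md.
# new_command_for is a hypothetical helper, not something shipped by mistral.rs.
new_command_for() {
  case "$1" in
    # The old per-type subcommands collapse into one auto-detecting command.
    run|plain|vision-plain|diffusion|speech)
      echo 'mistralrs run -m <model>' ;;
    # Crates marked "(Deprecated)" in the diff.
    mistralrs-web-chat)
      echo 'mistralrs serve --ui' ;;
    mistralrs-bench)
      echo 'mistralrs bench' ;;
    *)
      echo "no mapping recorded for: $1" >&2
      return 1 ;;
  esac
}

# Usage: new_command_for vision-plain
```

Unknown names return a non-zero status rather than a guessed command, since the diff documents no other mappings.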

CLAUDE.md

Lines changed: 20 additions & 14 deletions
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 ## Project Overview

-mistral.rs is a blazing-fast LLM inference engine written in Rust. It supports text, vision, image generation, and speech models with multiple APIs (Rust, Python, OpenAI HTTP, MCP).
+mistral.rs is a blazing-fast LLM inference engine written in Rust. It supports text, vision, image generation, and speech models with Rust and Python SDKs, plus OpenAI HTTP and MCP APIs.

 ## Essential Commands

@@ -19,8 +19,8 @@ cargo build --release --features "cuda flash-attn cudnn"
 # With Metal support (macOS)
 cargo build --release --features metal

-# Install server binary
-cargo install --path mistralrs-server --features <features>
+# Install CLI binary
+cargo install --path mistralrs-cli --features <features>
 ```

 ### Testing & Quality
@@ -40,14 +40,20 @@ cargo clippy --workspace --tests --examples -- -D warnings

 ### Running Models
 ```bash
-# Run interactive mode with plain model
-cargo run --release --features <features> -- -i plain -m <model_id> -a <arch>
+# Run interactive mode (model type auto-detected)
+mistralrs run -m <model_id>

 # Run with GGUF quantized model
-cargo run --release --features <features> -- -i gguf -f <file> -t <tokenizer>
+mistralrs run --format gguf -m <repo> -f <file>

 # Run server
-cargo run --release --features <features> -- --port 1234 <model_args>
+mistralrs serve -p 1234 -m <model_id>
+
+# Run server with web UI
+mistralrs serve --ui -m <model_id>
+
+# Run benchmarks
+mistralrs bench -m <model_id>
 ```

 ## Models
@@ -60,16 +66,16 @@ You should also look for a model.safetensors.index.json file for the model at ha

 ### Workspace Structure
 - `mistralrs-core/` - Core inference engine, model implementations, pipelines
-- `mistralrs-server/` - CLI binary entry point
+- `mistralrs-cli/` - Unified CLI binary (commands: run, serve, bench, from-config)
 - `mistralrs-server-core/` - HTTP server routing, OpenAI API implementation
-- `mistralrs-pyo3/` - Python bindings (PyO3)
-- `mistralrs/` - High-level Rust API
+- `mistralrs-pyo3/` - Python SDK (PyO3 bindings)
+- `mistralrs/` - Rust SDK (high-level crate)
 - `mistralrs-vision/` - Vision model support
 - `mistralrs-quant/` - Quantization implementations (ISQ, GGUF, GPTQ, etc.)
 - `mistralrs-paged-attn/` - PagedAttention implementation
 - `mistralrs-audio/` - Audio processing
 - `mistralrs-mcp/` - Model Context Protocol client
-- `mistralrs-bench/` - Benchmarking tools
+- `mistralrs-bench/` - (Deprecated) Use `mistralrs bench` instead

 ### Key Design Patterns

@@ -88,7 +94,7 @@ When adding new model architectures:
 2. Add pipeline support in `mistralrs-core/src/pipeline/`
 3. Update model detection in `mistralrs-core/src/pipeline/normal.rs`
 4. Add architecture enum variant in `mistralrs-core/src/lib.rs`
-5. Update CLI args in `mistralrs-server/src/main.rs`
+5. Update CLI args in `mistralrs-cli/src/main.rs`

 When adding new quantization methods:
 1. Implement in `mistralrs-quant/src/`
@@ -100,8 +106,8 @@ When adding new quantization methods:
 - `mistralrs-core/src/engine/mod.rs` - Main engine orchestration
 - `mistralrs-core/src/pipeline/mod.rs` - Pipeline trait and common logic
 - `mistralrs-server-core/src/routes.rs` - HTTP API endpoints
-- `mistralrs-pyo3/src/lib.rs` - Python API entry point
-- `mistralrs/examples/` - Usage examples for Rust API
+- `mistralrs-pyo3/src/lib.rs` - Python SDK entry point
+- `mistralrs/examples/` - Usage examples for Rust SDK

 ### Testing Approach
