This directory contains Rust-based traditional ML algorithm implementations using Linfa for CPU-based inference.
Note: This package provides inference only. Training is done in Python. See
src/training/model_selection/ml_model_selection/.
| Algorithm | Backend | GPU Support | Status |
|---|---|---|---|
| KNN (K-Nearest Neighbors) | Linfa (linfa-nn) | CPU only | ✅ Inference |
| KMeans (Clustering) | Linfa (linfa-clustering) | CPU only | ✅ Inference |
| SVM (Support Vector Machine) | Linfa (linfa-svm) | CPU only | ✅ Inference |
Reference: FusionFactory (arXiv:2507.10540).
┌─────────────────────────────────────────────────────────────────┐
│ TRAINING (Python) │
├─────────────────────────────────────────────────────────────────┤
│ src/training/model_selection/ml_model_selection/ │
│ ├── train.py # Train models (scikit-learn, PyTorch) │
│ ├── upload_model.py # Upload to HuggingFace │
│ └── download_model.py # Download from HuggingFace │
│ │
│ Output: knn/kmeans/svm_model.json │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ INFERENCE (Rust/Go) │
├─────────────────────────────────────────────────────────────────┤
│ ml-binding/ (Traditional ML - Linfa) │
│ ├── src/knn.rs # Load JSON, select using Linfa Ball Tree │
│ ├── src/kmeans.rs # Load JSON, select using cluster centroids │
│ ├── src/svm.rs # Load JSON, select using decision function │
│ └── ml_binding.go # Go bindings via CGO │
└─────────────────────────────────────────────────────────────────┘
ml-binding/
├── Cargo.toml # Rust dependencies (Linfa only)
├── go.mod # Go module
├── ml_binding.go # Go wrapper with CGO bindings
├── README.md # This file
└── src/
    ├── lib.rs # Library entry point
    ├── knn.rs # KNN inference implementation
    ├── kmeans.rs # KMeans inference implementation
    ├── svm.rs # SVM inference implementation
    └── ffi.rs # C FFI exports for Go (inference only)
Note: Requires Linux/macOS/WSL with Rust and CGO. Windows native is not supported.
- Rust 1.70+ (curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh)
- Go 1.22+
cd ml-binding
# Build release version
cargo build --release
# The library will be at:
# - Linux: target/release/libml_semantic_router.so
# - macOS: target/release/libml_semantic_router.dylib

# Linux
export LD_LIBRARY_PATH=$(pwd)/target/release:$LD_LIBRARY_PATH
# macOS
export DYLD_LIBRARY_PATH=$(pwd)/target/release:$DYLD_LIBRARY_PATH

# Rust tests
cargo test
# Go tests (after building Rust library)
go test -v ./...

package main
import (
	"fmt"
	"log"
	"os"

	ml "github.com/vllm-project/semantic-router/ml-binding"
)

func main() {
	// Load pretrained KNN model from JSON
	jsonData, err := os.ReadFile("models/knn_model.json")
	if err != nil {
		log.Fatalf("read KNN model: %v", err)
	}
	knn, err := ml.KNNFromJSON(string(jsonData))
	if err != nil {
		log.Fatalf("load KNN model: %v", err)
	}
	defer knn.Close()

	// Run inference
	query := []float64{0.9, 0.1, 0.0, /* ... 1038 dims total (1024 embedding + 14 category) */}
	selected, err := knn.Select(query)
	if err != nil {
		log.Fatalf("select model: %v", err)
	}
	fmt.Println(selected) // e.g. "llama-3.2-3b" (or whichever model the KNN selects)

	// Same pattern for KMeans and SVM (error handling elided for brevity)
	kmeansData, _ := os.ReadFile("models/kmeans_model.json")
	kmeans, _ := ml.KMeansFromJSON(string(kmeansData))
	defer kmeans.Close()
	svmData, _ := os.ReadFile("models/svm_model.json")
	svm, _ := ml.SVMFromJSON(string(svmData))
	defer svm.Close()
}

| Function | Description |
|---|---|
| KNNFromJSON(json) | Load KNN model from JSON |
| KMeansFromJSON(json) | Load KMeans model from JSON |
| SVMFromJSON(json) | Load SVM model from JSON |
| *.Select(embedding) | Select best model for query |
| *.IsTrained() | Check if model is loaded |
| *.ToJSON() | Serialize model to JSON |
| *.Close() | Release resources |
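
The usage example passes a 1038-dimensional query made of 1024 embedding dimensions followed by 14 category dimensions. The following is a minimal sketch of how such a vector might be assembled before calling Select. It assumes the category part is a one-hot encoding appended after the embedding; that encoding, the `buildQuery` helper, and the constant names are illustrative assumptions and not part of the binding's API.

```go
package main

import "fmt"

const (
	embeddingDim = 1024 // embedding dimensions, per the usage example
	categoryDim  = 14   // category dimensions, per the usage example
)

// buildQuery is a hypothetical helper: it appends a one-hot category
// vector to the embedding. The one-hot layout is an assumption.
func buildQuery(embedding []float64, categoryIndex int) ([]float64, error) {
	if len(embedding) != embeddingDim {
		return nil, fmt.Errorf("embedding has %d dims, want %d", len(embedding), embeddingDim)
	}
	if categoryIndex < 0 || categoryIndex >= categoryDim {
		return nil, fmt.Errorf("category index %d out of range [0, %d)", categoryIndex, categoryDim)
	}
	query := make([]float64, embeddingDim+categoryDim)
	copy(query, embedding)                   // first 1024 slots: embedding
	query[embeddingDim+categoryIndex] = 1.0 // last 14 slots: one-hot category
	return query, nil
}

func main() {
	embedding := make([]float64, embeddingDim)
	embedding[0] = 0.9
	query, err := buildQuery(embedding, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(query)) // 1038
}
```

Validating the length up front keeps a dimension mismatch on the Go side instead of surfacing it across the FFI boundary.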
Training is done in Python using scikit-learn. See src/training/model_selection/ml_model_selection/:
# Install dependencies
cd src/training/model_selection/ml_model_selection
pip install -r requirements.txt
# Train all models
python train.py \
--data-file benchmark.jsonl \
--output-dir models/
# Or download pretrained from HuggingFace
python download_model.py --output-dir models/

- Performance: Native Rust speed for inference
- Consistency: Same FFI pattern as candle-binding
- Memory safety: Rust guarantees
- No Python dependency: Production inference without Python runtime

KNN (K-Nearest Neighbors):

- Uses Linfa Ball Tree for O(log n) neighbor search
- Quality-weighted voting: score = 0.9 * quality + 0.1 * speed
- Loads embeddings and metadata from JSON
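
The quality-weighted voting above can be sketched as follows. This is an illustration in Go, not the Rust implementation: it replaces the Ball Tree with a brute-force sort, and it assumes each stored record carries a model name plus quality and speed scores in [0, 1], with votes summed per model over the k nearest records. The `record` type, `selectModel`, and the model names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical training example: an embedding plus the model
// that handled it and that model's quality and speed scores in [0, 1].
type record struct {
	embedding []float64
	model     string
	quality   float64
	speed     float64
}

// squaredDistance returns the squared Euclidean distance between a and b.
func squaredDistance(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// selectModel finds the k nearest records by brute force (a stand-in for the
// Ball Tree) and sums score = 0.9*quality + 0.1*speed per model.
func selectModel(records []record, query []float64, k int) string {
	sorted := append([]record(nil), records...) // copy so the input is not reordered
	sort.SliceStable(sorted, func(i, j int) bool {
		return squaredDistance(sorted[i].embedding, query) < squaredDistance(sorted[j].embedding, query)
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	votes := map[string]float64{}
	for _, r := range sorted[:k] {
		votes[r.model] += 0.9*r.quality + 0.1*r.speed
	}
	best, bestScore := "", -1.0
	for model, score := range votes {
		// Break ties by name so the result does not depend on map order.
		if score > bestScore || (score == bestScore && model < best) {
			best, bestScore = model, score
		}
	}
	return best
}

func main() {
	records := []record{
		{[]float64{1, 0}, "model-a", 0.9, 0.2},
		{[]float64{0.9, 0.1}, "model-b", 0.5, 0.9},
		{[]float64{0.8, 0.2}, "model-a", 0.8, 0.3},
		{[]float64{0, 1}, "model-b", 1.0, 1.0},
	}
	fmt.Println(selectModel(records, []float64{1, 0}, 3)) // model-a
}
```

With a 0.9 weight on quality, a fast but lower-quality neighbor ("model-b" in the second record) contributes less than a slower, higher-quality one.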

KMeans (Clustering):

- Loads cluster centroids from JSON
- Assigns queries to nearest centroid
- Each cluster maps to best model (by quality+speed)
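
Nearest-centroid selection, as described above, reduces to a single pass over the centroids. The sketch below is an illustration in Go rather than the Rust code; it assumes Euclidean distance and that each cluster already stores the model chosen for it at training time. The `cluster` type, its field names, and the model names are hypothetical and do not reflect the JSON schema.

```go
package main

import (
	"fmt"
	"math"
)

// cluster pairs a centroid with the model picked for that cluster at
// training time. Field names are illustrative, not the JSON schema.
type cluster struct {
	centroid []float64
	model    string
}

// squaredDistance returns the squared Euclidean distance between a and b.
func squaredDistance(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// selectByCentroid returns the model of the cluster whose centroid is
// closest to the query, or "" if there are no clusters.
func selectByCentroid(clusters []cluster, query []float64) string {
	best, bestDist := "", math.Inf(1)
	for _, c := range clusters {
		if d := squaredDistance(c.centroid, query); d < bestDist {
			best, bestDist = c.model, d
		}
	}
	return best
}

func main() {
	clusters := []cluster{
		{[]float64{0, 0}, "model-small"},
		{[]float64{10, 10}, "model-large"},
	}
	fmt.Println(selectByCentroid(clusters, []float64{9, 8})) // model-large
}
```

Comparing squared distances avoids a square root per centroid without changing which centroid wins.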

SVM (Support Vector Machine):

- Supports Linear and RBF kernels
- Loads support vectors from JSON
- One-vs-All classification for multi-model selection
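
One-vs-All selection with a kernel decision function can be sketched as below. This is an illustration in Go under stated assumptions, not the Rust or Linfa code: each candidate model has one binary classifier, its decision value is f(x) = Σ coeffs[i] · K(sv_i, x) + bias with signed dual coefficients, and the model with the largest value wins. The `binarySVM` type, its fields, and the sign convention of the bias are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// kernel computes a similarity between two vectors.
type kernel func(a, b []float64) float64

// linear is the dot-product kernel.
func linear(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// rbf returns the Gaussian kernel exp(-gamma * ||a-b||^2).
func rbf(gamma float64) kernel {
	return func(a, b []float64) float64 {
		sum := 0.0
		for i := range a {
			d := a[i] - b[i]
			sum += d * d
		}
		return math.Exp(-gamma * sum)
	}
}

// binarySVM is a hypothetical one-vs-all classifier for a single model.
// coeffs[i] is the signed dual coefficient of supportVectors[i].
type binarySVM struct {
	model          string
	supportVectors [][]float64
	coeffs         []float64
	bias           float64
}

// decision evaluates f(x) = sum_i coeffs[i] * K(sv_i, x) + bias.
func (s binarySVM) decision(k kernel, x []float64) float64 {
	f := s.bias
	for i, sv := range s.supportVectors {
		f += s.coeffs[i] * k(sv, x)
	}
	return f
}

// selectModel returns the model whose classifier gives the largest decision value.
func selectModel(svms []binarySVM, k kernel, x []float64) string {
	best, bestScore := "", math.Inf(-1)
	for _, s := range svms {
		if score := s.decision(k, x); score > bestScore {
			best, bestScore = s.model, score
		}
	}
	return best
}

func main() {
	svms := []binarySVM{
		{"model-a", [][]float64{{1, 0}, {0, 1}}, []float64{1, -1}, 0},
		{"model-b", [][]float64{{1, 0}, {0, 1}}, []float64{-1, 1}, 0},
	}
	x := []float64{0.9, 0.1}
	fmt.Println(selectModel(svms, linear, x))   // model-a
	fmt.Println(selectModel(svms, rbf(1.0), x)) // model-a
}
```

Passing the kernel as a function value lets the same decision code serve both the Linear and RBF cases.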
Apache-2.0 (same as semantic-router)