ML Binding for Semantic Router

This directory contains Rust-based traditional ML algorithm implementations using Linfa for CPU-based inference.

Note: This package provides inference only. Training is done in Python. See src/training/model_selection/ml_model_selection/.

Algorithms

| Algorithm | Backend | GPU Support | Status |
|---|---|---|---|
| KNN (K-Nearest Neighbors) | Linfa (`linfa-nn`) | CPU only | ✅ Inference |
| KMeans (Clustering) | Linfa (`linfa-clustering`) | CPU only | ✅ Inference |
| SVM (Support Vector Machine) | Linfa (`linfa-svm`) | CPU only | ✅ Inference |

Reference: FusionFactory (arXiv:2507.10540).

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    TRAINING (Python)                             │
├─────────────────────────────────────────────────────────────────┤
│  src/training/model_selection/ml_model_selection/               │
│  ├── train.py          # Train models (scikit-learn, PyTorch)   │
│  ├── upload_model.py   # Upload to HuggingFace                  │
│  └── download_model.py # Download from HuggingFace              │
│                                                                  │
│  Output: knn/kmeans/svm_model.json                               │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                    INFERENCE (Rust/Go)                           │
├─────────────────────────────────────────────────────────────────┤
│  ml-binding/                      (Traditional ML - Linfa)       │
│  ├── src/knn.rs    # Load JSON, select using Linfa Ball Tree    │
│  ├── src/kmeans.rs # Load JSON, select using cluster centroids  │
│  ├── src/svm.rs    # Load JSON, select using decision function  │
│  └── ml_binding.go # Go bindings via CGO                        │
└─────────────────────────────────────────────────────────────────┘

Directory Structure

ml-binding/
├── Cargo.toml           # Rust dependencies (Linfa only)
├── go.mod               # Go module
├── ml_binding.go        # Go wrapper with CGO bindings
├── README.md            # This file
└── src/
    ├── lib.rs           # Library entry point
    ├── knn.rs           # KNN inference implementation
    ├── kmeans.rs        # KMeans inference implementation
    ├── svm.rs           # SVM inference implementation
    └── ffi.rs           # C FFI exports for Go (inference only)

Note: Requires Linux, macOS, or WSL with Rust and CGO enabled. Native Windows is not supported.

Building

Prerequisites

Rust 1.70+ (curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh)
Go 1.22+

Build the Rust Library

cd ml-binding

# Build release version
cargo build --release

# The library will be at:
# - Linux: target/release/libml_semantic_router.so
# - macOS: target/release/libml_semantic_router.dylib

Set Library Path

# Linux
export LD_LIBRARY_PATH=$(pwd)/target/release:$LD_LIBRARY_PATH

# macOS
export DYLD_LIBRARY_PATH=$(pwd)/target/release:$DYLD_LIBRARY_PATH

Run Tests

# Rust tests
cargo test

# Go tests (after building Rust library)
go test -v ./...

Usage in Go

Loading Pretrained Models

package main

import (
    "fmt"
    "log"
    "os"

    ml "github.com/vllm-project/semantic-router/ml-binding"
)

func main() {
    // Load pretrained KNN model from JSON
    // (the second return value of each call is treated as an error here)
    jsonData, err := os.ReadFile("models/knn_model.json")
    if err != nil {
        log.Fatal(err)
    }
    knn, err := ml.KNNFromJSON(string(jsonData))
    if err != nil {
        log.Fatal(err)
    }
    defer knn.Close()

    // Run inference
    query := []float64{0.9, 0.1, 0.0, /* ... 1038 dims total (1024 embedding + 14 category) */}
    selected, err := knn.Select(query)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(selected) // e.g. "llama-3.2-3b" (or whichever model the KNN selects)

    // Same pattern for KMeans and SVM (errors ignored for brevity; handle them as above)
    kmeansData, _ := os.ReadFile("models/kmeans_model.json")
    kmeans, _ := ml.KMeansFromJSON(string(kmeansData))
    defer kmeans.Close()

    svmData, _ := os.ReadFile("models/svm_model.json")
    svm, _ := ml.SVMFromJSON(string(svmData))
    defer svm.Close()
}
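The example above passes a 1038-dimensional query made of a 1024-dimensional embedding plus 14 category slots. A minimal sketch of how such a vector might be assembled follows. The one-hot encoding of the category and the names `BuildQuery`, `embeddingDim`, and `numCategories` are assumptions for illustration and are not part of this package; check the training code for the actual layout.

```go
package main

import "fmt"

const (
	embeddingDim  = 1024 // embedding size from the example above
	numCategories = 14   // category slots from the example above
)

// BuildQuery concatenates an embedding with a category vector.
// Assumption: the category is one-hot encoded after the embedding.
func BuildQuery(embedding []float64, category int) ([]float64, error) {
	if len(embedding) != embeddingDim {
		return nil, fmt.Errorf("embedding has %d dims, want %d", len(embedding), embeddingDim)
	}
	if category < 0 || category >= numCategories {
		return nil, fmt.Errorf("category %d out of range [0, %d)", category, numCategories)
	}
	query := make([]float64, embeddingDim+numCategories)
	copy(query, embedding)
	query[embeddingDim+category] = 1.0
	return query, nil
}

func main() {
	embedding := make([]float64, embeddingDim)
	embedding[0] = 0.9
	query, err := BuildQuery(embedding, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(query)) // prints 1038
}
```

Validating the length before calling `Select` keeps dimension mismatches on the Go side, where they are easier to report.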

Available Functions

| Function | Description |
|---|---|
| `KNNFromJSON(json)` | Load KNN model from JSON |
| `KMeansFromJSON(json)` | Load KMeans model from JSON |
| `SVMFromJSON(json)` | Load SVM model from JSON |
| `*.Select(embedding)` | Select best model for query |
| `*.IsTrained()` | Check if model is loaded |
| `*.ToJSON()` | Serialize model to JSON |
| `*.Close()` | Release resources |
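Because the three selectors share the same method set, calling code can treat them uniformly. The sketch below shows one way to do that. The `Selector` interface, the `stubSelector` type, and `SelectFirstTrained` are hypothetical and not exported by this package, and the method signatures are assumed from the usage example. The stub stands in for the real selectors so the snippet runs without the Rust library.

```go
package main

import (
	"errors"
	"fmt"
)

// Selector is a hypothetical interface mirroring part of the function table.
// Signatures are assumptions based on the usage example.
type Selector interface {
	Select(embedding []float64) (string, error)
	IsTrained() bool
	Close()
}

// stubSelector is a stand-in so the sketch runs without CGO.
type stubSelector struct {
	model   string
	trained bool
}

func (s stubSelector) Select(_ []float64) (string, error) {
	if !s.trained {
		return "", errors.New("selector is not trained")
	}
	return s.model, nil
}

func (s stubSelector) IsTrained() bool { return s.trained }

func (s stubSelector) Close() {}

// SelectFirstTrained returns the choice of the first selector that is loaded.
func SelectFirstTrained(query []float64, selectors ...Selector) (string, error) {
	for _, s := range selectors {
		if s.IsTrained() {
			return s.Select(query)
		}
	}
	return "", errors.New("no trained selector available")
}

func main() {
	knn := stubSelector{model: "model-a", trained: false}
	svm := stubSelector{model: "model-b", trained: true}
	defer knn.Close()
	defer svm.Close()

	chosen, err := SelectFirstTrained([]float64{0.9, 0.1}, knn, svm)
	if err != nil {
		panic(err)
	}
	fmt.Println(chosen) // prints model-b
}
```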

Training Models

Training is done in Python using scikit-learn. See src/training/model_selection/ml_model_selection/:

# Install dependencies
cd src/training/model_selection/ml_model_selection
pip install -r requirements.txt

# Train all models
python train.py \
  --data-file benchmark.jsonl \
  --output-dir models/

# Or download pretrained from HuggingFace
python download_model.py --output-dir models/

Why Linfa for Inference?

Performance: Native Rust speed for inference
Consistency: Same FFI pattern as candle-binding
Memory safety: Rust's compile-time ownership and borrow checking
No Python dependency: Production inference without Python runtime

Algorithm Details

KNN (K-Nearest Neighbors)

Uses Linfa Ball Tree for neighbor search (average-case O(log n) per query)
Quality-weighted voting: score = 0.9 * quality + 0.1 * speed
Loads embeddings and metadata from JSON
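The voting rule above can be sketched in Go as follows. This is an illustration under stated assumptions, not the Rust implementation: it uses a brute-force distance sort instead of a Ball Tree, assumes Euclidean distance, and assumes each neighbor adds its score to the model it is labeled with. The `Sample` type and `SelectKNN` function are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Sample is a hypothetical training record: an embedding plus model metadata.
type Sample struct {
	Embedding []float64
	Model     string
	Quality   float64
	Speed     float64
}

// euclidean assumes both vectors have the same length.
func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// SelectKNN finds the k nearest samples by brute force and lets each one
// vote for its model with weight 0.9*quality + 0.1*speed.
func SelectKNN(samples []Sample, query []float64, k int) string {
	if k < 1 || len(samples) == 0 {
		return ""
	}
	idx := make([]int, len(samples))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return euclidean(samples[idx[a]].Embedding, query) <
			euclidean(samples[idx[b]].Embedding, query)
	})
	if k > len(idx) {
		k = len(idx)
	}
	votes := map[string]float64{}
	for _, i := range idx[:k] {
		s := samples[i]
		votes[s.Model] += 0.9*s.Quality + 0.1*s.Speed
	}
	best, bestScore := "", math.Inf(-1)
	for model, score := range votes {
		// Break ties by name so the result does not depend on map order.
		if score > bestScore || (score == bestScore && model < best) {
			best, bestScore = model, score
		}
	}
	return best
}

func main() {
	samples := []Sample{
		{[]float64{0, 0}, "small", 0.6, 0.9},
		{[]float64{0.1, 0}, "large", 0.9, 0.3},
		{[]float64{0, 0.2}, "small", 0.5, 0.9},
		{[]float64{5, 5}, "large", 1.0, 1.0},
	}
	fmt.Println(SelectKNN(samples, []float64{0, 0}, 2)) // prints large
	fmt.Println(SelectKNN(samples, []float64{0, 0}, 3)) // prints small
}
```

With k=2 the single "large" neighbor scores 0.84 against 0.63 for "small"; with k=3 the two "small" neighbors sum to 1.17 and win.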

KMeans

Loads cluster centroids from JSON
Assigns queries to nearest centroid
Each cluster maps to its best model (ranked by quality and speed)
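Nearest-centroid selection as described above is small enough to sketch directly. The `Cluster` type and `SelectKMeans` function below are hypothetical names, and Euclidean distance is an assumption; the JSON field layout of the real model file is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Cluster is a hypothetical pairing of a centroid with its best model.
type Cluster struct {
	Centroid  []float64
	BestModel string
}

// SelectKMeans returns the model mapped to the centroid nearest to the query.
// Squared Euclidean distance is used because it preserves the ordering.
func SelectKMeans(clusters []Cluster, query []float64) (string, error) {
	if len(clusters) == 0 {
		return "", errors.New("no clusters loaded")
	}
	best, bestDist := "", math.Inf(1)
	for _, c := range clusters {
		if len(c.Centroid) != len(query) {
			return "", fmt.Errorf("centroid has %d dims, query has %d", len(c.Centroid), len(query))
		}
		dist := 0.0
		for i := range query {
			d := c.Centroid[i] - query[i]
			dist += d * d
		}
		if dist < bestDist {
			best, bestDist = c.BestModel, dist
		}
	}
	return best, nil
}

func main() {
	clusters := []Cluster{
		{Centroid: []float64{0, 0}, BestModel: "small-model"},
		{Centroid: []float64{1, 1}, BestModel: "large-model"},
	}
	model, err := SelectKMeans(clusters, []float64{0.9, 0.8})
	if err != nil {
		panic(err)
	}
	fmt.Println(model) // prints large-model
}
```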

SVM (Support Vector Machine)

Supports Linear and RBF kernels
Loads support vectors from JSON
One-vs-All classification for multi-model selection
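A One-vs-All decision over kernel SVMs can be sketched as below: one binary classifier per candidate model, each scored with f(x) = Σ coef_i · K(sv_i, x) + bias, and the highest score wins. The `BinarySVM` type, its field names, and `SelectSVM` are hypothetical and do not reflect the actual JSON schema or the Rust code.

```go
package main

import (
	"fmt"
	"math"
)

// Kernel computes a similarity between two vectors of equal length.
type Kernel func(a, b []float64) float64

// Linear is the dot-product kernel.
func Linear(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// RBF returns the kernel exp(-gamma * ||a-b||^2).
func RBF(gamma float64) Kernel {
	return func(a, b []float64) float64 {
		sq := 0.0
		for i := range a {
			d := a[i] - b[i]
			sq += d * d
		}
		return math.Exp(-gamma * sq)
	}
}

// BinarySVM is a hypothetical "this model vs. the rest" classifier.
type BinarySVM struct {
	Model          string
	SupportVectors [][]float64
	DualCoefs      []float64 // signed coefficients, one per support vector
	Bias           float64
}

// Decision evaluates the SVM decision function for x.
func (m BinarySVM) Decision(k Kernel, x []float64) float64 {
	score := m.Bias
	for i, sv := range m.SupportVectors {
		score += m.DualCoefs[i] * k(sv, x)
	}
	return score
}

// SelectSVM picks the model whose classifier gives the highest decision value.
func SelectSVM(classifiers []BinarySVM, k Kernel, x []float64) string {
	best, bestScore := "", math.Inf(-1)
	for _, c := range classifiers {
		if s := c.Decision(k, x); s > bestScore {
			best, bestScore = c.Model, s
		}
	}
	return best
}

func main() {
	classifiers := []BinarySVM{
		{Model: "fast", SupportVectors: [][]float64{{1, 0}}, DualCoefs: []float64{1}},
		{Model: "accurate", SupportVectors: [][]float64{{0, 1}}, DualCoefs: []float64{1}},
	}
	fmt.Println(SelectSVM(classifiers, Linear, []float64{1, 0}))   // prints fast
	fmt.Println(SelectSVM(classifiers, RBF(1.0), []float64{0, 1})) // prints accurate
}
```

Passing the kernel as a function value keeps the selection loop identical for the Linear and RBF cases.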

License

Apache-2.0 (same as semantic-router)
