A high-performance, memory-safe API for serving machine learning model inferences, built with Rust, Axum, and ONNX Runtime. Designed for fast cloud deployment and scalable AI systems.
- 🦀 Built with async Rust (Tokio + Axum)
- 🧠 ONNX model inference serving (MNIST classifier)
- 🐳 Docker-ready for easy deployment
- ☁️ Cloud-compatible (Containers)
- 🛡️ Defensive input validation and error handling (see the sketch below)
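As a rough illustration of what that defensive validation looks like, here is a hypothetical Axum handler (the type and function names are illustrative, not the repo's actual code) that rejects malformed payloads before touching the model:

```rust
use axum::{http::StatusCode, Json};
use serde::Deserialize;

// Hypothetical request type: a flattened 28x28 grayscale image.
#[derive(Deserialize)]
struct PredictRequest {
    values: Vec<f32>,
}

// Sketch of a /predict handler that validates its input before inference.
async fn predict(
    Json(req): Json<PredictRequest>,
) -> Result<Json<Vec<f32>>, (StatusCode, String)> {
    // Reject anything that is not exactly a flattened 28x28 image.
    if req.values.len() != 784 {
        return Err((
            StatusCode::UNPROCESSABLE_ENTITY,
            format!("expected 784 values, got {}", req.values.len()),
        ));
    }
    // ... run the ONNX session here and return the 10 class probabilities ...
    Ok(Json(vec![0.0; 10]))
}
```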
- Rust
- Axum (Web framework)
- Tokio (Async runtime)
- ONNX Runtime
- Docker
- Clone the repo:
git clone https://github.com/melizalde-ds/rust-ml-inference-api.git
cd rust-ml-inference-api
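- Build and run the server (a plain Cargo build should be all that's needed, assuming no extra setup beyond the bundled model):
cargo run --release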
The server will start at: http://localhost:3000
POST /predict
Request: JSON body with a flattened 28x28 grayscale image (784 float32 values)
Response: Predicted probabilities for digits 0–9
Example request:
{
  "values": [0.0, 0.1, 0.2, 0.0, ..., 0.0]
}
(784 floats total)
Example curl:
curl -X POST http://localhost:3000/predict -H "Content-Type: application/json" -d '{"values": [0.0, 0.1, 0.2, 0.0, ..., 0.0]}'
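For a programmatic client, a Rust sketch along these lines should work (it assumes the reqwest crate with its json feature, tokio, and serde_json, none of which are necessarily part of this repo):

```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A blank image: 784 pixel values for the flattened 28x28 grid.
    let values = vec![0.0f32; 784];

    // POST the payload to the /predict endpoint and print the raw response.
    let response = reqwest::Client::new()
        .post("http://localhost:3000/predict")
        .json(&json!({ "values": values }))
        .send()
        .await?;

    println!("status: {}", response.status());
    println!("body:   {}", response.text().await?);
    Ok(())
}
```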
GET /healthz
Response: OK
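Example curl:
curl http://localhost:3000/healthz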
Build the image:
docker build -t rust-ml-inference-api .
Run the container:
docker run -p 3000:3000 rust-ml-inference-api
Access your API at: http://localhost:3000
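To expose the service on a different host port, remap it the usual Docker way (8080 here is just an example):
docker run -p 8080:3000 rust-ml-inference-api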
The ONNX model (mnist-8.onnx) is stored under /onnx_models/ for easy access.
Original source: ONNX Model Zoo - MNIST 8
- Structured logging with tracing
- Support dynamic model outputs
- Optimize concurrency for batch inference
- Model hot-reloading
- Improve ONNX Runtime threading configuration
MIT License
Always happy to collaborate on Rust, cloud, or AI projects! Feel free to reach out on LinkedIn!