Use of LLM-based inference is evolving beyond its origins in chat. Today's use cases combine multiple inference calls, tool calls, and database lookups; RAG, agentic AI, and deep research are three examples of these more sophisticated workloads.
The goal of this project is to facilitate optimizations that drastically reduce the cost of inference for RAG, agentic AI, and deep research (by 10x) without harming accuracy. Our approach is to generalize the interface to inference servers via the Span Query.
In a span query, chat is a special case of a more general form. To the right is a visualization of a span query for a "judge/generator" (a.k.a. "LLM-as-a-judge").
Learn more about span query syntax and semantics
SPNL is a library for creating, optimizing, and tokenizing span queries. The library is surfaced for consumption as:
vLLM image | vLLM patch | CLI image | CLI image with Ollama | Rust crate | Python pip | Playground
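As a quick orientation, the snippet below sketches how the Python package and Rust crate might be pulled in. The package and crate names here are assumptions based on the project name; follow the links above for the published names.

# Assumed package/crate names -- check the "Python pip" and "Rust crate" links above
pip install spnl      # Python package (name assumed)
cargo add spnl        # Rust crate (name assumed)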
The spnl CLI provides commands for running span queries and managing vLLM deployments. On macOS, you can install it via Homebrew:
# Add the tap
brew tap IBM/spnl https://github.com/IBM/spnl
# Install the spnl CLI
brew install spnl

For other platforms, you can download the latest spnl CLI from the SPNL releases page.
The spnl CLI also makes it easy to deploy and manage vLLM inference servers on Kubernetes or Google Compute Engine. See the vLLM documentation for detailed instructions.
Quick example:
# Bring up a vLLM server on Kubernetes (requires HuggingFace token)
spnl vllm up my-deployment --target k8s --hf-token YOUR_HF_TOKEN
# Bring down the vLLM server
spnl vllm down my-deployment --target k8s

To kick the tires with the spnl CLI running Ollama:
podman run --rm -it ghcr.io/ibm/spnl-ollama --verbose

This will run a judge/generator email example. You can also point it at a JSON file containing a span query.
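For example, assuming the container accepts a path to a query file as an argument and that you mount the file into the container (both are assumptions, not documented here), the invocation might look like:

# Hypothetical invocation: mount a local span-query file into the container
# and pass its path (the argument form is an assumption)
podman run --rm -it -v "$PWD":/queries ghcr.io/ibm/spnl-ollama /queries/my-query.json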
For comprehensive CLI documentation including all commands, options, and examples, see docs/cli.md.
Quick reference:
# Run a query
spnl run [OPTIONS]
# Manage vLLM deployments
spnl vllm <up|down> [OPTIONS]
# Get help
spnl --help
spnl run --help
spnl vllm --help

First, configure your environment for Rust. You can then build the CLI with cargo build -p spnl-cli, which will produce ./target/debug/spnl. Adding --release will produce an optimized build in ./target/release/spnl.
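For reference, the two build invocations described above are:

# Debug build; produces ./target/debug/spnl
cargo build -p spnl-cli

# Optimized release build; produces ./target/release/spnl
cargo build -p spnl-cli --release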