
Span Queries

What if we had a way to plan and optimize GenAI like we do for SQL?


Use of LLM-based inference is evolving beyond its origins in chat. These days, use cases combine multiple inference calls, tool calls, and database lookups. RAG, agentic AI, and deep research are three examples of these more sophisticated use cases.

The goal of this project is to facilitate optimizations that drastically reduce the cost of inference for RAG, agentic AI, and deep research (by 10x 1) without harming accuracy. Our approach is to generalize the interface to inference servers via the Span Query.

With a span query, chat becomes a special case of a more general form. For example, a "judge/generator" pattern (a.k.a. "LLM-as-a-judge") can be expressed as a single span query.

Learn more about span query syntax and semantics

Getting Started with SPNL

SPNL is a library for creating, optimizing, and tokenizing span queries. The library is available for consumption as:

vLLM image | vLLM patch | CLI image | CLI image with Ollama | Rust crate | Python pip | Playground
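
If you just want the library itself, the usual package managers should work; this assumes the published crate and PyPI package are both named spnl, as the badges above suggest:

# Rust crate (assumes the crate is published as spnl)
cargo add spnl

# Python bindings (assumes the PyPI package is named spnl)
pip install spnl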

Using the spnl CLI

The spnl CLI provides commands for running span queries and managing vLLM deployments. On macOS, you can install it via Homebrew:

# Add the tap
brew tap IBM/spnl https://github.com/IBM/spnl

# Install the spnl CLI
brew install spnl

For other platforms, you can download the latest spnl CLI from the SPNL releases page.

Managing vLLM Deployments

The spnl CLI provides commands to easily deploy and manage vLLM inference servers on Kubernetes or Google Compute Engine. See the vLLM documentation for detailed instructions.

Quick example:

# Bring up a vLLM server on Kubernetes (requires HuggingFace token)
spnl vllm up my-deployment --target k8s --hf-token YOUR_HF_TOKEN

# Bring down the vLLM server
spnl vllm down my-deployment --target k8s

Quick Start with Docker

To kick the tires with the spnl CLI image that includes Ollama:

podman run --rm -it ghcr.io/ibm/spnl-ollama --verbose

This will run a judge/generator email example. You can also point it at a JSON file containing a span query, as sketched below.
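
For instance, a purely illustrative invocation might mount a local query file into the container. The volume mount is standard podman, but the assumption that the image entrypoint accepts the query file path as a positional argument is ours; see the CLI docs for the actual syntax:

# Hypothetical sketch: mount a local span query into the container and pass
# its in-container path to the entrypoint (the positional argument is an assumption)
podman run --rm -it -v ./my-query.json:/queries/my-query.json:Z \
  ghcr.io/ibm/spnl-ollama /queries/my-query.json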

CLI Usage

For comprehensive CLI documentation including all commands, options, and examples, see docs/cli.md.

Quick reference:

# Run a query
spnl run [OPTIONS]

# Manage vLLM deployments
spnl vllm <up|down> [OPTIONS]

# Get help
spnl --help
spnl run --help
spnl vllm --help

Building SPNL

First, configure your environment for Rust. You can then build the CLI with cargo build -p spnl-cli, which produces ./target/debug/spnl. Adding --release produces an optimized build at ./target/release/spnl.
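
The same steps as shell commands; the trailing --help calls are just a quick smoke test:

# Debug build of the CLI
cargo build -p spnl-cli
./target/debug/spnl --help

# Optimized release build
cargo build --release -p spnl-cli
./target/release/spnl --help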

Footnotes

  1. https://arxiv.org/html/2409.15355v5