Go SDK for building agentic applications backed by a local or self-hosted vLLM OpenAI-compatible server.

- Package: `vllmsdk`
- Default backend: `http://127.0.0.1:8000/v1`

```bash
go get github.com/ethpandaops/vllm-agent-sdk-go
```

The SDK resolves configuration from explicit options first, then environment variables, then defaults.
| Variable | Description | Default |
|---|---|---|
| `VLLM_BASE_URL` | vLLM server base URL | `http://127.0.0.1:8000/v1` |
| `VLLM_API_KEY` | Bearer auth token (optional, only if your server enforces auth) | (none) |
| `VLLM_MODEL` | Model name | (none; must be set via env or `WithModel()`) |
| `VLLM_AGENT_SESSION_STORE_PATH` | Local session store directory | (none) |
Example-only variables (not resolved by the core SDK):

| Variable | Description | Default |
|---|---|---|
| `VLLM_IMAGE_MODEL` | Image-capable model for multimodal examples | `QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ` |
| `VLLM_VISION_MODEL` | Vision model for multimodal input examples | Falls back to `VLLM_IMAGE_MODEL`, then `VLLM_MODEL` |
| `VLLM_IMAGE_OUTPUT_DIR` | Directory for saving generated images | (none) |
All settings follow the same resolution order:

1. Explicit option (e.g. `WithBaseURL(...)`, `WithAPIKey(...)`, `WithModel(...)`)
2. Environment variable (`VLLM_BASE_URL`, `VLLM_API_KEY`, `VLLM_MODEL`)
3. Built-in default (where applicable)
The repo ships a sibling-style Makefile:

- `make test` runs race-enabled package tests with coverage output.
- `make test-integration` runs `./integration/...` with `-tags=integration`.
- `make audit` runs the aggregate quality gate.
Integration setup:

- Set `VLLM_BASE_URL` or default to `http://127.0.0.1:8000/v1`.
- Set `VLLM_MODEL` to the model served by your vLLM instance.
- Set `VLLM_API_KEY` if your vLLM server enforces bearer auth.
- Integration tests skip when the local vLLM server is unavailable.
```go
package main

import (
	"context"
	"fmt"
	"time"

	vllmsdk "github.com/ethpandaops/vllm-agent-sdk-go"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	// Model resolved from VLLM_MODEL env var, or set explicitly:
	for msg, err := range vllmsdk.Query(
		ctx,
		vllmsdk.Text("Write a two-line haiku about Go concurrency."),
		// vllmsdk.WithModel("QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ"),
	) {
		if err != nil {
			panic(err)
		}
		if result, ok := msg.(*vllmsdk.ResultMessage); ok && result.Result != nil {
			fmt.Println(*result.Result)
		}
	}
}
```

- `Query(ctx, content, ...opts)` and `QueryStream(...)` return `iter.Seq2[Message, error]`.
- `NewClient()` exposes `Start`, `StartWithContent`, `StartWithStream`, `Query`, `ReceiveMessages`, `ReceiveResponse`, `Interrupt`, `SetPermissionMode`, `SetModel`, `ListModels`, `ListModelsResponse`, `GetMCPStatus`, `RewindFiles`, and `Close`.
- Unsupported peer-parity controls such as `ReconnectMCPServer`, `ToggleMCPServer`, `StopTask`, and `SendToolResult` are present on `Client` and return typed `UnsupportedControlError` values.
- `UserMessageContent` is the canonical input shape. Use `Text(...)` for text-only calls and `Blocks(...)` with `ImageInput(...)`, `FileInput(...)`, `AudioInput(...)`, or `VideoInput(...)` for multimodal chat-completions requests.
- `WithSDKTools(...)` registers high-level in-process tools under `mcp__sdk__<name>`.
- `WithOnUserInput(...)` handles SDK-owned user-input prompts built on top of tool calling.
- `ListModels(...)` and `ListModelsResponse(...)` use vLLM model discovery via `/v1/models`.
- `StatSession(...)`, `ListSessions(...)`, and `GetSessionMessages(...)` operate on the SDK's local persisted session store.
- Discovery uses `/v1/models`.
- Returned `ModelInfo` values are projected from the OpenAI-compatible model cards that vLLM serves, so provider-rich metadata is no longer guaranteed.
- `ModelInfo` still exposes helper methods such as `CostTier()`, `SupportsToolCalling()`, `SupportsStructuredOutput()`, `SupportsReasoning()`, `SupportsImageInput()`, `SupportsImageOutput()`, `SupportsWebSearch()`, `SupportsPromptCaching()`, `MaxContextLength()`, and parsed pricing helpers.
- Generated images are surfaced as `*ImageBlock` values inside `AssistantMessage.Content`.
- `ImageBlock.Decode()` returns raw bytes plus media type for data-URL-backed images.
- `ImageBlock.Save(path)` writes generated images to disk.
- Live image-generation coverage is available behind the integration build tag when `VLLM_IMAGE_MODEL` is set.
Multimodal input in this SDK is block-based and targets the vLLM OpenAI-compatible chat surface.
```go
content := vllmsdk.Blocks(
	vllmsdk.TextInput("Compare these two screenshots and the attached spec file."),
	vllmsdk.ImageInput("https://example.com/before.png"),
	vllmsdk.ImageInput("data:image/png;base64,..."),
	vllmsdk.FileInput("spec.pdf", "data:application/pdf;base64,..."),
)

for msg, err := range vllmsdk.Query(ctx, content,
	// vllmsdk.WithModel("QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ"),
) {
	_ = msg
	_ = err
}
```

- `ImageInput(...)` accepts a normal URL or a base64 data URL.
- `FileInput(...)` accepts a filename plus a `file_data` URL/data URL.
- `AudioInput(...)` accepts base64 audio data plus a format.
- `VideoInput(...)` accepts a normal URL or a data URL.
- Responses mode is routed to the vLLM `/v1/responses` surface when selected.
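The base64 data URLs passed to `ImageInput(...)` and `FileInput(...)` above can be built from local bytes with a small helper. A minimal sketch; `toDataURL` and the `spec.pdf` path are illustrative assumptions, not SDK API.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

// toDataURL encodes raw bytes as a base64 data URL of the shape the
// multimodal input helpers accept. Hypothetical helper for illustration.
func toDataURL(mediaType string, data []byte) string {
	return "data:" + mediaType + ";base64," + base64.StdEncoding.EncodeToString(data)
}

func main() {
	// Hypothetical local file; any PDF or image works the same way.
	pdf, err := os.ReadFile("spec.pdf")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	url := toDataURL("application/pdf", pdf)
	fmt.Println(len(url), "bytes of data URL")
}
```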
Session APIs are local SDK APIs, not remote vLLM server sessions.

- They read from the SDK session store configured with `WithSessionStorePath(...)` or `VLLM_AGENT_SESSION_STORE_PATH`.
- They do not derive from chat `session_id`.
- They do not derive from Responses `previous_response_id`.
vLLM does not have meaningful backend equivalents for some sibling control-plane methods. The SDK exposes those methods where peer parity matters, but they fail explicitly with `UnsupportedControlError` instead of faking semantics.
Runnable examples live under `examples/`.