# Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## Unreleased
- Switch reqwest TLS crypto provider from `aws-lc-rs` to `ring` by using the `rustls-no-provider` feature and adding an explicit `rustls` dependency with the `ring` backend. This eliminates `__isoc23_strtol` and related glibc 2.38+ symbols emitted by `aws-lc-sys` 0.40.0, restoring the GLIBC_2.28 ABI floor required by downstream users (e.g. Node.js aarch64 bindings).
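A minimal sketch of what the dependency change could look like in the workspace `Cargo.toml`. The `rustls-no-provider` feature name is taken from this entry; exact version numbers and the full feature lists are assumptions, not the project's actual manifest:

```toml
# Sketch: disable reqwest's default crypto provider (aws-lc-rs) and
# supply rustls with the ring backend explicitly.
[dependencies]
reqwest = { version = "0.12", default-features = false, features = [
    "json",
    "stream",
    "rustls-no-provider",  # TLS via rustls, no bundled crypto provider
] }
rustls = { version = "0.23", default-features = false, features = [
    "ring",   # ring avoids the glibc 2.38+ symbols emitted by aws-lc-sys
    "std",
    "tls12",
] }
```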
- CLI binary tarballs (Linux x86_64/aarch64, macOS aarch64, Windows x86_64) attached to GitHub Releases for direct download — closes #64
- `scripts/generate_pricing.py` regenerates `schemas/pricing.json` from models.dev, wired into `task generate:pricing`, `task update`, and `task upgrade`
- `Usage::prompt_tokens_details` (`{ cached_tokens, audio_tokens }`) deserialised from the OpenAI-compatible response body, plus `cost::completion_cost_with_cache` and matching `cache_read_input_token_cost`/`cache_creation_input_token_cost` fields on `ModelPricing`
- `ChatCompletionResponse::estimated_cost` and the `CostTrackingLayer` now bill cached prompt tokens at the provider's discounted cache-read rate when the model has cache pricing in `schemas/pricing.json` — closes #65
- `schemas/pricing.json` carries `cache_read_input_token_cost`/`cache_creation_input_token_cost` for the 1,500+ models on models.dev that publish cache pricing
- `schemas/pricing.json` now covers 4,219 models (up from 35) sourced from models.dev — closes #48
- GitHub Release CLI assets ship a single sorted `SHA256SUMS-<version>.txt` (sha256sum-verifiable) instead of one `.sha256` per archive — closes #67
- WebAssembly build verified `mio`-free. The `liter-llm` crate exposes two mutually exclusive HTTP-stack features — `native-http` (reqwest + tokio + memchr + base64) and `wasm-http` (reqwest + memchr + base64 + gloo-timers, no tokio dependency). The `liter-llm-wasm` crate enables only `wasm-http`; the workspace's `reqwest` is pinned with `default-features = false, features = ["json", "stream", "rustls", "multipart", "form"]`. As a result, `cargo build --target wasm32-unknown-unknown -p liter-llm-wasm` pulls neither `mio` nor `tokio` into the dependency tree — reqwest auto-routes to the browser/Node `fetch` API on wasm32 targets.
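The feature split described above could be declared roughly like this in `liter-llm`'s `Cargo.toml` (a sketch; the real feature definitions and version pins may differ):

```toml
# Sketch: two mutually exclusive HTTP stacks for liter-llm.
[features]
native-http = ["dep:reqwest", "dep:tokio", "dep:memchr", "dep:base64"]
wasm-http   = ["dep:reqwest", "dep:memchr", "dep:base64", "dep:gloo-timers"]

[dependencies]
# Workspace pin with default-features = false, so tokio/mio never enter
# the wasm32 dependency tree; reqwest uses the fetch API on that target.
reqwest = { workspace = true, optional = true }
tokio = { version = "1", optional = true }
gloo-timers = { version = "0.3", optional = true }
memchr = { version = "2", optional = true }
base64 = { version = "0.22", optional = true }
```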
- Alef migration: all language bindings are now auto-generated by alef instead of hand-written
- `BoxFuture`/`BoxStream` type aliases no longer wrap `Result<T>` — all method signatures now explicitly return `Result<T>`
- `provider` module is now public (was `pub(crate)`)
- `ChatCompletionRequest.stream` field is now public (was `pub(crate)`)
- Switched spell checker from codespell to typos
- CI no longer runs code generation — only `alef verify --exit-code` for freshness checks
- Updated alef to v0.5.9
- `alef.toml` configuration for 10 language targets, 23 API method call configs, mock server support
- `bindings.rs` adapter module with `create_client` and `create_client_from_json` binding-friendly constructors
- `Default` derives on all public types for binding compatibility
- `Clone` derive on `DefaultClient`
- E2E test fixtures converted to alef format (167+ fixtures across 23 categories)
- E2E tests regenerated for 13 languages with mock HTTP server support
- Test apps generated with `alef e2e generate --registry`
- API reference documentation auto-generated with `alef docs` for all 10 languages
- Package READMEs generated with `alef readme` using restored Jinja templates
- `alef-verify` and `alef-sync-versions` pre-commit hooks
- `alef verify --exit-code` step in CI validation workflow
- `.lychee.toml` link checker configuration
- `_typos.toml` spell checker configuration
- Auto-load API keys from environment variables
- FFI callback streaming support
- `chat_stream` method across all bindings
- `liter-llm-bindings-core` crate — replaced by alef codegen
- `tools/e2e-generator` crate — replaced by `alef e2e generate`
- `scripts/sync_versions.py` — replaced by `alef sync-versions`
- `scripts/generate_readme.py` — replaced by `alef readme`
- `scripts/readme_config.yaml` and `scripts/readme_templates/` — replaced by `templates/readme/`
- `tests/test_apps/` — replaced by `test_apps/` (alef registry mode)
- Hand-written binding source in `crates/liter-llm-{py,node,ffi,wasm,php}/src/`
- Hand-written package source in `packages/{go,java,csharp,ruby,elixir}/`
## 1.2.2 - 2026-04-18
- GitHub Copilot OAuth Device Flow credential provider (`copilot-auth` feature) — use your Copilot subscription as an LLM backend via the `github_copilot/` model prefix (#12)
- GitHub Copilot provider with OpenAI-compatible routing, required Copilot headers, per-request UUID, and `X-Initiator` header
- E2E test fixtures for GitHub Copilot provider (chat + auth error)
- Provider registry audit: corrected base URLs for 20 providers (aiml, assemblyai, clarifai, dashscope, deepseek, elevenlabs, firecrawl, friendliai, gradient_ai, gmi, helicone, lambda_ai, minimax, moonshot, morph, nlp_cloud, ollama, poe, stability, wandb)
- Provider registry audit: corrected env var names for 5 providers (cometapi, fal_ai, gradient_ai, jina_ai, venice)
- Provider registry audit: corrected endpoint lists for 6 providers (cometapi, deepinfra, elevenlabs, jina_ai, mistral, nvidia_nim)
- Added missing `base_url` and `auth` config for 11 previously non-functional providers (amazon_nova, baseten, compactifai, datarobot, docker_model_runner, duckduckgo, langgraph, lemonade, v0, vercel_ai_gateway, zai)
- Added 18 stub/infrastructure providers to the `complex_providers` list to prevent incorrect config-driven routing
- Added `nanogpt` param mapping (`max_completion_tokens` → `max_tokens`)
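As an illustration, the `nanogpt` entry in `schemas/providers.json` presumably carries a mapping along these lines; the field layout is a guess based on the `param_mappings` mechanism, not the actual schema:

```json
{
  "nanogpt": {
    "param_mappings": {
      "max_completion_tokens": "max_tokens"
    }
  }
}
```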
## 1.2.1 - 2026-04-17
- `LlmClientRaw` trait with `_raw` variants of all `LlmClient` methods, returning `RawExchange<T>` that exposes the final request body and raw provider response before normalization (#13)
- `RawExchange<T>` and `RawStreamExchange<S>` types for wire-level debugging and custom parsing
- MCP & IDE integration documentation with setup guides for VS Code, GitHub Copilot, Claude Desktop, Cursor (#12)
- Docker image now published to `ghcr.io/kreuzberg-dev/liter-llm` (#11)
- Docker publish workflow timeout increased from 60 to 360 minutes (multi-arch Rust builds via QEMU were timing out)
- Bedrock `build_url` tests no longer flake due to a `BEDROCK_CROSS_REGION` env var race condition
## 1.2.0 - 2026-04-07
- Local LLM provider support: Ollama, LM Studio, vLLM, llama.cpp, LocalAI, llamafile — use any local inference engine via an OpenAI-compatible API
- Docker Compose setup for local LLM integration testing with Ollama
- Integration test suite for local LLM providers
- PHP `onError` hook now passes a proper `\Exception` object instead of a plain string (PHP strict types require a `\Throwable`)
- README templates fixed for rumdl compliance (MD040 code fence language, MD031 blank lines, MD032 list spacing, MD020 closed headings)
- Added 404 to all POST endpoint OpenAPI specs (model not found on default model names)
- Homebrew badge added to all READMEs
## 1.1.1 - 2026-03-29
- Java Maven plugins downgraded to 3.x stable (was 4.0.0-beta, incompatible with Maven 3.9.x CI)
- PHP fixes: hook isolation (per-client instead of global), per-model budget enforcement, `onError` hook invocation, and a shutdown segfault
- PHP e2e tests set `max_retries=0` to prevent retry delays on mock 500s
- OpenAPI spec: added 400/415/422/503 status codes to all endpoints for schemathesis compliance
- `first_client()` returns 503 Service Unavailable instead of 500 for "no models configured"
- Schemathesis CI checks aligned (removed `content_type_conformance`, `not_a_server_error`)
- Docker cache: per-platform `TARGETARCH` cache IDs prevent multi-arch build races
- Homebrew formula: `brew tap kreuzberg-dev/tap && brew install liter-llm`
- Homebrew bottle builds (arm64_sequoia) in publish workflow
- `liter-llm-proxy` and `liter-llm-cli` added to the crates.io publish pipeline
- Installation docs: CLI/Docker/Homebrew tabs
- `scripts/publish/upload-homebrew-bottles.sh` and `ensure-github-release-exists.sh`
## 1.1.0 - 2026-03-29
OpenAI-compatible LLM proxy server with CLI, MCP tool server, and Docker support.
- 22 REST endpoints — full OpenAI-compatible API surface: chat completions (streaming + non-streaming), embeddings, models, images, audio (speech + transcription), moderations, rerank, search, OCR, files CRUD, batches CRUD, responses CRUD, health
- Tower middleware stack — reuses core middleware: cache, rate limit, budget, cost tracking, cooldown, health check, tracing
- Virtual API keys — in-memory key store with per-key model restrictions, RPM/TPM limits, budget limits
- Model routing — name-based routing to provider deployments, wildcard aliases, deterministic default client
- OpenDAL file storage — configurable backend (memory, S3, GCS, filesystem) for file operations
- SSE streaming — chat completion chunks proxied as Server-Sent Events with a `[DONE]` sentinel
- OpenAPI 3.1 — utoipa-generated spec served at `/openapi.json` with a bearer auth security scheme
- TOML configuration — `liter-llm-proxy.toml` with env var interpolation (`${VAR}`), auto-discovery, `deny_unknown_fields`
- CORS — configurable origins from config (default: allow all)
- Graceful shutdown — SIGINT/SIGTERM handling via `tokio::signal`
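Putting a few of these features together, a `liter-llm-proxy.toml` might look something like the following. All key names here are illustrative guesses, not the documented schema; consult the TOML configuration reference for the actual field names:

```toml
# liter-llm-proxy.toml — illustrative sketch only
host = "0.0.0.0"
port = 4000

# Env var interpolation: ${VAR} is substituted at load time.
[providers.openai]
api_key = "${OPENAI_API_KEY}"

[cors]
# Default behaviour is allow-all; this restricts it to one origin.
allow_origins = ["https://app.example.com"]
```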
- 22 tools — full parity with REST API: chat, embed, list_models, generate_image, speech, transcribe, moderate, rerank, search, ocr, file CRUD (5), batch CRUD (4), response CRUD (3)
- Transports — stdio (default) and HTTP/SSE via `StreamableHttpService`
- Parameter schemas — `schemars::JsonSchema` derives for MCP tool discovery
- `liter-llm api` — start the proxy server with config, host/port overrides, debug logging
- `liter-llm mcp` — start the MCP server with stdio or HTTP transport
- 3-tier config precedence: CLI flags > env vars > config file > defaults
- Multi-stage build: `rust:1.91-bookworm` builder, `cgr.dev/chainguard/glibc-dynamic` runtime (35MB)
- Non-root execution, OCI labels, port 4000 exposed
- `ENTRYPOINT ["liter-llm"]`, `CMD ["api", "--host", "0.0.0.0", "--port", "4000"]`
- 74 unit tests — config parsing, error mapping, auth key store, service pool, file store, streaming
- 32 integration tests — auth middleware, chat/embedding/models routes, error propagation, CORS, health, OpenAPI
- 12 proxy e2e fixtures — chat (basic + streaming), embeddings, models, auth errors, upstream errors, health, images, moderation, reranking
- Schemathesis — contract testing against the OpenAPI spec via Docker (`task proxy:schemathesis`)
- `.github/workflows/ci-docker.yaml` — build + health test + schemathesis contract tests
- `.github/workflows/publish-docker.yaml` — multi-arch (amd64/arm64) publish to `ghcr.io/kreuzberg-dev/liter-llm`
- Taskfile: `proxy:test`, `proxy:schemathesis`
## 1.0.0 - 2026-03-28
Initial stable release. Universal LLM API client with native bindings for 11 languages and 142+ providers.
- `LlmClient` trait with chat, chat_stream, embed, list_models, image_generate, speech, transcribe, moderate, rerank, search, ocr
- `FileClient`, `BatchClient`, `ResponseClient` traits for file/batch/response operations
- `DefaultClient` with reqwest + tokio, SSE streaming, retry with exponential backoff
- `ManagedClient` with a composable Tower middleware stack
- 142 LLM providers embedded at compile time from `schemas/providers.json`
- Per-request provider routing from the model name prefix (e.g. `anthropic/claude-sonnet-4-20250514`)
- `secrecy::SecretString` for API keys (zeroized on drop, never logged)
- TOML configuration file loading with auto-discovery (`liter-llm.toml`)
- Custom provider registration at runtime
- CacheLayer — in-memory LRU + pluggable backends via the `CacheStore` trait
- OpenDAL cache — 40+ storage backends (Redis, S3, GCS, filesystem, etc.) via Apache OpenDAL
- BudgetLayer — global + per-model spending limits with hard/soft enforcement
- HooksLayer — request/response/error lifecycle callbacks with guardrail pattern
- CooldownLayer — circuit breaker after transient errors
- ModelRateLimitLayer — per-model RPM/TPM rate limiting
- HealthCheckLayer — background health probing
- CostTrackingLayer — per-request cost calculation from embedded pricing registry
- TracingLayer — OpenTelemetry GenAI semantic convention spans
- FallbackLayer — automatic failover to backup provider
- RouterLayer — multi-deployment load balancing (round-robin, latency, cost, weighted)
All bindings expose the full API surface with language-idiomatic conventions:
- Python (PyO3) — async/await, typed kwargs, full .pyi stubs
- TypeScript / Node.js (NAPI-RS) — camelCase, .d.ts types, Promise-based
- Rust — native, zero-cost
- Go (cgo) — FFI wrapper with build tags, `context.Context` support
- Java (Panama FFM) — JDK 25+, `AutoCloseable`, builder pattern
- C# / .NET (P/Invoke) — async/await, `IAsyncEnumerable` streaming, `IDisposable`
- Ruby (Magnus) — RBS type signatures, Enumerator streaming
- Elixir (Rustler NIF) — `{:ok, result}` tuples, OTP-compatible
- PHP (ext-php-rs) — PHP 8.2+, JSON in/out, PIE packages
- WebAssembly (wasm-bindgen) — browser + Node.js, Fetch API
- C / FFI (cbindgen) — `extern "C"` with opaque handles
- Static API keys (Bearer, x-api-key)
- Azure AD OAuth2 client credentials
- Vertex AI service account JWT
- AWS STS Web Identity (EKS/IRSA)
- AWS SigV4 signing for Bedrock
- Anthropic: message format, tool use v1, thinking blocks, max_tokens default
- AWS Bedrock: Converse API, EventStream binary framing, cross-region routing
- Vertex AI: Gemini format, embedding `:predict` endpoint
- Google AI: embedding/list_models response transforms
- Cohere: citation handling
- Mistral: API compatibility
- `param_mappings` for config-driven field renaming (8 providers)
- MkDocs Material site at docs.liter-llm.kreuzberg.dev
- 170+ code snippets across 10 languages
- 11 API reference docs with full method coverage
- Usage pages: Chat & Streaming, Embeddings & Rerank, Media, Search & OCR, Files & Batches, Configuration
- TOML configuration reference
- llms.txt (218 lines) with capabilities, examples, provider list
- Skills directory (4,072 lines) for Claude Code integration
- README generation from Jinja templates via `scripts/generate_readme.py`
- 500+ unit and integration tests
- Middleware stack composition tests (cache + budget + hooks + rate limit + cooldown)
- Per-request provider routing tests
- File/batch/response CRUD operation tests
- Concurrency tests (budget atomicity, cache contention, rate limit fairness)
- Redis cache backend integration tests (Docker Compose)
- Live provider tests for 7 providers (OpenAI, Anthropic, Google AI, Vertex AI, Mistral, Azure, Bedrock)
- Smoke test apps for all 10 languages against real APIs
- E2E test generation from JSON fixtures across all languages
- Contract test fixtures for binding API parity
- Multi-platform publish pipeline: crates.io, PyPI, npm, RubyGems, Hex.pm, Maven Central, NuGet, Packagist, Go FFI, PHP PIE
- Pre-commit hooks: 43 linters across all languages
- Post-generation formatting in e2e-generator
- Version sync script across 27+ manifests with README regeneration
### Release candidate history (rc.1 through rc.9)
- rc.1 (2026-03-27): Initial release — core crate, 11 bindings, e2e generator
- rc.2 (2026-03-27): Packaging fixes for crates.io, RubyGems, Elixir NIF, Node NAPI, publish workflow
- rc.3 (2026-03-27): Cache, budget, hooks middleware; custom providers; TDD e2e fixtures
- rc.4 (2026-03-28): Shared bindings-core crate; camelCase conversion; real streaming across all bindings
- rc.5 (2026-03-28): OpenDAL cache; search/OCR endpoints; full middleware wiring; Go/Java/C# FFI rewrites; serde deny_unknown_fields; documentation overhaul
- rc.6 (2026-03-28): Full API documentation coverage; Rust crate README; version sync improvements
- rc.7 (2026-03-28): Binding parity (5 middleware params + search/ocr in all 10); contract test fixtures; skills directory; PHP PIE packages
- rc.8 (2026-03-28): CI fixes (PHP publish, crate order, Maven GPG, Ruby deps, Bedrock test)
- rc.9 (2026-03-28): Live provider tests; Anthropic/Bedrock/Google streaming fixes; TOML config loading; per-request provider routing; integration test suite