LangChain provider packages for models served on KServe — the model inference platform for Kubernetes.
Connect any LangChain chain, agent, or multi-agent framework to self-hosted models running on Kubernetes. Works with vLLM, Triton Inference Server, TGI, and any other KServe-compatible runtime, using both the OpenAI-compatible API and the native V2 Inference Protocol.
| Package | Language | Registry | Description |
|---|---|---|---|
| `langchain-kserve` | Python | PyPI | LangChain integration (`ChatKServe`, `KServeLLM`, `KServeEmbeddings`) |
| `@bitkaio/langchain-kserve` | TypeScript | npm | LangChain.js integration (`ChatKServe`, `KServeLLM`, `KServeEmbeddings`) |
- `ChatKServe` — `BaseChatModel` for chat/instruct models (Qwen, Llama, Mistral, …)
- `KServeLLM` — `BaseLLM` for base completion models
- `KServeEmbeddings` — `Embeddings` for embedding models via `/v1/embeddings`
- Dual protocol — OpenAI-compatible (`/v1/chat/completions`) and V2 Inference Protocol (`/v2/models/{model}/infer`), auto-detected per instance
- Full streaming — sync and async; SSE for OpenAI-compat, chunked transfer for V2
- Tool calling — full OpenAI function-calling format with `tool_choice`, `parallel_tool_calls`, and `invalid_tool_calls` handling
- Structured output — `with_structured_output()` / `withStructuredOutput()` via function calling, JSON schema, or JSON mode
- Vision / multimodal — send images alongside text via OpenAI content blocks
- JSON mode — `response_format` for constrained JSON output and schema-guided generation (vLLM)
- Token usage tracking — `llm_output` / `generationInfo` populated from vLLM, including while streaming
- Logprobs — optional per-token log probabilities
- Model introspection — `get_model_info()` / `getModelInfo()` returns unified metadata for both protocols
- Production-grade — custom TLS/CA bundles, static and dynamic bearer-token auth (K8s service account tokens), exponential-backoff retries, generous cold-start timeouts
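Tool calling and structured output ride on the standard OpenAI function-calling wire format that runtimes like vLLM and TGI serve. As a rough sketch of the request body such an invocation translates to (the `get_weather` tool is a made-up example, not part of the package):

```python
# Sketch: an OpenAI-compatible chat-completions request with a tool attached.
# The "get_weather" tool and its schema are hypothetical illustrations; the
# surrounding shape is the standard OpenAI function-calling format.
import json

request_body = {
    "model": "qwen2.5-coder-32b-instruct",
    "messages": [{"role": "user", "content": "Weather in Berlin?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",         # or force a specific function
    "parallel_tool_calls": False,  # serialize tool calls
}

print(json.dumps(request_body, indent=2))
```

A model response then carries `tool_calls` entries that LangChain surfaces on the returned message (with malformed ones collected under `invalid_tool_calls`).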
```bash
pip install langchain-kserve
```

```python
from langchain_kserve import ChatKServe

llm = ChatKServe(
    base_url="https://qwen-coder.default.svc.cluster.local",
    model_name="qwen2.5-coder-32b-instruct",
    temperature=0.2,
)

response = llm.invoke("Write a binary search in Python.")
print(response.content)
```

```bash
npm install @bitkaio/langchain-kserve @langchain/core
```

```typescript
import { ChatKServe } from "@bitkaio/langchain-kserve";

const llm = new ChatKServe({
  baseUrl: "https://qwen-coder.default.svc.cluster.local",
  modelName: "qwen2.5-coder-32b-instruct",
  temperature: 0.2,
});

const response = await llm.invoke("Write a binary search in TypeScript.");
console.log(response.content);
```

Both packages auto-detect the inference protocol by probing `GET /v1/models` on the inference service:
| Runtime | Default Protocol |
|---|---|
| vLLM | OpenAI-compatible |
| TGI | OpenAI-compatible |
| Triton Inference Server | V2 Inference Protocol |
| Custom KServe runtime | Auto-detected |
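One plausible reading of the probe-based detection (an assumption — the packages' actual heuristic may differ in detail) is: if `GET /v1/models` answers successfully, the runtime speaks the OpenAI-compatible API; otherwise fall back to V2. A minimal sketch with the HTTP call stubbed out:

```python
# Sketch of probe-based protocol detection. `probe` stands in for a real
# HTTP GET against the inference service and returns a status code.
from typing import Callable

def detect_protocol(probe: Callable[[str], int]) -> str:
    """Return "openai" if the service answers GET /v1/models, else "v2"."""
    try:
        status = probe("/v1/models")
    except OSError:  # connection-level failure: assume a V2-only runtime
        return "v2"
    return "openai" if status == 200 else "v2"

# Stub probes standing in for real services:
print(detect_protocol(lambda path: 200))  # vLLM/TGI-style service
print(detect_protocol(lambda path: 404))  # Triton-style V2 service
```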
Pin the protocol explicitly to skip auto-detection:
```python
# Python
llm = ChatKServe(..., protocol="openai")  # or "v2"
```

```typescript
// TypeScript
const llm = new ChatKServe({ ..., protocol: "openai" });  // or "v2"
```

Both packages support static bearer tokens and dynamic token providers (e.g., Kubernetes service account tokens):
```python
# Python — dynamic K8s SA token
import pathlib

llm = ChatKServe(
    base_url="https://model.my-namespace.svc.cluster.local",
    model_name="my-model",
    token_provider=lambda: pathlib.Path(
        "/var/run/secrets/kubernetes.io/serviceaccount/token"
    ).read_text().strip(),
)
```

```typescript
// TypeScript — dynamic K8s SA token
import { readFile } from "node:fs/promises";

const llm = new ChatKServe({
  baseUrl: "https://model.my-namespace.svc.cluster.local",
  modelName: "my-model",
  tokenProvider: () =>
    readFile("/var/run/secrets/kubernetes.io/serviceaccount/token", "utf-8")
      .then((t) => t.trim()),
});
```

Both packages read configuration from `KSERVE_`-prefixed environment variables:
| Variable | Description |
|---|---|
| `KSERVE_BASE_URL` | Root URL of the KServe inference service |
| `KSERVE_MODEL_NAME` | Model identifier as registered in KServe |
| `KSERVE_API_KEY` | Static bearer token |
| `KSERVE_PROTOCOL` | `openai`, `v2`, or `auto` (default) |
| `KSERVE_CA_BUNDLE` | Path to custom CA certificate bundle |
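A reasonable resolution order — assumed here, since the README does not spell it out — is explicit constructor argument, then environment variable, then built-in default. The helper below is a hypothetical sketch of that precedence, not the packages' internal API:

```python
# Sketch: resolving constructor arguments from KSERVE_-prefixed environment
# variables when not passed explicitly. Assumed precedence:
# explicit argument > environment variable > default.
import os

def resolve_config(**explicit):
    env_map = {
        "base_url": "KSERVE_BASE_URL",
        "model_name": "KSERVE_MODEL_NAME",
        "api_key": "KSERVE_API_KEY",
        "protocol": "KSERVE_PROTOCOL",
        "ca_bundle": "KSERVE_CA_BUNDLE",
    }
    defaults = {"protocol": "auto"}
    return {
        key: explicit.get(key) or os.environ.get(env_var) or defaults.get(key)
        for key, env_var in env_map.items()
    }

os.environ["KSERVE_BASE_URL"] = "https://model.ns.svc.cluster.local"
cfg = resolve_config(model_name="my-model")
print(cfg["base_url"], cfg["model_name"], cfg["protocol"])
```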
```
langchain-kserve/
├── python/          # langchain-kserve Python package
│   ├── langchain_kserve/
│   ├── tests/
│   └── pyproject.toml
└── typescript/      # @bitkaio/langchain-kserve TypeScript package
    ├── src/
    ├── tests/
    └── package.json
```
See the package-specific READMEs in `python/` and `typescript/` for full API references and usage examples.
MIT — © bitkaio LLC