A transparent proxy that forwards API requests to upstream LLM providers and sends telemetry to Langfuse in the background. Zero latency overhead on the response path.
Supports OpenAI, Anthropic, and Google Gemini APIs natively.
```
                            +--> Upstream OpenAI (/v1/*)
Consumer --> Proxy ---------+--> Upstream Anthropic (/v1/messages)
                |           +--> Upstream Gemini (/v1beta/*)
                |
                v (background, non-blocking)
            Langfuse
```

How it works:
- Consumer sends a standard API request to the proxy
- Proxy forwards it to the appropriate upstream provider
- Upstream response stream is split via `ReadableStream.tee()` — one branch goes to the consumer immediately, the other is consumed in the background for telemetry
- Langfuse receives a trace with full input/output, model, token usage, TTFB, and total duration
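The split can be sketched with the standard Web Streams API (a minimal illustration of the technique, not the proxy's actual source):

```typescript
// Sketch of the tee-based split. The stream and helper names here are
// illustrative, not the proxy's real code.
function makeUpstream(): ReadableStream<string> {
  return new ReadableStream<string>({
    start(controller) {
      controller.enqueue("Hello, ");
      controller.enqueue("world!");
      controller.close();
    },
  });
}

async function drain(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let body = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return body;
    body += value ?? "";
  }
}

const [toConsumer, toTelemetry] = makeUpstream().tee();

// The consumer branch would be returned in the Response immediately;
// the telemetry branch is drained without blocking the response path.
const telemetryDone = drain(toTelemetry); // intentionally not awaited on the hot path
const consumerBody = await drain(toConsumer);
console.log(consumerBody);        // "Hello, world!"
console.log(await telemetryDone); // same bytes, observed independently
```

Because `tee()` buffers internally, a slow telemetry consumer never stalls the consumer-facing branch.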
Key features:
- Multi-provider — native support for OpenAI, Anthropic, and Gemini APIs with provider-specific stream parsing and telemetry
- Passthrough auth — consumers send their own API key, proxy forwards it upstream. No user management.
- OpenAI catch-all — `ALL /v1/*` forwards any OpenAI-compatible request. Chat completions, embeddings, audio, images, assistants — all work automatically.
- Streaming support — SSE streams are split and returned immediately. For OpenAI, the proxy injects `stream_options.include_usage` so Langfuse always gets token counts.
- Full telemetry — every request is logged to Langfuse with input messages, output content, model, full token usage breakdown, TTFB, and total duration.
- Optional auth gate — set `PROXY_API_KEY` to require consumers to authenticate with the proxy itself (timing-safe comparison).
- Upstream key override — set `UPSTREAM_API_KEY`/`ANTHROPIC_API_KEY`/`GEMINI_API_KEY` to use a single key for all upstream requests regardless of what consumers send.
- Graceful shutdown — SIGTERM/SIGINT stops accepting connections, waits for in-flight requests, and flushes Langfuse before exiting.
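The timing-safe comparison behind the auth gate can be sketched with Node's `crypto.timingSafeEqual` (a minimal sketch of the technique; the proxy's actual check may differ):

```typescript
import { timingSafeEqual } from "node:crypto";

// Sketch of a timing-safe PROXY_API_KEY check (illustrative, not the
// proxy's source). A naive `===` comparison can leak how many leading
// characters matched through response timing.
function isAuthorized(presented: string, expected: string): boolean {
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so check lengths first;
  // this leaks only the key's length, not its content.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}

console.log(isAuthorized("sk-proxy-123", "sk-proxy-123")); // true
console.log(isAuthorized("sk-proxy-bad", "sk-proxy-123")); // false
```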
Prerequisites: Bun v1.0+
```sh
# Install dependencies
bun install

# Configure environment
cp .env.example .env
```

Edit `.env` with your settings. At minimum, configure Langfuse credentials to enable telemetry:

```sh
LANGFUSE_BASE_URL=https://cloud.langfuse.com
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
```

Start the server:

```sh
# Development (hot reload)
bun dev

# Production
bun start
```

Point any OpenAI-compatible SDK at the proxy:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="sk-your-openai-key",  # forwarded to upstream
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/v1",
  apiKey: "sk-your-openai-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
```

Use the Anthropic SDK pointed at the proxy:
```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:3000",
    api_key="sk-ant-your-key",  # forwarded to upstream
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
```

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://localhost:3000",
  apiKey: "sk-ant-your-key",
});

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello!" }],
});
```

Send requests to the `/v1beta/*` endpoints:
```sh
curl "http://localhost:3000/v1beta/models/gemini-2.0-flash:generateContent" \
  -H "x-goog-api-key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hello!"}]}]}'
```

```sh
# Non-streaming
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello!"}]}'

# Streaming
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Hello!"}]}'
```

```sh
curl http://localhost:3000/api/health
```

| Endpoint | Description |
|---|---|
| `ALL /v1/messages` | Anthropic pass-through — forwards to Anthropic API |
| `ALL /v1beta/*` | Gemini pass-through — forwards to Gemini API |
| `ALL /v1/*` | OpenAI catch-all — forwards any request to upstream provider |
| `GET /api/health` | Health check — returns app version and per-provider reachability |

Routes are matched in order: `/v1/messages` is matched before the `/v1/*` catch-all, so Anthropic requests are routed correctly.
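First-match ordering can be illustrated with a toy matcher (a sketch of the principle, not the proxy's actual router):

```typescript
// Toy first-match router showing why /v1/messages must be registered
// before the /v1/* catch-all. Illustrative only.
const routes: Array<[pattern: RegExp, name: string]> = [
  [/^\/v1\/messages$/, "anthropic"],   // most specific first
  [/^\/v1beta\//, "gemini"],
  [/^\/v1\//, "openai-catch-all"],     // broadest last
];

function match(path: string): string | undefined {
  return routes.find(([pattern]) => pattern.test(path))?.[1];
}

console.log(match("/v1/messages"));         // "anthropic"
console.log(match("/v1/chat/completions")); // "openai-catch-all"
```

If the catch-all were registered first, `/v1/messages` would be swallowed by it and Anthropic requests would hit the wrong upstream.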
The health endpoint returns per-provider status:
```json
{
  "name": "langfuse-proxy",
  "version": "0.0.0",
  "status": "ok",
  "upstream": {
    "openai": "ok",
    "anthropic": "ok",
    "gemini": "not_configured"
  }
}
```

- `status` is `"degraded"` if OpenAI is unreachable or any configured provider has errors
- Anthropic and Gemini show `"not_configured"` if their API key is not set
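The status rollup described above can be sketched as a small helper (hypothetical code mirroring the documented rules, not the proxy's source):

```typescript
type ProviderStatus = "ok" | "error" | "not_configured";

// Hypothetical rollup mirroring the documented rules: OpenAI must be
// reachable, and providers that are not configured do not count against
// the overall status.
function overallStatus(
  upstream: Record<string, ProviderStatus>,
): "ok" | "degraded" {
  if (upstream.openai !== "ok") return "degraded";
  for (const status of Object.values(upstream)) {
    if (status === "error") return "degraded";
  }
  return "ok";
}

console.log(overallStatus({ openai: "ok", anthropic: "ok", gemini: "not_configured" }));    // "ok"
console.log(overallStatus({ openai: "ok", anthropic: "error", gemini: "not_configured" })); // "degraded"
```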
Every proxied request creates a Langfuse trace with:
- Trace: request path, input messages, output content, HTTP metadata
- Generation: model name, full input/output, token usage with detailed breakdowns, timing
The `usageDetails` field includes the full OpenAI token breakdown:

| Field | Description |
|---|---|
| `input` | Non-cached prompt tokens |
| `input_cached_tokens` | Prompt tokens served from OpenAI's cache |
| `input_audio_tokens` | Audio input tokens |
| `output` | Completion tokens |
| `output_reasoning_tokens` | Reasoning/chain-of-thought tokens (o1, etc.) |
| `output_audio_tokens` | Audio output tokens |
Anthropic and Gemini providers report their native token usage in the same format.
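As an illustration of that normalization, a mapping from Anthropic's native usage block into the `usageDetails` shape might look like this (the Anthropic field names here are assumptions based on its public API, and the helper is hypothetical, not the proxy's source):

```typescript
// Hypothetical mapping from Anthropic's native usage block into the
// usageDetails shape shown above. Field names on the Anthropic side
// (input_tokens, output_tokens, cache_read_input_tokens) are assumptions.
interface UsageDetails {
  input: number;
  input_cached_tokens?: number;
  output: number;
}

function fromAnthropicUsage(u: {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
}): UsageDetails {
  return {
    input: u.input_tokens,
    input_cached_tokens: u.cache_read_input_tokens,
    output: u.output_tokens,
  };
}

const details = fromAnthropicUsage({
  input_tokens: 12,
  output_tokens: 34,
  cache_read_input_tokens: 5,
});
console.log(details); // { input: 12, input_cached_tokens: 5, output: 34 }
```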
Timing metadata on each generation:
| Field | Description |
|---|---|
| `startTime` | When the proxy received the request |
| `completionStartTime` | When the first byte was received from upstream (TTFB) |
| `endTime` | When the full response was consumed |
Set `TELEMETRY_MAX_BODY_BYTES` to limit how much response data is buffered for telemetry (default 1 MB). The consumer always gets the full response regardless of this limit.

Leave `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` empty to disable telemetry entirely.
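Capped buffering on the telemetry branch can be sketched like this (an illustration of the technique under the documented behavior, not the proxy's actual code):

```typescript
// Sketch of capped telemetry buffering: stop accumulating once maxBytes
// is reached, but keep reading so the stream is fully consumed. The
// consumer branch of the tee is unaffected by this cap.
async function drainCapped(
  stream: ReadableStream<Uint8Array>,
  maxBytes: number,
): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let buffered = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    if (buffered < maxBytes) {
      const take = value.subarray(0, maxBytes - buffered);
      chunks.push(take);
      buffered += take.length;
    }
    // Past the cap we still read to completion (timing/usage may arrive late),
    // we just stop retaining the bytes.
  }
  const out = new Uint8Array(buffered);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

const body = new Response(new TextEncoder().encode("0123456789")).body!;
const capped = await drainCapped(body, 4);
console.log(new TextDecoder().decode(capped)); // "0123"
```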
| Variable | Description | Default |
|---|---|---|
| `NODE_ENV` | Environment mode | `development` |
| `PORT` | Server port | `3000` |
| `LOG_LEVEL` | Pino log level (`debug`, `info`, `warn`, `error`, `silent`) | `info` |
| **OpenAI / catch-all** | | |
| `UPSTREAM_BASE_URL` | Upstream LLM provider base URL | `https://api.openai.com` |
| `UPSTREAM_API_KEY` | Override consumer's key for upstream (optional) | - |
| `PROXY_API_KEY` | Gate consumers with this key (optional) | - |
| `PROXY_TIMEOUT_MS` | Upstream request timeout in ms | `300000` (5 min) |
| `TELEMETRY_MAX_BODY_BYTES` | Max response body to buffer for telemetry | `1048576` (1 MB) |
| **Anthropic** | | |
| `ANTHROPIC_BASE_URL` | Anthropic API base URL | `https://api.anthropic.com` |
| `ANTHROPIC_API_KEY` | Override consumer's key for Anthropic (optional) | - |
| `ANTHROPIC_VERSION` | Default `anthropic-version` header | `2023-06-01` |
| **Gemini** | | |
| `GEMINI_BASE_URL` | Gemini API base URL | `https://generativelanguage.googleapis.com` |
| `GEMINI_API_KEY` | Override consumer's key for Gemini (optional) | - |
| **Langfuse** | | |
| `LANGFUSE_BASE_URL` | Langfuse instance URL | `https://cloud.langfuse.com` |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key (empty = telemetry disabled) | - |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key (empty = telemetry disabled) | - |
```sh
docker build -t langfuse-proxy .
docker run -p 3000:3000 --env-file .env langfuse-proxy
```

The Dockerfile uses a multi-stage build that compiles the app to a standalone binary (~50 MB image).
Deploy using the Dockerfile build pack — configure environment variables in the Coolify dashboard. Set the health check to `/api/health` on port 3000 for rolling updates. No database or external services required.
```sh
bun install   # Install dependencies
bun dev       # Start with hot reload
bun test      # Run tests with coverage
bun lint      # Lint with Biome
bun format    # Auto-fix lint and formatting
bun check     # Lint + type-check + tests (runs in pre-commit hook)
```

```
src/
├── api/
│   ├── features/
│   │   ├── anthropic/            # ALL /v1/messages
│   │   │   ├── anthropic.controller.ts   Anthropic handler, auth, header forwarding
│   │   │   └── anthropic.stream.ts       Anthropic SSE parsing
│   │   ├── gemini/               # ALL /v1beta/*
│   │   │   ├── gemini.controller.ts      Gemini handler, API key forwarding
│   │   │   └── gemini.stream.ts          Gemini stream parsing
│   │   ├── health/               # GET /api/health
│   │   │   └── health.controller.ts      Per-provider reachability checks
│   │   └── proxy/                # ALL /v1/*
│   │       ├── proxy.controller.ts       Catch-all handler, auth gate, header forwarding
│   │       ├── proxy.stream.ts           Stream consumption, SSE parsing, JSON parsing
│   │       ├── proxy.telemetry.ts        Background Langfuse reporting (all providers)
│   │       └── proxy.types.ts            TypeScript interfaces
│   └── lib/
│       ├── langfuse.ts           Langfuse client singleton + shutdown
│       └── logger.ts             Pino logger with pretty-print (dev) / JSON (prod)
├── app.ts                        Elysia app setup (logging, error handling, routes)
├── config.ts                     Environment configuration
└── index.ts                      Entry point, server startup, graceful shutdown
tests/
└── api/features/
    ├── anthropic/                Anthropic controller and stream parser tests
    ├── gemini/                   Gemini controller and stream parser tests
    ├── health/                   Health endpoint tests
    └── proxy/                    Proxy controller and stream parser tests
```