lago-agent-sdk

Instrument LLM clients and emit usage events to Lago for billing. Authored in TypeScript, ships compiled JavaScript with .d.ts — works for both JS and TS consumers.

                  ┌────────────────┐
your code ──────► │ wrapped client │ ──► provider (Bedrock / Mistral / …)
                  └───────┬────────┘
                          │ (extract usage)
                          ▼
                  ┌────────────────┐
                  │  Lago events   │ ──► api.getlago.com
                  └────────────────┘

What it does

Wraps your existing LLM client in place — no API surface change for your application code.
Extracts usage from each response into a normalized shape (CanonicalUsage).
Buffers events in memory, flushes them in batches to Lago's /events/batch endpoint.
Survives provider/Lago outages with exponential backoff and a bounded buffer.
p99 wrap-overhead under 5 ms — your call is never blocked on Lago.
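
The buffering and retry behavior listed above can be pictured with a minimal sketch. Everything named here (`BoundedBuffer`, `backoffDelayMs`, the drop-oldest policy, the 200 ms base and 30 s cap) is a hypothetical illustration, not the SDK's internals or its defaults.

```typescript
// Hypothetical sketch: a bounded in-memory buffer plus an exponential backoff schedule.
// None of these names or numbers come from lago-agent-sdk itself.
class BoundedBuffer<T> {
  private items: T[] = [];
  private maxSize: number;
  dropped = 0;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Add an event; when full, drop the oldest one so memory stays bounded.
  // (Drop-oldest is an assumption; a real implementation could drop the newest instead.)
  push(item: T): void {
    if (this.items.length >= this.maxSize) {
      this.items.shift();
      this.dropped += 1;
    }
    this.items.push(item);
  }

  // Remove and return up to `batchSize` events for one batch request.
  takeBatch(batchSize: number): T[] {
    return this.items.splice(0, batchSize);
  }

  get size(): number {
    return this.items.length;
  }
}

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Because the buffer is capped, a long Lago outage costs dropped events rather than unbounded memory growth, and the capped delay keeps retries from spacing out indefinitely.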

Install

npm install lago-agent-sdk
# plus the provider SDK(s) you use:
npm install @aws-sdk/client-bedrock-runtime
npm install @anthropic-ai/sdk
npm install @mistralai/mistralai
npm install openai
npm install @google/genai

Quickstart — Bedrock

import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({
  apiKey: process.env.LAGO_API_KEY!,
  defaultSubscriptionId: "sub_acme",
});
const client = sdk.wrap(new BedrockRuntimeClient({ region: "eu-west-1" }));

await client.send(new ConverseCommand({
  modelId: "eu.amazon.nova-lite-v1:0",
  messages: [{ role: "user", content: [{ text: "Hello" }] }],
}));
await sdk.flush();

The wrapped client behaves identically to the original — same arguments, same return shape, same exceptions. The SDK adds an in-memory queue that batches events to Lago in the background.

Quickstart — Anthropic

import Anthropic from "@anthropic-ai/sdk";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({ apiKey: process.env.LAGO_API_KEY!, defaultSubscriptionId: "sub_acme" });
const client = sdk.wrap(new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! }));

await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 200,
  messages: [{ role: "user", content: "Hello" }],
});
await sdk.flush();

Both messages.create({ ..., stream: true }) and the messages.stream(...) helper (with .finalMessage()) are instrumented automatically.

Quickstart — Mistral

import { Mistral } from "@mistralai/mistralai";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({ apiKey: process.env.LAGO_API_KEY!, defaultSubscriptionId: "sub_acme" });
const client = sdk.wrap(new Mistral({ apiKey: process.env.MISTRAL_API_KEY! }));

await client.chat.complete({
  model: "mistral-small-latest",
  messages: [{ role: "user", content: "Hello" }],
});
await sdk.flush();

Quickstart — OpenAI

import OpenAI from "openai";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({ apiKey: process.env.LAGO_API_KEY!, defaultSubscriptionId: "sub_acme" });
const client = sdk.wrap(new OpenAI({ apiKey: process.env.OPENAI_API_KEY! }));

await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
  max_completion_tokens: 200,
});
await sdk.flush();

Covers both Chat Completions (client.chat.completions.create) and the newer Responses API (client.responses.create), sync + streaming. For Chat Completions streaming, the wrapper auto-injects stream_options: { include_usage: true } so the final chunk carries usage data — without it OpenAI emits no usage on streamed responses.
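
The auto-injection described above amounts to a small transformation of the request parameters. The sketch below is a hypothetical illustration (the `ChatParams` type and `withUsageOnStream` name are not part of the SDK); only `stream` and `stream_options.include_usage` are real OpenAI request fields.

```typescript
// Hypothetical sketch of the injection step; not the SDK's actual code.
interface ChatParams {
  stream?: boolean;
  stream_options?: { include_usage?: boolean };
  [key: string]: unknown;
}

// Force include_usage on for streaming calls; return non-streaming params untouched.
function withUsageOnStream(params: ChatParams): ChatParams {
  if (!params.stream) return params;
  return {
    ...params,
    // Keep any stream_options the caller already set, then switch usage reporting on.
    stream_options: { ...params.stream_options, include_usage: true },
  };
}
```

Returning a copy rather than mutating the caller's object keeps the wrapper's change invisible to application code that reuses its params.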

Reasoning tokens (llm_reasoning_tokens) populate automatically when you call an o-series model (o4-mini, o1, etc.) — OpenAI is the first provider to expose this metric separately.

Quickstart — Gemini

import { GoogleGenAI } from "@google/genai";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({ apiKey: process.env.LAGO_API_KEY!, defaultSubscriptionId: "sub_acme" });
const client = sdk.wrap(new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! }));

await client.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Hello",
});
await sdk.flush();

Wraps the modern @google/genai SDK. Covers client.models.generateContent + generateContentStream, sync + streaming. Reads usage from response.usageMetadata (both camelCase and snake_case forms supported).
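
Reading both naming styles can be sketched as a lookup that tries the camelCase key first and falls back to snake_case. The helper names `readCount` and `extractGeminiUsage` are hypothetical, and the snake_case keys shown are an assumption about which forms are accepted.

```typescript
// Hypothetical helper: read a numeric field under either naming style, defaulting to 0.
function readCount(
  meta: Record<string, unknown> | undefined,
  camel: string,
  snake: string,
): number {
  const value = meta?.[camel] ?? meta?.[snake];
  return typeof value === "number" ? value : 0;
}

// Map Gemini's usageMetadata onto three of the normalized fields.
function extractGeminiUsage(meta: Record<string, unknown> | undefined) {
  return {
    input: readCount(meta, "promptTokenCount", "prompt_token_count"),
    output: readCount(meta, "candidatesTokenCount", "candidates_token_count"),
    reasoning: readCount(meta, "thoughtsTokenCount", "thoughts_token_count"),
  };
}
```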

Reasoning tokens populate automatically on Gemini 2.5 — the model reasons internally by default and surfaces thoughtsTokenCount. Note the semantic difference vs OpenAI:

OpenAI: reasoning_tokens is a subset of completion_tokens (already counted in output)
Gemini: thoughtsTokenCount is additive to candidatesTokenCount (total Google bill = output + reasoning)
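
As a worked illustration of that difference, the following sketch computes the output-side token total each provider bills for. The `totalBilledOutput` function and its types are hypothetical and exist only to make the arithmetic concrete.

```typescript
// Sketch only: the two fields stand for the provider counters named above.
type Provider = "openai" | "gemini";

interface OutputCounts {
  output: number;    // completion_tokens (OpenAI) or candidatesTokenCount (Gemini)
  reasoning: number; // reasoning_tokens (OpenAI) or thoughtsTokenCount (Gemini)
}

// Total output-side tokens billed by the provider.
function totalBilledOutput(provider: Provider, u: OutputCounts): number {
  // OpenAI: reasoning is already inside output. Gemini: reasoning comes on top of output.
  return provider === "openai" ? u.output : u.output + u.reasoning;
}
```

With `output: 500` and `reasoning: 300`, the OpenAI total is 500 and the Gemini total is 800.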

Multi-tenant — pick a subscription per call

Three ways to set the external_subscription_id, in priority order:

// 1. Per-call override — attach __lago to a Bedrock command, or pass `lago: {...}` on a Mistral call.
const cmd = new ConverseCommand({...});
(cmd as any).__lago = { subscription: "sub_acme", dimensions: { feature: "summarize" } };
await client.send(cmd);

// 2. Context-bound — uses AsyncLocalStorage; safe across `await` boundaries.
await sdk.withSubscription("sub_acme", async () => {
  await client.send(...);  // bills sub_acme
});
// or at the top of a request handler:
sdk.setSubscription("sub_acme");

// 3. Default at init (fallback)
new LagoSDK({ apiKey: "...", defaultSubscriptionId: "sub_default" });

Backed by Node's AsyncLocalStorage for safe propagation across promises.
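
The priority order can be sketched with `AsyncLocalStorage` directly. The `resolveSubscription` function and the standalone `withSubscription` below are hypothetical stand-ins that show the lookup order only; the SDK's real internals may differ.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical sketch of the three-level lookup.
const subscriptionStore = new AsyncLocalStorage<string>();

function resolveSubscription(perCall: string | undefined, fallback: string): string {
  // 1. per-call override, 2. context-bound value, 3. default from init
  return perCall ?? subscriptionStore.getStore() ?? fallback;
}

// Run `fn` with `subscription` bound for everything it calls, including code after awaits.
function withSubscription<T>(subscription: string, fn: () => T): T {
  return subscriptionStore.run(subscription, fn);
}
```

Since the store is scoped to the async call chain rather than to a global variable, two concurrent requests that bind different subscriptions do not see each other's value.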

Supported providers

Provider	Access	Status
AWS Bedrock	`ConverseCommand` (sync + stream)	✓
AWS Bedrock	`InvokeModelCommand` (sync + stream), 7 model families	✓
Anthropic	`@anthropic-ai/sdk` (`messages.create` sync + stream, `messages.stream`)	✓
Mistral	`@mistralai/mistralai` (`chat.complete` + `chat.stream`)	✓
OpenAI	`openai` (`chat.completions.create` + `responses.create`, sync + stream)	✓
Google Gemini	`@google/genai` (`models.generateContent` + `generateContentStream`, sync + stream)	✓
Vercel AI SDK	`wrapLanguageModel` middleware	Phase 4

Token dimensions captured

CanonicalUsage carries 11 numeric fields. Which ones populate depends on the provider:

Field	Lago metric code	Bedrock	Anthropic	Mistral	OpenAI	Gemini
input	`llm_input_tokens`	✓	✓	✓	✓	✓
output	`llm_output_tokens`	✓	✓	✓	✓	✓
cache_read	`llm_cached_input_tokens`	✓ (Anthropic)	✓	✓ (when cache hits)	✓ (auto-cache)	✓ (CachedContent API)
cache_write	`llm_cache_creation_tokens`	✓ (Anthropic)	✓	✗	✗	✗
cache_write_5m / 1h	`llm_cache_write_5m/1h_tokens`	✓ (Anthropic InvokeModel)	✓	✗	✗	✗
reasoning	`llm_reasoning_tokens`	✗ (folded into output)	✗ (folded into output)	✗ (folded into output)	✓ (o-series, subset)	✓ (Gemini 2.5, additive)
tool_calls	`llm_tool_calls`	✓	✓	✓	✓	✓
audio_input	`llm_audio_input_tokens`	✗	✗	✗	✓ (GPT-4o-audio)	✓ (multimodal AUDIO)
audio_output	`llm_audio_output_tokens`	✗	✗	✗	✓ (GPT-4o-audio)	✓ (multimodal AUDIO)
image_input	`llm_image_input_tokens`	✗	✗	✗	✗ (Phase 3)	✓ (multimodal IMAGE)

Semantic note on reasoning:

OpenAI's reasoning_tokens is a SUBSET of output — already counted in completion_tokens.
Gemini's thoughtsTokenCount is ADDITIVE to output — candidates + thoughts = total billable output.

Semantic note on input breakdowns (avoid double-counting): For both OpenAI and Gemini, cache_read, audio_input, and image_input are subsets of input, not additive to it — they are a breakdown of tokens already counted in llm_input_tokens. For example, OpenAI reports cached_tokens under prompt_tokens_details within prompt_tokens, and Gemini's docs state promptTokenCount "includes the number of tokens in the cached content". A billable metric that sums llm_input_tokens + llm_cached_input_tokens (or + llm_audio_input_tokens, + llm_image_input_tokens) will double-count. Bill on llm_input_tokens as the total; use the breakdown fields only for cost attribution or discounted-rate tiers (e.g. cached input billed at a lower rate), subtracting them from input rather than adding.
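
A short sketch of the subtraction, assuming hypothetical per-token prices. The `splitInput` and `inputCost` names are illustrative and not part of the SDK; only the subset relationship comes from the note above.

```typescript
interface InputUsage {
  input: number;      // llm_input_tokens: the total, cached tokens included
  cache_read: number; // llm_cached_input_tokens: a subset of input
}

// Split total input into full-rate and discounted-rate portions without double-counting.
function splitInput(u: InputUsage): { fullRate: number; cachedRate: number } {
  const cachedRate = Math.min(u.cache_read, u.input); // guard against inconsistent data
  return { fullRate: u.input - cachedRate, cachedRate };
}

// Cost in whatever unit the prices use; both prices are hypothetical inputs.
function inputCost(u: InputUsage, fullPrice: number, cachedPrice: number): number {
  const { fullRate, cachedRate } = splitInput(u);
  return fullRate * fullPrice + cachedRate * cachedPrice;
}
```

For `input: 1000` and `cache_read: 400`, the split is 600 full-rate and 400 cached tokens, which still sums to the 1000 tokens reported in `llm_input_tokens`.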

Error policy

The SDK never breaks your LLM call. If anything in instrumentation fails (adapter bug, Lago down, network error), the SDK swallows it, logs a warning, and your call returns normally.

Wire your own observability via onError:

new LagoSDK({
  apiKey: "...",
  config: {
    onError: (err, where) => Sentry.captureException(err, { tags: { sdk_phase: where } }),
  },
});
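
The swallow-and-report policy can be sketched as a small guard around each instrumentation step. The `guard` function and the `where` labels used with it are hypothetical; only the `(err, where)` callback shape is taken from the `onError` example above.

```typescript
type ErrorHook = (err: unknown, where: string) => void;

// Run one instrumentation step; on failure, report through the hook and carry on.
function guard(where: string, onError: ErrorHook | undefined, fn: () => void): void {
  try {
    fn();
  } catch (err) {
    try {
      onError?.(err, where);
    } catch {
      // A throwing hook must not break the caller's LLM call either.
    }
  }
}
```

The inner `try` matters: without it, a bug in a user-supplied hook would reintroduce exactly the failure mode the policy is meant to rule out.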

Setting up Lago

The SDK ships with default metric codes (llm_input_tokens, llm_output_tokens, etc.). You need to register matching billable metrics in your Lago tenant before events count toward charges. See Lago docs — Billable Metrics.

Development

git clone https://github.com/getlago/lago-agent-sdk-js
cd lago-agent-sdk-js
npm install
npm test
npm run build

Run live integration tests (requires real credentials):

AWS_BEARER_TOKEN_BEDROCK="..." \
MISTRAL_API_KEY="..." \
LAGO_API_URL="https://api.getlago.com/api/v1/" \
LAGO_API_KEY="..." \
LAGO_EXTERNAL_SUBSCRIPTION_ID="sub_..." \
npm test -- tests/integration

Security

Found a vulnerability? See SECURITY.md.
