lago-agent-sdk

Instrument LLM clients and emit usage events to Lago for billing. Authored in TypeScript; ships compiled JavaScript with .d.ts type declarations, so it works for both JavaScript and TypeScript consumers.

                  ┌────────────────┐
your code ──────► │ wrapped client │ ──► provider (Bedrock / Mistral / …)
                  └───────┬────────┘
                          │ (extract usage)
                          ▼
                  ┌────────────────┐
                  │  Lago events   │ ──► api.getlago.com
                  └────────────────┘

What it does

  • Wraps your existing LLM client in place — no API surface change for your application code.
  • Extracts usage from each response into a normalized shape (CanonicalUsage).
  • Buffers events in memory, flushes them in batches to Lago's /events/batch endpoint.
  • Survives provider/Lago outages with exponential backoff and a bounded buffer.
  • p99 wrap-overhead under 5 ms — your call is never blocked on Lago.
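
The buffering and retry behaviour listed above can be pictured with a small sketch. It is not the SDK's implementation: the names `BoundedBuffer` and `backoffDelayMs`, the drop-oldest policy, and the delay constants are all assumptions chosen for illustration.

```typescript
// Illustrative only: names, the drop-oldest policy, and the delay
// constants are assumptions, not the SDK's actual internals.
class BoundedBuffer<T> {
  private items: T[] = [];
  private capacity: number;
  dropped = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Append an item; when full, drop the oldest so memory stays bounded.
  push(item: T): void {
    if (this.items.length >= this.capacity) {
      this.items.shift();
      this.dropped += 1;
    }
    this.items.push(item);
  }

  // Remove and return up to `batchSize` of the oldest items.
  takeBatch(batchSize: number): T[] {
    return this.items.splice(0, batchSize);
  }

  get size(): number {
    return this.items.length;
  }
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs (attempt starts at 0).
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const buffer = new BoundedBuffer<string>(3);
["e1", "e2", "e3", "e4"].forEach((e) => buffer.push(e));
const batch = buffer.takeBatch(2); // ["e2", "e3"]; "e1" was dropped when "e4" arrived
```

A flush loop built on this would send `batch` and, on failure, wait `backoffDelayMs(attempt)` before retrying, so a Lago outage costs at most `capacity` buffered events of memory.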

Install

npm install lago-agent-sdk
# plus the provider SDK(s) you use:
npm install @aws-sdk/client-bedrock-runtime
npm install @anthropic-ai/sdk
npm install @mistralai/mistralai
npm install openai
npm install @google/genai

Quickstart — Bedrock

import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({
  apiKey: process.env.LAGO_API_KEY!,
  defaultSubscriptionId: "sub_acme",
});
const client = sdk.wrap(new BedrockRuntimeClient({ region: "eu-west-1" }));

await client.send(new ConverseCommand({
  modelId: "eu.amazon.nova-lite-v1:0",
  messages: [{ role: "user", content: [{ text: "Hello" }] }],
}));
await sdk.flush();

The wrapped client behaves identically to the original — same arguments, same return shape, same exceptions. The SDK adds an in-memory queue that batches events to Lago in the background.

Quickstart — Anthropic

import Anthropic from "@anthropic-ai/sdk";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({ apiKey: process.env.LAGO_API_KEY!, defaultSubscriptionId: "sub_acme" });
const client = sdk.wrap(new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! }));

await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 200,
  messages: [{ role: "user", content: "Hello" }],
});
await sdk.flush();

Both messages.create({ ..., stream: true }) and the messages.stream(...) helper (with .finalMessage()) are instrumented automatically.

Quickstart — Mistral

import { Mistral } from "@mistralai/mistralai";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({ apiKey: process.env.LAGO_API_KEY!, defaultSubscriptionId: "sub_acme" });
const client = sdk.wrap(new Mistral({ apiKey: process.env.MISTRAL_API_KEY! }));

await client.chat.complete({
  model: "mistral-small-latest",
  messages: [{ role: "user", content: "Hello" }],
});
await sdk.flush();

Quickstart — OpenAI

import OpenAI from "openai";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({ apiKey: process.env.LAGO_API_KEY!, defaultSubscriptionId: "sub_acme" });
const client = sdk.wrap(new OpenAI({ apiKey: process.env.OPENAI_API_KEY! }));

await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
  max_completion_tokens: 200,
});
await sdk.flush();

Covers both Chat Completions (client.chat.completions.create) and the newer Responses API (client.responses.create), sync + streaming. For Chat Completions streaming, the wrapper auto-injects stream_options: { include_usage: true } so the final chunk carries usage data — without it OpenAI emits no usage on streamed responses.
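
To make the injection concrete, here is a minimal sketch of what such a step could look like. `ensureUsageOnStream` and the trimmed-down `ChatParams` type are hypothetical; only the `stream` and `stream_options.include_usage` request fields come from OpenAI's Chat Completions API.

```typescript
// Hypothetical helper, not the SDK's code: shows the shape of the injection.
interface ChatParams {
  model: string;
  messages: { role: string; content: string }[];
  stream?: boolean;
  stream_options?: { include_usage?: boolean };
}

function ensureUsageOnStream(params: ChatParams): ChatParams {
  // Non-streaming responses already carry `usage`; leave them untouched.
  if (!params.stream) return params;
  // Copy rather than mutate the caller's object, and switch include_usage on.
  return {
    ...params,
    stream_options: { ...params.stream_options, include_usage: true },
  };
}

const original: ChatParams = {
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
};
const patched = ensureUsageOnStream(original);
```

Returning a copy keeps the caller's request object unchanged, which matches the promise that wrapping does not alter application-visible behaviour.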

Reasoning tokens (llm_reasoning_tokens) populate automatically when you call an o-series model (o4-mini, o1, etc.) — OpenAI is the first provider to expose this metric separately.

Quickstart — Gemini

import { GoogleGenAI } from "@google/genai";
import { LagoSDK } from "lago-agent-sdk";

const sdk = new LagoSDK({ apiKey: process.env.LAGO_API_KEY!, defaultSubscriptionId: "sub_acme" });
const client = sdk.wrap(new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! }));

await client.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Hello",
});
await sdk.flush();

Wraps the modern @google/genai SDK. Covers client.models.generateContent + generateContentStream, sync + streaming. Reads usage from response.usageMetadata (both camelCase and snake_case forms supported).
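
Reading usage in either casing could be sketched as follows. `readGeminiUsage` and `pickCount` are hypothetical helpers, and the snake_case keys are assumed to mirror the camelCase ones; the SDK's own adapter may differ.

```typescript
// Hypothetical reader: accepts either casing and returns one shape.
type RawUsage = Record<string, unknown>;

function pickCount(raw: RawUsage, camel: string, snake: string): number {
  const value = raw[camel] ?? raw[snake];
  // Missing or non-numeric fields count as zero.
  return typeof value === "number" ? value : 0;
}

function readGeminiUsage(raw: RawUsage) {
  return {
    input: pickCount(raw, "promptTokenCount", "prompt_token_count"),
    output: pickCount(raw, "candidatesTokenCount", "candidates_token_count"),
    reasoning: pickCount(raw, "thoughtsTokenCount", "thoughts_token_count"),
  };
}

const camelUsage = readGeminiUsage({ promptTokenCount: 12, candidatesTokenCount: 30 });
const snakeUsage = readGeminiUsage({
  prompt_token_count: 12,
  candidates_token_count: 30,
  thoughts_token_count: 5,
});
```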

Reasoning tokens populate automatically on Gemini 2.5 — the model reasons internally by default and surfaces thoughtsTokenCount. Note the semantic difference vs OpenAI:

  • OpenAI: reasoning_tokens is a subset of completion_tokens (already counted in output)
  • Gemini: thoughtsTokenCount is additive to candidatesTokenCount (total Google bill = output + reasoning)
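
The difference matters when you derive a billable output total. The sketch below works through the arithmetic for both providers; the `OutputBreakdown` type and the two helper functions are illustrative names, not part of the SDK, and the input types list only the usage fields discussed here.

```typescript
// Illustrative arithmetic only; helper and type names are assumptions.
interface OpenAIUsage {
  completion_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}
interface GeminiUsage {
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number;
}
interface OutputBreakdown {
  visible: number;       // tokens in the returned answer
  reasoning: number;     // internal reasoning tokens
  totalBillable: number; // what the provider bills as output
}

function fromOpenAI(u: OpenAIUsage): OutputBreakdown {
  const reasoning = u.completion_tokens_details?.reasoning_tokens ?? 0;
  // Subset: reasoning is already inside completion_tokens.
  return {
    visible: u.completion_tokens - reasoning,
    reasoning,
    totalBillable: u.completion_tokens,
  };
}

function fromGemini(u: GeminiUsage): OutputBreakdown {
  const visible = u.candidatesTokenCount ?? 0;
  const reasoning = u.thoughtsTokenCount ?? 0;
  // Additive: thoughts come on top of candidates.
  return { visible, reasoning, totalBillable: visible + reasoning };
}

const openai = fromOpenAI({
  completion_tokens: 300,
  completion_tokens_details: { reasoning_tokens: 200 },
});
const gemini = fromGemini({ candidatesTokenCount: 50, thoughtsTokenCount: 200 });
```

With these inputs the OpenAI total stays at 300 (100 visible plus 200 reasoning), while the Gemini total is 250 (50 visible plus 200 reasoning).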

Multi-tenant — pick a subscription per call

Three ways to set the external_subscription_id, in priority order:

// 1. Per-call override — attach __lago to a Bedrock command, or pass `lago: {...}` on a Mistral call.
const cmd = new ConverseCommand({...});
(cmd as any).__lago = { subscription: "sub_acme", dimensions: { feature: "summarize" } };
await client.send(cmd);

// 2. Context-bound — uses AsyncLocalStorage; safe across `await` boundaries.
sdk.withSubscription("sub_acme", async () => {
  await client.send(...);  // bills sub_acme
});
// or at the top of a request handler:
sdk.setSubscription("sub_acme");

// 3. Default at init (fallback)
new LagoSDK({ apiKey: "...", defaultSubscriptionId: "sub_default" });

Backed by Node's AsyncLocalStorage for safe propagation across promises.
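
For readers unfamiliar with AsyncLocalStorage, the sketch below shows how a context-bound subscription and the three-level priority order could fit together. Only `AsyncLocalStorage` (from `node:async_hooks`) is a real API here; `subscriptionContext`, `runWithSubscription`, and `resolveSubscription` are hypothetical stand-ins, not the SDK's internals.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical names; only AsyncLocalStorage itself is a real Node API.
const subscriptionContext = new AsyncLocalStorage<string>();

// Run `fn` with `subscriptionId` visible to everything it calls or awaits.
function runWithSubscription<T>(subscriptionId: string, fn: () => T): T {
  return subscriptionContext.run(subscriptionId, fn);
}

// Resolution order: per-call override, then context, then init-time default.
function resolveSubscription(
  perCall: string | undefined,
  defaultId: string | undefined,
): string | undefined {
  return perCall ?? subscriptionContext.getStore() ?? defaultId;
}

async function handleRequest(): Promise<string | undefined> {
  await Promise.resolve(); // the store is still visible after this await
  return resolveSubscription(undefined, "sub_default");
}

// Resolves to "sub_acme": the context value wins over the default.
const pending = runWithSubscription("sub_acme", handleRequest);
```

Because the store is scoped to the callback passed to `run`, concurrent requests with different subscriptions do not see each other's value.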

Supported providers

| Provider | Access | Status |
|---|---|---|
| AWS Bedrock | ConverseCommand (sync + stream) | |
| AWS Bedrock | InvokeModelCommand (sync + stream), 7 model families | |
| Anthropic | @anthropic-ai/sdk (messages.create sync + stream, messages.stream) | |
| Mistral | @mistralai/mistralai (chat.complete + chat.stream) | |
| OpenAI | openai (chat.completions.create + responses.create, sync + async + stream) | |
| Google Gemini | @google/genai (models.generateContent + generateContentStream, sync + stream) | |
| Vercel AI SDK | wrapLanguageModel middleware | Phase 4 |

Token dimensions captured

CanonicalUsage carries 11 numeric fields. Which ones populate depends on the provider:

Field Lago metric code Bedrock Anthropic Mistral OpenAI Gemini
input llm_input_tokens
output llm_output_tokens
cache_read llm_cached_input_tokens ✓ (Anthropic) ✓ (when cache hits) ✓ (auto-cache) ✓ (CachedContent API)
cache_write llm_cache_creation_tokens ✓ (Anthropic)
cache_write_5m / 1h llm_cache_write_5m/1h_tokens ✓ (Anthropic InvokeModel)
reasoning llm_reasoning_tokens ✗ (folded into output) ✗ (folded into output) ✗ (folded into output) ✓ (o-series, subset) ✓ (Gemini 2.5, additive)
tool_calls llm_tool_calls
audio_input llm_audio_input_tokens ✓ (GPT-4o-audio) ✓ (multimodal AUDIO)
audio_output llm_audio_output_tokens ✓ (GPT-4o-audio) ✓ (multimodal AUDIO)
image_input llm_image_input_tokens ✗ (Phase 3) ✓ (multimodal IMAGE)

Semantic note on reasoning:

  • OpenAI's reasoning_tokens is a SUBSET of output — already counted in completion_tokens.
  • Gemini's thoughtsTokenCount is ADDITIVE to output: candidates + thoughts = total billable output.

Semantic note on input breakdowns (avoid double-counting): For both OpenAI and Gemini, cache_read, audio_input, and image_input are subsets of input, not additive to it — they are a breakdown of tokens already counted in llm_input_tokens. For example, OpenAI reports cached_tokens under prompt_tokens_details within prompt_tokens, and Gemini's docs state promptTokenCount "includes the number of tokens in the cached content". A billable metric that sums llm_input_tokens + llm_cached_input_tokens (or + llm_audio_input_tokens, + llm_image_input_tokens) will double-count. Bill on llm_input_tokens as the total; use the breakdown fields only for cost attribution or discounted-rate tiers (e.g. cached input billed at a lower rate), subtracting them from input rather than adding.
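
As a worked example of the subtract-not-add rule, the sketch below prices input tokens with a discounted cached rate. The function name and the rates (expressed in integer micro-dollars per token to keep the arithmetic exact) are made up for illustration.

```typescript
// Illustrative pricing helper; the name and rates are assumptions.
interface InputUsage {
  input: number;     // total input tokens (already includes cached tokens)
  cacheRead: number; // subset of `input` served from cache
}

function inputCostMicroUsd(
  u: InputUsage,
  ratePerToken: number,
  cachedRatePerToken: number,
): number {
  // Subtract the cached subset from the total instead of adding it on top.
  const uncached = u.input - u.cacheRead;
  return uncached * ratePerToken + u.cacheRead * cachedRatePerToken;
}

const usage: InputUsage = { input: 1000, cacheRead: 400 };
const correct = inputCostMicroUsd(usage, 3, 1);            // 600*3 + 400*1 = 2200
const doubleCounted = usage.input * 3 + usage.cacheRead * 1; // 3400, overbills by 1200
```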

Error policy

The SDK never breaks your LLM call. If anything in instrumentation fails (adapter bug, Lago down, network error), the SDK swallows it, logs a warning, and your call returns normally.

Wire your own observability via onError:

new LagoSDK({
  apiKey: "...",
  config: {
    onError: (err, where) => Sentry.captureException(err, { tags: { sdk_phase: where } }),
  },
});
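
The isolation policy described above can be sketched as a guard around every instrumentation step. This is an illustration of the pattern, not the SDK's code: `safely`, `callAndRecord`, and the `"record"` phase label are hypothetical.

```typescript
// Pattern sketch; function names and the phase label are assumptions.
type ErrorHandler = (err: unknown, where: string) => void;

// Run instrumentation work; report failures but never let them escape.
function safely(where: string, onError: ErrorHandler | undefined, work: () => void): void {
  try {
    work();
  } catch (err) {
    try {
      onError?.(err, where);
    } catch {
      // A failing error handler must not break the caller either.
    }
  }
}

// Provider errors propagate unchanged; recording errors are swallowed.
async function callAndRecord<T>(
  call: () => Promise<T>,
  record: (result: T) => void,
  onError?: ErrorHandler,
): Promise<T> {
  const result = await call();
  safely("record", onError, () => record(result));
  return result;
}

const seen: string[] = [];
safely("record", (_err, where) => seen.push(where), () => {
  throw new Error("adapter bug");
});
```

Note the asymmetry: exceptions from the provider call itself are deliberately not caught, so the wrapped client still raises the same errors as the original.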

Setting up Lago

The SDK ships with default metric codes (llm_input_tokens, llm_output_tokens, etc.). You need to register matching billable metrics in your Lago tenant before events count toward charges. See Lago docs — Billable Metrics.

Development

git clone https://github.com/getlago/lago-agent-sdk-js
cd lago-agent-sdk-js
npm install
npm test
npm run build

Run live integration tests (requires real credentials):

AWS_BEARER_TOKEN_BEDROCK="..." \
MISTRAL_API_KEY="..." \
LAGO_API_URL="https://api.getlago.com/api/v1/" \
LAGO_API_KEY="..." \
LAGO_EXTERNAL_SUBSCRIPTION_ID="sub_..." \
npm test -- tests/integration

Security

Found a vulnerability? See SECURITY.md.
