All notable changes to this project will be documented here. Format follows Keep a Changelog; versions follow SemVer.
- Price mode: emit computed dollar cost instead of token counts. New `pricingMode` config (`"tokens"` default | `"price"`), plus `markup`, `costMetricCode` (default `llm_cost`), `pricingTtlMs`, and `bedrockDefaultRegion`. In price mode the SDK emits one `llm_cost` event per call carrying a top-level `precise_total_amount_cents` (cost in cents, after markup) for Lago's dynamic charge model, with a full per-field breakdown in `properties` (value in USD, base, markup, source, per-field tokens/unit_price/cost). Live unit prices come from public, no-auth sources: OpenRouter (`/api/v1/models`) for native anthropic/openai/mistral/gemini, and the AWS Bedrock Price List Bulk API for Bedrock. Prices are fetched and cached on the background queue loop (never blocking the customer's call); a missing price falls back to token events and calls `onError` (never silently under-bills). Mode and markup are overridable per call via `lago: { mode: "price", markup: 1.5 }` (Bedrock: command `__lago`). Money uses fixed-point BigInt floored to 12 dp, identical to the Python `Decimal` implementation (cross-repo golden fixture). New `pricing.ts` module + `PricingProvider`; default `pricingMode: "tokens"` keeps existing behavior unchanged.
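
To illustrate the fixed-point arithmetic mentioned in the price mode entry, here is a minimal sketch of cost math floored to 12 decimal places with BigInt. All helper names (`toFixed12`, `mulFloor`, `costUsd`, `formatFixed12`) are hypothetical and are not taken from `pricing.ts`; the sketch assumes non-negative inputs and decimal-string prices.

```typescript
// Sketch only: helper names are hypothetical, not the SDK's pricing.ts API.
// Values are stored as integers scaled by 10^12 (12 decimal places).
const ONE: bigint = BigInt("1000000000000");

/** Parse a non-negative decimal string into a 12-dp fixed-point value, truncating extra digits. */
function toFixed12(decimal: string): bigint {
  const [whole, frac = ""] = decimal.split(".");
  const padded = (frac + "0".repeat(12)).slice(0, 12);
  return BigInt(whole || "0") * ONE + BigInt(padded);
}

/** Multiply two 12-dp values. BigInt division truncates, which floors for non-negative inputs. */
function mulFloor(a: bigint, b: bigint): bigint {
  return (a * b) / ONE;
}

/** Cost in USD (12-dp fixed point) for a token count, a per-token unit price, and a markup factor. */
function costUsd(tokens: number, unitPriceUsd: string, markup: string): bigint {
  const base = BigInt(tokens) * toFixed12(unitPriceUsd);
  return mulFloor(base, toFixed12(markup));
}

/** Render a 12-dp fixed-point value as a decimal string. */
function formatFixed12(value: bigint): string {
  const whole = value / ONE;
  const frac = (value % ONE).toString().padStart(12, "0");
  return `${whole}.${frac}`;
}

// 1000 tokens at 0.000003 USD per token with a 1.5x markup.
const usd = costUsd(1000, "0.000003", "1.5");
const cents = usd * BigInt(100);
console.log(formatFixed12(usd), formatFixed12(cents));
```

Keeping every intermediate value as an integer avoids binary floating-point drift, which is what makes a byte-for-byte match with a decimal implementation in another language feasible.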
- Anthropic `messages.create({ stream: true })` under-billed input tokens. The stream wrapper read only top-level `usage`, which on a basic stream appears only on `message_delta` as `{ output_tokens: N }`. The authoritative `input_tokens`/`cache_*` counts arrive nested under `message.usage` on the `message_start` event and were ignored, so input billed 0. The wrapper now merges usage from `message_start` (input/cache) and `message_delta` (cumulative output). Regression test uses the realistic wire shape (delta carries no input echo).
- Legacy `@google/generative-ai` SDK silently emitted no events. The detector matched both the new `@google/genai` (`GoogleGenAI`) and the deprecated `@google/generative-ai` (`GoogleGenerativeAI`) SDKs, but the wrapper only instruments the unified `models`/`aio` surface, so a legacy client was routed through and nothing was wrapped. `wrap()` now rejects legacy clients with a clear pointer to migrate to `@google/genai`.
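
The usage merge described in the Anthropic streaming fix can be sketched roughly as follows. The event shapes follow the wire format named in the entry (`message_start` with nested `message.usage`, `message_delta` with top-level `usage`); the type and function names (`StreamEventLike`, `mergeStreamUsage`) are hypothetical and this is not the wrapper's actual code.

```typescript
// Sketch only: type and function names are hypothetical.
interface UsageLike {
  input_tokens?: number;
  output_tokens?: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

interface StreamEventLike {
  type: string;
  message?: { usage?: UsageLike }; // present on message_start
  usage?: UsageLike;               // present on message_delta
}

interface MergedUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreation: number;
}

function mergeStreamUsage(events: readonly StreamEventLike[]): MergedUsage {
  const merged: MergedUsage = { input: 0, output: 0, cacheRead: 0, cacheCreation: 0 };
  for (const event of events) {
    if (event.type === "message_start" && event.message?.usage) {
      // Input and cache counts are only authoritative here.
      const u = event.message.usage;
      merged.input = u.input_tokens ?? 0;
      merged.cacheRead = u.cache_read_input_tokens ?? 0;
      merged.cacheCreation = u.cache_creation_input_tokens ?? 0;
      merged.output = u.output_tokens ?? 0;
    } else if (event.type === "message_delta" && event.usage) {
      // output_tokens is cumulative, so overwrite instead of adding.
      merged.output = event.usage.output_tokens ?? merged.output;
    }
  }
  return merged;
}

const merged = mergeStreamUsage([
  { type: "message_start", message: { usage: { input_tokens: 25, cache_read_input_tokens: 10, output_tokens: 1 } } },
  { type: "content_block_delta" },
  { type: "message_delta", usage: { output_tokens: 42 } },
]);
console.log(merged);
```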
- Hardened the publish workflow: least-privilege `permissions: contents: read` default (only `publish` gets `id-token: write`, only `release` gets `contents: write`), and every third-party action pinned to a full commit SHA so a re-pointed tag can't inject code into the OIDC-token-minting job.
- The `publish` job builds from source (`npm ci` + `npm run build`) and publishes with `--provenance`, attaching a sigstore attestation ("Built and signed on GitHub Actions") to the package on npm. (npm has no supported path to attach provenance to a pre-packed tarball, because provenance is bound to the build, so the job reinstalls from the committed lockfile, which keeps the build reproducible, and runs only on a `v*.*.*` tag behind the environment approval gate.)
- The `publish` job runs on Node 24 (bundles npm ≥ 11.13). OIDC trusted publishing requires npm CLI ≥ 11.5.1, which Node 20/22 (npm 10.x) do not ship, so the previous Node 20 publish job would have failed the OIDC handshake at release time.
- Added `if: startsWith(github.ref, 'refs/tags/v')` to the `publish` job as defense-in-depth: it refuses to run on a non-tag ref even if the environment's protected-tag rule is misconfigured.
- Added `.github/dependabot.yml` (github-actions ecosystem) so the SHA pins stay fresh: Dependabot bumps the SHA and version comment together rather than letting actions silently age.
- RELEASING.md now documents `npm` environment protection (required reviewers + protected-tag restriction) as a required setup step, not optional, since trusted publishing is only as strong as that environment's rules.
- README: clarified that `cache_read`, `audio_input`, and `image_input` are subsets of `input` for OpenAI and Gemini (not additive); summing them with `llm_input_tokens` double-counts.
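
A small sketch of the arithmetic behind that README clarification, assuming a usage object with the field names from the entry. The interface and function names (`SubsetUsage`, `totalInputTokens`, `uncachedInputTokens`) are hypothetical.

```typescript
// Sketch only: names are hypothetical. For OpenAI and Gemini, cache_read,
// audio_input, and image_input are already counted inside input.
interface SubsetUsage {
  input: number;
  cache_read?: number;
  audio_input?: number;
  image_input?: number;
}

/** Total input is the input field alone; adding the subset fields would double-count. */
function totalInputTokens(u: SubsetUsage): number {
  return u.input;
}

/** Uncached input is derived by subtraction, not by summing fields. */
function uncachedInputTokens(u: SubsetUsage): number {
  return Math.max(0, u.input - (u.cache_read ?? 0));
}

const usage: SubsetUsage = { input: 1000, cache_read: 800, image_input: 150 };
console.log(totalInputTokens(usage), uncachedInputTokens(usage));
```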
- Native `@google/genai` SDK wrapper covering `client.models.generateContent` + `generateContentStream`, sync + streaming. Handles both camelCase (SDK pydantic-like objects) and snake_case (serialized JSON) shapes of `usageMetadata`/`usage_metadata`.
- `extractGeminiNative` adapter: `promptTokenCount → input`, `candidatesTokenCount → output`, `cachedContentTokenCount → cache_read`, `thoughtsTokenCount → reasoning`, modality-tagged details → `audio_input`/`audio_output`/`image_input`, count of `candidates[0].content.parts[].functionCall → tool_calls`.
- Gemini 2.5 surfaces reasoning tokens by default, so `llm_reasoning_tokens` fires automatically. Semantic note vs OpenAI: Gemini's reasoning is ADDITIVE to output (candidates + thoughts = total billable output); OpenAI's reasoning is a SUBSET of `completion_tokens`. Documented in adapter docstring + README.
- 20 new unit tests (14 adapter + 6 wrapper) and 4 live integration tests (gated on `GEMINI_API_KEY`). Total: 291 unit tests.
- 5 captured response fixtures from the real Gemini API.
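
The field mapping and the dual camelCase/snake_case handling listed above might look roughly like this. It is a simplified sketch covering only the four scalar counts; `extractGeminiUsageSketch`, `num`, and `billableOutput` are hypothetical names, not the real `extractGeminiNative` implementation, and the snake_case keys are assumed to be the direct serialized equivalents of the camelCase ones.

```typescript
// Sketch only: function names are hypothetical; covers the scalar counts, not modality details.
type Json = Record<string, unknown>;

interface GeminiUsageSketch {
  input: number;
  output: number;
  cache_read: number;
  reasoning: number;
}

/** Read a numeric field under either its camelCase or snake_case key, defaulting to 0. */
function num(meta: Json, camel: string, snake: string): number {
  const value = meta[camel] ?? meta[snake];
  return typeof value === "number" ? value : 0;
}

function extractGeminiUsageSketch(response: Json): GeminiUsageSketch {
  const meta = (response["usageMetadata"] ?? response["usage_metadata"] ?? {}) as Json;
  return {
    input: num(meta, "promptTokenCount", "prompt_token_count"),
    output: num(meta, "candidatesTokenCount", "candidates_token_count"),
    cache_read: num(meta, "cachedContentTokenCount", "cached_content_token_count"),
    reasoning: num(meta, "thoughtsTokenCount", "thoughts_token_count"),
  };
}

/** Gemini reasoning is additive to output: candidates + thoughts. */
function billableOutput(u: GeminiUsageSketch): number {
  return u.output + u.reasoning;
}

const fromSdk = extractGeminiUsageSketch({
  usageMetadata: { promptTokenCount: 12, candidatesTokenCount: 30, thoughtsTokenCount: 8 },
});
console.log(fromSdk, billableOutput(fromSdk));
```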
- Detector now returns `gemini` (was `google`) for `@google/genai` clients.
- Native `openai` SDK wrapper covering both APIs: `chat.completions.create` and `responses.create`, each sync + streaming. Wraps the APIPromise via Proxy with `.bind(target)` to preserve `.withResponse()`/`.asResponse()` calls.
- `extractOpenAINative` adapter auto-detects which API (Chat Completions vs Responses) and extracts the appropriate fields:
  - Chat Completions: `prompt_tokens`, `completion_tokens`, `prompt_tokens_details.{cached_tokens, audio_tokens}`, `completion_tokens_details.{reasoning_tokens, audio_tokens}`, count of `choices[0].message.tool_calls`.
  - Responses API: `input_tokens`, `output_tokens`, `input_tokens_details.cached_tokens`, `output_tokens_details.reasoning_tokens`, count of `output[].type === "function_call"`.
- First provider to populate `llm_reasoning_tokens`: OpenAI's o-series models (`o4-mini`, `o1`, etc.) surface reasoning tokens separately from completion tokens.
- Auto-injection of `stream_options: { include_usage: true }` when `stream: true` is set without it, so Chat Completions streaming emits usage on the final chunk.
- `audio_output` field added to `CanonicalUsage` (maps to `llm_audio_output_tokens`), populated by GPT-4o-audio responses.
- Per-call override via `lago: { subscription, dimensions }` on the OpenAI options.
- 19 adapter tests + 9 wrapper tests + 5 live integration tests.
- 10 captured response fixtures from the real OpenAI API.
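
The `stream_options` auto-injection noted above can be sketched as a small pure function over the request parameters. `ChatParamsLike` and `withUsageOnStream` are hypothetical names. The sketch also assumes that an explicit `include_usage: false` from the caller is left untouched, which the entry does not specify.

```typescript
// Sketch only: names are hypothetical; not the wrapper's actual code.
interface ChatParamsLike {
  stream?: boolean | null;
  stream_options?: { include_usage?: boolean } | null;
  [key: string]: unknown;
}

function withUsageOnStream(params: ChatParamsLike): ChatParamsLike {
  // Non-streaming calls already return usage on the response body.
  if (params.stream !== true) return params;
  // Assumption: respect an explicit caller choice, including false.
  if (params.stream_options?.include_usage !== undefined) return params;
  // Copy instead of mutating the caller's object.
  return {
    ...params,
    stream_options: { ...(params.stream_options ?? {}), include_usage: true },
  };
}

const patched = withUsageOnStream({ model: "example-model", stream: true });
console.log(patched.stream_options);
```

Returning a copy keeps the caller's options object unchanged, so reusing it for a later non-wrapped call behaves as before.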
- Native `@anthropic-ai/sdk` wrapper covering `messages.create` (sync + streaming) and `messages.stream` (`.finalMessage()` + `finalMessage` event).
- `extractAnthropicNative` adapter, verified against captured fixtures (plain, tool use, cache create). Maps `usage.input_tokens`, `usage.output_tokens`, `usage.cache_read_input_tokens`, `usage.cache_creation_input_tokens`, `usage.cache_creation.ephemeral_{5m,1h}_input_tokens`, and counts `content[].type === "tool_use"` for `tool_calls`.
- Per-call override via `lago: { subscription, dimensions }` in the create/stream options, stripped before forwarding so the Anthropic validator doesn't reject it.
- 6 wrapper tests + 7 adapter tests + 3 live integration tests.
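
Stripping the `lago` key before forwarding, as the per-call override entry describes, can be sketched with a rest destructure. `LagoOverride` and `splitLagoOption` are hypothetical names, and the field types are assumptions.

```typescript
// Sketch only: names and field types are hypothetical.
interface LagoOverride {
  subscription?: string;
  dimensions?: Record<string, string>;
}

/** Separate the lago override from the options that are safe to forward to the provider SDK. */
function splitLagoOption<T extends { lago?: LagoOverride }>(options: T) {
  const { lago, ...forwarded } = options;
  return { lago, forwarded };
}

const { lago, forwarded } = splitLagoOption({
  model: "example-model",
  max_tokens: 64,
  lago: { subscription: "sub_123", dimensions: { team: "search" } },
});
console.log(lago, forwarded);
```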
- `LagoSDK` core with batched async event queue, exponential backoff, bounded buffer, `AsyncLocalStorage`-based subscription resolution.
- AWS SDK v3 `BedrockRuntimeClient` wrapper covering `ConverseCommand`, `ConverseStreamCommand`, `InvokeModelCommand`, `InvokeModelWithResponseStreamCommand`.
- 7 InvokeModel family adapters (`anthropic`, `opus_4_7`, `nova`, `pixtral`, `mistral_legacy`, `openai_compat_basic`, `openai_compat_with_details`) with substring-match dispatch.
- `@mistralai/mistralai` native wrapper covering `chat.complete`, `chat.stream`, and async variants. Handles both snake_case and camelCase usage payloads.
- Three subscription-resolution tiers: per-call `__lago` on commands / `lago` on Mistral options, context-bound `withSubscription`/`setSubscription`, init-time default.
- 237 tests: 229 unit + 8 integration; verified against 159 fixtures captured from real provider responses.
- p99 wrap-overhead ≤ 5 ms benchmark.
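
The three subscription-resolution tiers could be sketched with Node's `AsyncLocalStorage` as below. The `withSubscription` name comes from the entry above, but its signature and body here are assumptions, and `resolveSubscription` is a hypothetical helper; the SDK's real implementation may differ.

```typescript
// Sketch only: signatures are assumptions, not the SDK's actual API.
import { AsyncLocalStorage } from "node:async_hooks";

const subscriptionContext = new AsyncLocalStorage<string>();

/** Tier 2: bind a subscription to everything that runs inside fn, including async continuations. */
function withSubscription<T>(subscription: string, fn: () => T): T {
  return subscriptionContext.run(subscription, fn);
}

/** Resolve in priority order: per-call override, then context-bound value, then init-time default. */
function resolveSubscription(
  perCall: string | undefined,
  initDefault: string | undefined,
): string | undefined {
  return perCall ?? subscriptionContext.getStore() ?? initDefault;
}

console.log(resolveSubscription(undefined, "sub_default"));
console.log(withSubscription("sub_ctx", () => resolveSubscription(undefined, "sub_default")));
console.log(withSubscription("sub_ctx", () => resolveSubscription("sub_call", "sub_default")));
```

Because the store travels with the async call chain, concurrent requests can each carry their own subscription without passing it through every function signature.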