All notable changes to this project will be documented here. Format follows Keep a Changelog; versions follow SemVer.
- Price mode: emit computed dollar cost instead of token counts. New `pricing_mode` config (`"tokens"` default | `"price"`), plus `markup`, `cost_metric_code` (default `llm_cost`), `pricing_ttl_seconds`, and `bedrock_default_region`. In price mode the SDK emits one `llm_cost` event per call carrying a top-level `precise_total_amount_cents` (cost in cents, after markup) for Lago's dynamic charge model, with a full per-field breakdown in `properties` (`value` in USD, `base`, `markup`, `source`, and per-field `tokens`/`unit_price`/`cost`). Live unit prices come from public, no-auth sources: OpenRouter (`/api/v1/models`) for native anthropic/openai/mistral/gemini, and the AWS Bedrock Price List Bulk API for Bedrock. Prices are fetched and cached on the background queue thread (never blocking the customer's call); a missing price falls back to token events and calls `on_error` (never silently under-bills). Mode and markup are overridable per call via `extra_lago={"mode": "price", "markup": 1.5}`. Money is computed with `Decimal` floored to 12 dp, identical to the JS implementation (cross-repo golden fixture). New `pricing.py` module + `PricingProvider`; the default `pricing_mode="tokens"` keeps existing behavior unchanged.
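The price-mode arithmetic can be sketched with the standard `decimal` module. This is a minimal illustration, not the SDK's `pricing.py`: the function `price_call`, its argument shapes, and the choice to floor each per-field cost as well as the marked-up total are assumptions; only the 12-dp floor, the markup multiplier, and the cents-denominated `precise_total_amount_cents` come from the entry above.

```python
from decimal import Decimal, ROUND_FLOOR

# Quantum for "floored to 12 dp".
_QUANTUM = Decimal("1e-12")


def floor12(x: Decimal) -> Decimal:
    """Floor a Decimal to 12 decimal places (toward negative infinity)."""
    return x.quantize(_QUANTUM, rounding=ROUND_FLOOR)


def price_call(tokens_by_field: dict, unit_price_usd: dict, markup: str = "1") -> dict:
    """Hypothetical sketch: build a per-call cost event payload.

    unit_price_usd maps each field to a USD-per-token price given as a string,
    so Decimal never inherits binary-float rounding error. A KeyError on a
    missing price marks where the SDK would fall back to token events.
    """
    fields = {}
    base = Decimal(0)
    for field, tokens in tokens_by_field.items():
        unit = Decimal(unit_price_usd[field])
        cost = floor12(Decimal(tokens) * unit)  # assumption: floor per field
        fields[field] = {"tokens": tokens, "unit_price": str(unit), "cost": str(cost)}
        base += cost
    value = floor12(base * Decimal(markup))  # USD after markup
    return {
        "precise_total_amount_cents": str(floor12(value * 100)),
        "properties": {"value": str(value), "base": str(base), "markup": markup, "fields": fields},
    }
```

For example, 1,000 input tokens at $0.000003 and 500 output tokens at $0.000015 give a base of $0.0105; with a 1.5 markup that is $0.01575, or 1.575 cents.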
- Anthropic `messages.create(stream=True)` under-billed input tokens. The stream wrapper read only the top-level `usage`, which on a basic stream appears only on `message_delta` as `{output_tokens: N}`; the authoritative `input_tokens`/`cache_*` counts arrive nested under `message.usage` on the `message_start` event and were ignored, so input was billed as 0. The wrapper now merges usage from `message_start` (input/cache) and `message_delta` (cumulative output). Fixed on both sync and async paths; regression tests use the realistic wire shape (the delta carries no input echo).
- Legacy `google-generativeai` SDK silently emitted no events. The detector matched both the new `google-genai` and the deprecated `google-generativeai` SDKs, but the wrapper only instruments the unified `Client.models`/`.aio` surface, so a legacy `GenerativeModel` passed detection and nothing was wrapped. `wrap()` now rejects legacy clients with a clear pointer to migrate to `google-genai`.
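A minimal sketch of the stream-usage merge, assuming stream events are represented as plain dicts shaped like the Anthropic wire format (the real wrapper works on SDK event objects). `merge_stream_usage` is a hypothetical name.

```python
def merge_stream_usage(events: list) -> dict:
    """Hypothetical sketch: fold Anthropic stream events into one usage dict.

    message_start carries the authoritative input/cache counts nested under
    message.usage; message_delta carries a top-level usage whose
    output_tokens is cumulative, so the last delta wins.
    """
    usage: dict = {}
    for event in events:
        etype = event.get("type")
        if etype == "message_start":
            start_usage = (event.get("message") or {}).get("usage") or {}
            # Keep input_tokens, cache_* and the initial output count.
            usage.update({k: v for k, v in start_usage.items() if v is not None})
        elif etype == "message_delta":
            delta_usage = event.get("usage") or {}
            # Cumulative output: overwrite rather than add.
            if delta_usage.get("output_tokens") is not None:
                usage["output_tokens"] = delta_usage["output_tokens"]
    return usage
```

Reading only the `message_delta` usage would return `{"output_tokens": 12}` for the stream in the test and bill input as 0, which is exactly the bug this entry fixes.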
- Hardened the publish workflow: least-privilege `permissions: contents: read` by default (only `publish` gets `id-token: write`, only `release` gets `contents: write`), and every third-party action pinned to a full commit SHA so a re-pointed tag can't inject code into the OIDC-token-minting job.
- Added `if: startsWith(github.ref, 'refs/tags/v')` to the `publish` job as defense-in-depth: it refuses to run on a non-tag ref even if the environment's protected-tag rule is misconfigured.
- Added `.github/dependabot.yml` (`github-actions` ecosystem) so the SHA pins stay fresh: Dependabot bumps the SHA and the version comment together rather than letting actions silently age.
- `RELEASING.md` now documents `pypi` environment protection (required reviewers + protected-tag restriction) as a required setup step, not an optional one, since trusted publishing is only as strong as that environment's rules.
- README: clarified that `cache_read`, `audio_input`, and `image_input` are subsets of `input` for OpenAI and Gemini (not additive); summing them with `llm_input_tokens` double-counts.
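A small worked example of the double-counting pitfall, using a hypothetical helper `input_breakdown` and made-up token counts. It covers only `cache_read`; the same reasoning applies to `audio_input` and `image_input`, which are also already inside `input`.

```python
def input_breakdown(input_tokens: int, cache_read: int) -> dict:
    """Hypothetical helper: split OpenAI/Gemini input into cached and uncached.

    cache_read is a SUBSET of input for these providers, so the correct
    total is input_tokens alone; adding cache_read on top double-counts.
    """
    if cache_read > input_tokens:
        raise ValueError("cache_read cannot exceed input for OpenAI/Gemini usage")
    return {
        "total_input": input_tokens,                 # do NOT add cache_read again
        "cached_input": cache_read,
        "uncached_input": input_tokens - cache_read,
    }
```

With 1,200 input tokens of which 1,000 were cache hits, the correct total is 1,200 (200 uncached), not the 2,200 that summing the two metrics would give.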
- Native `google-genai` SDK support covering `client.models.generate_content` + `generate_content_stream`, sync + async (`client.aio.models`).
- `extract_gemini_native` adapter maps `usage_metadata`: `prompt_token_count` → `input`, `candidates_token_count` → `output`, `cached_content_token_count` → `cache_read`, `thoughts_token_count` → `reasoning`, `prompt_tokens_details[modality=AUDIO/IMAGE]` → `audio_input`/`image_input`, `candidates_tokens_details[modality=AUDIO]` → `audio_output`, and the count of `candidates[0].content.parts[].function_call` → `tool_calls`.
- Gemini 2.5 surfaces reasoning tokens by default (`thoughts_token_count`), which fires `llm_reasoning_tokens` automatically. Note the semantic difference vs OpenAI: Gemini's reasoning is ADDITIVE to output (candidates + thoughts = total billable output), whereas OpenAI's reasoning is a SUBSET of `completion_tokens`. Documented in the adapter docstring and README.
- `gemini` optional dependency group: `pip install 'lago-agent-sdk[gemini]'`.
- 21 new unit tests (15 adapter + 6 wrapper) and 4 live integration tests (gated on `GEMINI_API_KEY`). Total: 304 unit tests.
- 5 captured response fixtures from the real Gemini API (plain, tool use, streaming, thinking, multi-turn).
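The field map above, plus the additive reasoning rule, can be sketched over a plain-dict response. This is an illustration only: `extract_gemini_usage` and `gemini_billable_output` are hypothetical names, the real adapter reads `google-genai` response objects rather than dicts, and the per-modality `modality`/`token_count` keys are assumed to mirror the SDK's modality token-count entries.

```python
def extract_gemini_usage(response: dict) -> dict:
    """Hypothetical sketch: map a dict-shaped Gemini response to canonical fields."""
    meta = response.get("usage_metadata") or {}

    def modality_count(details_key: str, modality: str) -> int:
        # Sum token_count over the per-modality entries matching `modality`.
        return sum(
            entry.get("token_count") or 0
            for entry in (meta.get(details_key) or [])
            if entry.get("modality") == modality
        )

    candidates = response.get("candidates") or []
    parts = []
    if candidates:
        parts = (candidates[0].get("content") or {}).get("parts") or []

    return {
        "input": meta.get("prompt_token_count") or 0,
        "output": meta.get("candidates_token_count") or 0,
        "cache_read": meta.get("cached_content_token_count") or 0,
        "reasoning": meta.get("thoughts_token_count") or 0,
        "audio_input": modality_count("prompt_tokens_details", "AUDIO"),
        "image_input": modality_count("prompt_tokens_details", "IMAGE"),
        "audio_output": modality_count("candidates_tokens_details", "AUDIO"),
        # One tool call per part that carries a function_call.
        "tool_calls": sum(1 for part in parts if part.get("function_call")),
    }


def gemini_billable_output(usage: dict) -> int:
    # Gemini reasoning is ADDITIVE: candidates + thoughts = total billable output.
    return usage["output"] + usage["reasoning"]
```

For a thinking response with 10 candidate tokens and 90 thought tokens, the billable output is 100; an OpenAI-style reading that treats reasoning as already inside output would bill only 10.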
- Detector now returns `gemini` (was `google`) for `google-genai` clients.
- Native `openai` SDK support covering both APIs, `chat.completions.create` and `responses.create`, each with sync + streaming. Same coverage on `AsyncOpenAI`.
- `extract_openai_native` adapter handles both API shapes with auto-detection:
  - Chat Completions: `prompt_tokens`, `completion_tokens`, `prompt_tokens_details.{cached_tokens, audio_tokens}`, `completion_tokens_details.{reasoning_tokens, audio_tokens}`, and the count of `choices[0].message.tool_calls`.
  - Responses API: `input_tokens`, `output_tokens`, `input_tokens_details.cached_tokens`, `output_tokens_details.reasoning_tokens`, and the count of `output[]` items with `type == "function_call"`.
- First provider to populate `llm_reasoning_tokens`: OpenAI o-series models (`o4-mini`, `o1`, etc.) surface reasoning token counts separately.
- Auto-injection of `stream_options={"include_usage": True}` when the customer sets `stream=True` without it, so streamed Chat Completions emit usage on the final chunk.
- `audio_output` field added to `CanonicalUsage` (maps to `llm_audio_output_tokens`), populated by GPT-4o-audio responses.
- `openai` optional dependency group: `pip install 'lago-agent-sdk[openai]'`.
- 27 new unit tests (18 adapter + 9 wrapper) and 5 live integration tests (gated on `OPENAI_API_KEY`). Total: 283 unit tests.
- 10 captured response fixtures from the real OpenAI API (plain chat, tool use, auto-caching, streaming with usage, o-series reasoning, multi-turn, Responses API plain + tool use + reasoning).
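A sketch of the shape auto-detection and the `stream_options` injection, assuming plain-dict responses and call kwargs. `extract_openai_usage` and `with_usage_streaming` are hypothetical names; keying detection on the presence of `prompt_tokens`, and merging into an existing `stream_options` while keeping an explicit `include_usage=False`, are assumptions about behavior the entry does not spell out.

```python
def extract_openai_usage(response: dict) -> dict:
    """Hypothetical sketch: detect Chat Completions vs Responses API by usage keys."""
    usage = response.get("usage") or {}
    if "prompt_tokens" in usage:  # Chat Completions shape
        pd = usage.get("prompt_tokens_details") or {}
        cd = usage.get("completion_tokens_details") or {}
        message = (response.get("choices") or [{}])[0].get("message") or {}
        return {
            "input": usage.get("prompt_tokens") or 0,
            "output": usage.get("completion_tokens") or 0,
            "cache_read": pd.get("cached_tokens") or 0,
            "audio_input": pd.get("audio_tokens") or 0,
            "reasoning": cd.get("reasoning_tokens") or 0,  # SUBSET of output
            "audio_output": cd.get("audio_tokens") or 0,
            "tool_calls": len(message.get("tool_calls") or []),
        }
    # Responses API shape
    idt = usage.get("input_tokens_details") or {}
    odt = usage.get("output_tokens_details") or {}
    return {
        "input": usage.get("input_tokens") or 0,
        "output": usage.get("output_tokens") or 0,
        "cache_read": idt.get("cached_tokens") or 0,
        "reasoning": odt.get("reasoning_tokens") or 0,  # SUBSET of output
        "tool_calls": sum(
            1 for item in (response.get("output") or []) if item.get("type") == "function_call"
        ),
    }


def with_usage_streaming(kwargs: dict) -> dict:
    """Return kwargs with include_usage defaulted to True for stream=True calls."""
    if not kwargs.get("stream"):
        return kwargs
    options = dict(kwargs.get("stream_options") or {})
    options.setdefault("include_usage", True)  # respect an explicit False
    return {**kwargs, "stream_options": options}
```

Unlike the Gemini sketch, `reasoning` is not added to `output` here: 256 reasoning tokens inside 300 completion tokens means 300 billable output tokens in total.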
- Native `anthropic` SDK support. Wraps `Anthropic.messages.create` (including `stream=True`) and the `Anthropic.messages.stream(...)` context manager. Same coverage on `AsyncAnthropic` (sync + async variants).
- `extract_anthropic_native` adapter with the full Anthropic field map: `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, `cache_creation.ephemeral_5m_input_tokens`, `cache_creation.ephemeral_1h_input_tokens`, and `content[].type == "tool_use"`.
- `anthropic` optional dependency group: `pip install 'lago-agent-sdk[anthropic]'`.
- 19 unit tests (adapter + wrapper) and 3 live integration tests (gated on `ANTHROPIC_API_KEY`).
- 9 captured response fixtures from the real Anthropic API (plain, tool use, 5m + 1h prompt caching, extended thinking, streaming, multi-turn).
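The Anthropic field map can be sketched the same way over a dict-shaped Messages API response. `extract_anthropic_usage` and the canonical keys on the left (`cache_write`, `cache_write_5m`, `cache_write_1h`) are hypothetical names chosen for illustration; the source keys on the right are the ones listed in the entry.

```python
def extract_anthropic_usage(message: dict) -> dict:
    """Hypothetical sketch: map a dict-shaped Anthropic message to usage fields."""
    usage = message.get("usage") or {}
    cache_creation = usage.get("cache_creation") or {}
    return {
        "input": usage.get("input_tokens") or 0,
        "output": usage.get("output_tokens") or 0,
        "cache_write": usage.get("cache_creation_input_tokens") or 0,
        "cache_read": usage.get("cache_read_input_tokens") or 0,
        # TTL-specific cache writes, nested under usage.cache_creation.
        "cache_write_5m": cache_creation.get("ephemeral_5m_input_tokens") or 0,
        "cache_write_1h": cache_creation.get("ephemeral_1h_input_tokens") or 0,
        # One tool call per content block of type "tool_use".
        "tool_calls": sum(
            1 for block in (message.get("content") or []) if block.get("type") == "tool_use"
        ),
    }
```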
- `LagoSDK` core with batched async event queue, exponential backoff, bounded buffer, and async-local subscription resolution.
- `boto3` Bedrock wrapper covering `Converse`, `ConverseStream`, `InvokeModel`, and `InvokeModelWithResponseStream`.
- 7 InvokeModel family adapters (`anthropic`, `opus_4_7`, `nova`, `pixtral`, `mistral_legacy`, `openai_compat_basic`, `openai_compat_with_details`) with substring-match dispatch.
- `mistralai` native wrapper covering `chat.complete`, `chat.stream`, and async variants.
- Three subscription-resolution tiers: per-call `extra_lago`, context-bound `set_subscription`, and init-time default.
- 245 tests: 237 unit + 8 integration; verified against 159 fixtures captured from real provider responses.
- p99 wrap-overhead ≤ 5 ms benchmark.
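The three subscription-resolution tiers can be sketched with `contextvars`, which gives each asyncio task its own copy of the bound value. The names `bind_subscription` and `resolve_subscription`, and the `"subscription"` key inside `extra_lago`, are hypothetical; this is not the signature of the SDK's `set_subscription`.

```python
import contextvars
from contextlib import contextmanager

# Async-local slot: each asyncio task (and thread) sees its own binding.
_current_subscription = contextvars.ContextVar("lago_subscription", default=None)


@contextmanager
def bind_subscription(subscription_id: str):
    """Hypothetical tier-2 binding, restored on exit even if the body raises."""
    token = _current_subscription.set(subscription_id)
    try:
        yield
    finally:
        _current_subscription.reset(token)


def resolve_subscription(extra_lago, default):
    # Tier 1: per-call override (hypothetical "subscription" key).
    if extra_lago and extra_lago.get("subscription"):
        return extra_lago["subscription"]
    # Tier 2: context-bound value.
    bound = _current_subscription.get()
    if bound is not None:
        return bound
    # Tier 3: init-time default.
    return default
```

Using a `ContextVar` rather than a module global keeps concurrent requests in one event loop from seeing each other's subscription.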