Skip to content

Commit ec249db

Browse files
committed
Address PR review + publish-pipeline audit hardening
Bugs - Anthropic messages.create(stream=True) under-billed input tokens. The stream wrapper read only the top-level `usage`, which on a basic stream appears only on message_delta as {output_tokens: N}. The authoritative input/cache counts arrive nested under message.usage on message_start and were dropped, so input billed 0. New _merge_stream_usage folds both locations (message_start input/cache + message_delta cumulative output) across sync and async paths. Fixtures now use the realistic wire shape (message_delta carries no input echo), so the stream tests are genuine regressions. - Legacy google-generativeai SDK silently emitted nothing. The detector matched both google-genai and the deprecated google-generativeai, but the wrapper only instruments the unified Client.models/.aio surface, so a legacy GenerativeModel wrapped nothing. Detector now returns a distinct 'gemini_legacy' kind and wrap() rejects it with a migrate-to-google-genai message. ("genai" is not a substring of "generativeai", so no overlap.) Docs - README: cache_read / audio_input / image_input are subsets of input for OpenAI and Gemini, not additive — summing them double-counts. Publish-workflow hardening - Least-privilege default (permissions: contents: read); only publish gets id-token: write, only release gets contents: write. - All third-party actions pinned to full commit SHAs (version in comment). - Added `if: startsWith(github.ref, 'refs/tags/v')` to the publish job as defense-in-depth. - Added .github/dependabot.yml (github-actions) to keep SHA pins fresh. - RELEASING.md documents pypi environment protection (required reviewers + protected-tag restriction) as a REQUIRED setup step. Gate: ruff + format + mypy clean; 319 unit tests pass; coverage 89.27%.
1 parent 988dc00 commit ec249db

10 files changed

Lines changed: 212 additions & 33 deletions

File tree

.github/dependabot.yml

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
version: 2
2+
3+
# Keep the GitHub Actions in our workflows up to date. We pin every action to a
4+
# full commit SHA for supply-chain safety (see .github/workflows/publish.yml),
5+
# which means SHAs don't auto-update — Dependabot opens PRs that bump the SHA
6+
# and the trailing version comment together, so the pins stay both fixed and
7+
# fresh (and pick up upstream security patches).
8+
updates:
9+
- package-ecosystem: "github-actions"
10+
directory: "/"
11+
schedule:
12+
interval: "weekly"
13+
commit-message:
14+
prefix: "ci"
15+
groups:
16+
github-actions:
17+
patterns:
18+
- "*"

.github/workflows/publish.yml

Lines changed: 32 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -12,29 +12,43 @@ name: publish
1212
# Workflow name: publish.yml
1313
# Environment name: pypi
1414
#
15+
# Also configure deployment protection on the `pypi` GitHub environment
16+
# (required reviewers + restrict to protected v*.*.* tags) — see RELEASING.md.
17+
#
1518
# After setup, releasing is one command from your laptop:
1619
# git tag v0.1.0 && git push --tags
20+
#
21+
# Third-party actions are pinned to full commit SHAs (not tags) so a hijacked
22+
# or re-pointed tag can't inject code into a job that mints OIDC tokens. The
23+
# trailing comment records the human-readable version each SHA corresponds to.
1724

1825
on:
1926
push:
2027
tags:
2128
- "v*.*.*"
2229

30+
# Least-privilege default for every job. Jobs that need more (OIDC, release
31+
# creation) opt in explicitly below.
32+
permissions:
33+
contents: read
34+
2335
jobs:
2436
# ----------------------------------------------------------------------
2537
# 1. Run the full CI gate first. If anything is red, abort before build.
2638
# ----------------------------------------------------------------------
2739
test:
2840
name: test (py${{ matrix.python-version }})
2941
runs-on: ubuntu-latest
42+
permissions:
43+
contents: read
3044
strategy:
3145
fail-fast: true
3246
matrix:
3347
python-version: ["3.10", "3.11", "3.12"]
3448
steps:
35-
- uses: actions/checkout@v4
49+
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
3650
- name: Install uv
37-
uses: astral-sh/setup-uv@v4
51+
uses: astral-sh/setup-uv@38f3f104447c67c051c4a08e39b64a148898af3a # v4.2.0
3852
with:
3953
python-version: ${{ matrix.python-version }}
4054
enable-cache: true
@@ -52,10 +66,12 @@ jobs:
5266
name: build
5367
needs: test
5468
runs-on: ubuntu-latest
69+
permissions:
70+
contents: read
5571
steps:
56-
- uses: actions/checkout@v4
72+
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
5773
- name: Install uv
58-
uses: astral-sh/setup-uv@v4
74+
uses: astral-sh/setup-uv@38f3f104447c67c051c4a08e39b64a148898af3a # v4.2.0
5975
with:
6076
python-version: "3.12"
6177
- name: Verify tag matches pyproject version
@@ -72,7 +88,7 @@ jobs:
7288
uv pip install --system build
7389
python -m build
7490
- name: Upload dist artifacts
75-
uses: actions/upload-artifact@v4
91+
uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.4.3
7692
with:
7793
name: dist
7894
path: dist/
@@ -85,22 +101,27 @@ jobs:
85101
name: publish to PyPI
86102
needs: build
87103
runs-on: ubuntu-latest
88-
# OIDC requires `id-token: write` permission on the workflow.
104+
# OIDC requires `id-token: write`; nothing else is needed here.
89105
permissions:
90106
id-token: write
107+
contents: read
108+
# Defense-in-depth: only ever publish from a v*.*.* tag ref, even if the
109+
# environment's protected-tag rule is removed or misconfigured in the UI.
110+
if: startsWith(github.ref, 'refs/tags/v')
91111
# The `pypi` environment is what you bind to in PyPI's trusted-publisher
92-
# config — restricts which workflows can claim the OIDC identity.
112+
# config — and where deployment protection rules (required reviewers,
113+
# protected-tag restriction) are enforced. See RELEASING.md.
93114
environment:
94115
name: pypi
95116
url: https://pypi.org/p/lago-agent-sdk
96117
steps:
97118
- name: Download dist
98-
uses: actions/download-artifact@v4
119+
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
99120
with:
100121
name: dist
101122
path: dist/
102123
- name: Publish
103-
uses: pypa/gh-action-pypi-publish@release/v1
124+
uses: pypa/gh-action-pypi-publish@ecb4c3dfd4790f14e30aaeac04855c7413ee9368 # v1.12.2
104125
# No `password:` — OIDC handles auth automatically.
105126

106127
# ----------------------------------------------------------------------
@@ -113,9 +134,9 @@ jobs:
113134
permissions:
114135
contents: write
115136
steps:
116-
- uses: actions/checkout@v4
137+
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
117138
- name: Download dist
118-
uses: actions/download-artifact@v4
139+
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
119140
with:
120141
name: dist
121142
path: dist/

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,19 @@ All notable changes to this project will be documented here. Format follows [Kee
44

55
## [Unreleased]
66

7+
### Fixed
8+
- **Anthropic `messages.create(stream=True)` under-billed input tokens.** The stream wrapper read only top-level `usage`, which on a basic stream appears only on `message_delta` as `{output_tokens: N}` — the authoritative `input_tokens` / `cache_*` counts arrive nested under `message.usage` on the `message_start` event and were ignored, so input billed 0. The wrapper now merges usage from `message_start` (input/cache) and `message_delta` (cumulative output). Sync + async paths; regression tests use the realistic wire shape (delta carries no input echo).
9+
- **Legacy `google-generativeai` SDK silently emitted no events.** The detector matched both the new `google-genai` and the deprecated `google-generativeai` SDKs, but the wrapper only instruments the unified `Client.models` / `.aio` surface — a legacy `GenerativeModel` routed through and wrapped nothing. `wrap()` now rejects legacy clients with a clear pointer to migrate to `google-genai`.
10+
11+
### Security
12+
- Hardened the publish workflow: least-privilege `permissions: contents: read` default (only `publish` gets `id-token: write`, only `release` gets `contents: write`), and every third-party action pinned to a full commit SHA so a re-pointed tag can't inject code into the OIDC-token-minting job.
13+
- Added `if: startsWith(github.ref, 'refs/tags/v')` to the `publish` job as defense-in-depth — it refuses to run on a non-tag ref even if the environment's protected-tag rule is misconfigured.
14+
- Added `.github/dependabot.yml` (github-actions ecosystem) so the SHA pins stay fresh — Dependabot bumps the SHA and version comment together rather than letting actions silently age.
15+
- RELEASING.md now documents `pypi` environment protection (required reviewers + protected-tag restriction) as a **required** setup step, not optional, since trusted publishing is only as strong as that environment's rules.
16+
17+
### Documentation
18+
- README: clarified that `cache_read`, `audio_input`, and `image_input` are **subsets** of `input` for OpenAI and Gemini (not additive) — summing them with `llm_input_tokens` double-counts.
19+
720
### Added
821
- Native `google-genai` SDK support covering `client.models.generate_content` + `generate_content_stream`, sync + async (`client.aio.models`).
922
- `extract_gemini_native` adapter maps `usage_metadata`: `prompt_token_count → input`, `candidates_token_count → output`, `cached_content_token_count → cache_read`, `thoughts_token_count → reasoning`, `prompt_tokens_details[modality=AUDIO/IMAGE] → audio_input/image_input`, `candidates_tokens_details[modality=AUDIO] → audio_output`, count of `candidates[0].content.parts[].function_call → tool_calls`.

README.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -184,6 +184,9 @@ Backed by `contextvars` for safe propagation across `asyncio` tasks.
184184
- **OpenAI's `reasoning_tokens` is a SUBSET of `output`** — already counted in `completion_tokens`.
185185
- **Gemini's `thoughts_token_count` is ADDITIVE to `output`**`candidates + thoughts = total billable output`.
186186

187+
**Semantic note on input breakdowns (avoid double-counting):**
188+
For both OpenAI and Gemini, `cache_read`, `audio_input`, and `image_input` are **subsets of `input`**, not additive to it — they are a breakdown of tokens already counted in `llm_input_tokens`. For example, OpenAI reports `cached_tokens` under `prompt_tokens_details` *within* `prompt_tokens`, and Gemini's docs state `prompt_token_count` "includes the number of tokens in the cached content". A billable metric that sums `llm_input_tokens + llm_cached_input_tokens` (or `+ llm_audio_input_tokens`, `+ llm_image_input_tokens`) will **double-count**. Bill on `llm_input_tokens` as the total; use the breakdown fields only for cost attribution or discounted-rate tiers (e.g. cached input billed at a lower rate), subtracting them from `input` rather than adding.
189+
187190
OpenAI's Predicted Outputs tokens (`accepted_prediction_tokens`, `rejected_prediction_tokens`) are not surfaced — see the OpenAI adapter docstring for details on this intentional gap.
188191

189192
## Error policy

RELEASING.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,23 @@ Configure the trusted publisher on PyPI:
2525

2626
Then in this repo: **Settings → Environments → New environment** named `pypi`. (No secrets needed inside it — OIDC handles auth.)
2727

28+
### Environment protection (required before first release)
29+
30+
Trusted publishing is bound to the `pypi` environment, so that environment is the **only** thing standing between a pushed tag and a live PyPI release. A freshly created environment has **no** protection rules by default — until you add them, any successful run publishes immediately. Treat this as a mandatory setup step, not an optional one. Configure it under **Settings → Environments → pypi**:
31+
32+
| Rule | Setting | Why |
33+
| --- | --- | --- |
34+
| Required reviewers | Add 1+ maintainers | The publish job pauses for human approval before it can mint the OIDC token and upload — a second pair of eyes on every release. |
35+
| Deployment branches and tags | **Selected** → add a `v*.*.*` tag rule | Only protected version tags can deploy to `pypi`; a random branch push or arbitrary tag can't trigger a publish. |
36+
37+
With these in place, the `test` and `build` jobs still run on any matching tag, but the `publish` job blocks until an approver signs off, and only for `v*.*.*` tags.
38+
39+
The workflow itself is hardened in depth, so a misconfigured environment alone can't publish from the wrong place:
40+
- Least-privilege `permissions: contents: read` default — only `publish` gets `id-token: write`, only `release` gets `contents: write`.
41+
- Every third-party action pinned to a full commit SHA so a re-pointed tag can't inject code into the token-minting job (kept fresh by `.github/dependabot.yml`).
42+
- The `publish` job carries `if: startsWith(github.ref, 'refs/tags/v')`, so even without the environment rule it refuses to run on a non-tag ref.
43+
- `publish` consumes the exact artifact built and version-checked in the `build` job (it never rebuilds), so the bytes on PyPI match what was tested.
44+
2845
## Cutting a release
2946

3047
```bash

src/lago_agent_sdk/detector.py

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,16 @@ def detect_client_kind(client: Any) -> str:
3737
# Older mistralai versions or aliased imports
3838
if cls_name == "mistral" and "mistral" in module:
3939
return "mistral"
40-
if "google" in module and ("genai" in module or "generativeai" in module):
40+
# New unified google-genai SDK only (`google.genai.Client`). The legacy
41+
# google-generativeai SDK (`google.generativeai.GenerativeModel`) has a
42+
# different surface — no `.models` / `.aio` — that the gemini wrapper cannot
43+
# instrument, so it would silently wrap nothing. Flag it separately so wrap()
44+
# can reject it with a clear migration message instead.
45+
#
46+
# "genai" is not a substring of "generativeai", so these never overlap.
47+
if "google" in module and "genai" in module:
4148
return "gemini"
49+
if "google" in module and "generativeai" in module:
50+
return "gemini_legacy"
4251

4352
return "unknown"

src/lago_agent_sdk/sdk.py

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -95,6 +95,15 @@ def wrap(
9595
from .wrappers.gemini import wrap_gemini_client
9696

9797
return wrap_gemini_client(self, client, dimensions=dimensions, subscription=subscription)
98+
if kind == "gemini_legacy":
99+
raise UnknownClientError(
100+
"The legacy google-generativeai SDK "
101+
"(`import google.generativeai; genai.GenerativeModel(...)`) is not "
102+
"supported — its surface differs from the unified SDK and cannot be "
103+
"instrumented. Migrate to google-genai: `pip install google-genai`, "
104+
"then `from google import genai; client = genai.Client(...)` and wrap "
105+
"the Client. See https://ai.google.dev/gemini-api/docs/migrate."
106+
)
98107
if kind == "unknown":
99108
raise UnknownClientError(
100109
f"Unknown client passed to wrap(): {type(client).__module__}.{type(client).__name__}. "

src/lago_agent_sdk/wrappers/anthropic.py

Lines changed: 39 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,35 @@ def _is_message_like(obj: Any) -> bool:
4646
return False
4747

4848

49+
def _merge_stream_usage(accumulated: dict[str, Any], payload: Any) -> None:
50+
"""Fold one streaming event's usage into the running accumulator.
51+
52+
Anthropic splits authoritative usage across two events:
53+
- ``message_start`` carries the input/cache counts nested under
54+
``message.usage`` (with ``output_tokens`` only primed to 1).
55+
- ``message_delta`` carries the *cumulative* ``output_tokens`` at the top
56+
level (and, in some API shapes, echoes input/cache there too).
57+
58+
Reading only the top-level usage misses ``message_start``'s input/cache, so
59+
a basic stream — whose ``message_delta`` is just ``{"output_tokens": N}`` —
60+
would bill ``input_tokens=0``. Merge both locations; ``dict.update`` lets the
61+
more complete / more recent values win while preserving the input counts from
62+
``message_start`` when a delta omits them.
63+
"""
64+
if not isinstance(payload, dict):
65+
return
66+
# message_start: input/cache live under message.usage
67+
message = payload.get("message")
68+
if isinstance(message, dict):
69+
nested = message.get("usage")
70+
if isinstance(nested, dict):
71+
accumulated.update(nested)
72+
# message_delta (and others): cumulative usage at the top level
73+
top = payload.get("usage")
74+
if isinstance(top, dict):
75+
accumulated.update(top)
76+
77+
4978
def wrap_anthropic_client(
5079
sdk: Any,
5180
client: Any,
@@ -95,20 +124,18 @@ def _create(*args: Any, **kwargs: Any) -> Any:
95124
_emit_from(response, model_id, sub, dims)
96125
return response
97126

98-
# Streaming — wrap the iterator to capture the final usage on close.
127+
# Streaming — wrap the iterator, merging usage across message_start
128+
# (input/cache) and message_delta (cumulative output) before emitting.
99129
def _wrap_stream(src: Iterator[Any]) -> Iterator[Any]:
100-
last_usage: dict[str, Any] | None = None
130+
accumulated: dict[str, Any] = {}
101131
try:
102132
for event in src:
103133
payload = event.model_dump() if hasattr(event, "model_dump") else event
104-
if isinstance(payload, dict):
105-
usage = payload.get("usage")
106-
if isinstance(usage, dict):
107-
last_usage = {"usage": usage}
134+
_merge_stream_usage(accumulated, payload)
108135
yield event
109136
finally:
110-
if last_usage is not None:
111-
_emit_from(last_usage, model_id, sub, dims)
137+
if accumulated:
138+
_emit_from({"usage": accumulated}, model_id, sub, dims)
112139

113140
return _wrap_stream(response)
114141

@@ -127,18 +154,15 @@ async def _create_async(*args: Any, **kwargs: Any) -> Any:
127154
return response
128155

129156
async def _wrap_async_stream(src: AsyncIterator[Any]) -> AsyncIterator[Any]:
130-
last_usage: dict[str, Any] | None = None
157+
accumulated: dict[str, Any] = {}
131158
try:
132159
async for event in src:
133160
payload = event.model_dump() if hasattr(event, "model_dump") else event
134-
if isinstance(payload, dict):
135-
usage = payload.get("usage")
136-
if isinstance(usage, dict):
137-
last_usage = {"usage": usage}
161+
_merge_stream_usage(accumulated, payload)
138162
yield event
139163
finally:
140-
if last_usage is not None:
141-
_emit_from(last_usage, model_id, sub, dims)
164+
if accumulated:
165+
_emit_from({"usage": accumulated}, model_id, sub, dims)
142166

143167
return _wrap_async_stream(response)
144168

0 commit comments

Comments
 (0)