Bugs
- Anthropic messages.create(stream=True) under-billed input tokens. The
stream wrapper read only the top-level `usage`, which on a basic stream
appears only on message_delta as {output_tokens: N}. The authoritative
input/cache counts arrive nested under message.usage on message_start and
were dropped, so input billed 0. New _merge_stream_usage folds both
locations (message_start input/cache + message_delta cumulative output)
across sync and async paths. Fixtures now use the realistic wire shape
(message_delta carries no input echo), so the stream tests are genuine
regressions.
- Legacy google-generativeai SDK silently emitted nothing. The detector
matched both google-genai and the deprecated google-generativeai, but the
wrapper only instruments the unified Client.models/.aio surface, so a
legacy GenerativeModel wrapped nothing. Detector now returns a distinct
'gemini_legacy' kind and wrap() rejects it with a migrate-to-google-genai
message. ("genai" is not a substring of "generativeai", so no overlap.)
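The Anthropic streaming fix above can be illustrated with a minimal sketch. This is not the project's `_merge_stream_usage`; the function name `merge_stream_usage_sketch` is hypothetical, and events are modeled as plain dicts (an assumption: the real SDK yields typed event objects). It only shows the idea of folding the nested `message.usage` from `message_start` together with the top-level cumulative `usage` from `message_delta`.

```python
from typing import Any, Dict, Iterable


def merge_stream_usage_sketch(events: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Fold usage counts from both locations in a stream (hypothetical helper)."""
    totals: Dict[str, int] = {}
    for event in events:
        kind = event.get("type")
        if kind == "message_start":
            # Input and cache counts are nested under message.usage.
            usage = (event.get("message") or {}).get("usage") or {}
        elif kind == "message_delta":
            # Top-level usage; output_tokens is cumulative, so a later
            # value replaces the earlier one instead of being added.
            usage = event.get("usage") or {}
        else:
            continue
        for key, value in usage.items():
            if isinstance(value, int):
                totals[key] = value
    return totals


# Realistic wire shape: message_delta carries no input echo.
events = [
    {"type": "message_start",
     "message": {"usage": {"input_tokens": 25,
                           "cache_read_input_tokens": 10,
                           "output_tokens": 1}}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "message_delta", "usage": {"output_tokens": 12}},
]
merged = merge_stream_usage_sketch(events)
```

Reading only the top-level `usage` on these events would yield `{"output_tokens": 12}` and no input count, which is the under-billing described above.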
Docs
- README: cache_read / audio_input / image_input are subsets of input for
OpenAI and Gemini, not additive — summing them double-counts.
Publish-workflow hardening
- Least-privilege default (permissions: contents: read); only publish gets
id-token: write, only release gets contents: write.
- All third-party actions pinned to full commit SHAs (version in comment).
- Added `if: startsWith(github.ref, 'refs/tags/v')` to the publish job as
defense-in-depth.
- Added .github/dependabot.yml (github-actions) to keep SHA pins fresh.
- RELEASING.md documents pypi environment protection (required reviewers +
protected-tag restriction) as a REQUIRED setup step.
Gate: ruff + format + mypy clean; 319 unit tests pass; coverage 89.27%.
CHANGELOG.md (13 additions & 0 deletions)
@@ -4,6 +4,19 @@ All notable changes to this project will be documented here. Format follows [Kee
 
 ## [Unreleased]
 
+### Fixed
+- **Anthropic `messages.create(stream=True)` under-billed input tokens.** The stream wrapper read only top-level `usage`, which on a basic stream appears only on `message_delta` as `{output_tokens: N}`. The authoritative `input_tokens` / `cache_*` counts arrive nested under `message.usage` on the `message_start` event and were ignored, so input billed 0. The wrapper now merges usage from `message_start` (input/cache) and `message_delta` (cumulative output). Sync + async paths; regression tests use the realistic wire shape (delta carries no input echo).
+- **Legacy `google-generativeai` SDK silently emitted no events.** The detector matched both the new `google-genai` and the deprecated `google-generativeai` SDKs, but the wrapper only instruments the unified `Client.models` / `.aio` surface, so a legacy `GenerativeModel` routed through and wrapped nothing. `wrap()` now rejects legacy clients with a clear pointer to migrate to `google-genai`.
+
+### Security
+- Hardened the publish workflow: least-privilege `permissions: contents: read` default (only `publish` gets `id-token: write`, only `release` gets `contents: write`), and every third-party action pinned to a full commit SHA so a re-pointed tag can't inject code into the OIDC-token-minting job.
+- Added `if: startsWith(github.ref, 'refs/tags/v')` to the `publish` job as defense-in-depth: it refuses to run on a non-tag ref even if the environment's protected-tag rule is misconfigured.
+- Added `.github/dependabot.yml` (github-actions ecosystem) so the SHA pins stay fresh; Dependabot bumps the SHA and version comment together rather than letting actions silently age.
+- RELEASING.md now documents `pypi` environment protection (required reviewers + protected-tag restriction) as a **required** setup step, not optional, since trusted publishing is only as strong as that environment's rules.
+
+### Documentation
+- README: clarified that `cache_read`, `audio_input`, and `image_input` are **subsets** of `input` for OpenAI and Gemini (not additive); summing them with `llm_input_tokens` double-counts.
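The legacy-SDK rejection described in the Fixed entry above might look roughly like the sketch below. It is illustrative only: `detect_gemini_kind_sketch`, `wrap_sketch`, and the `"gemini"` kind name are hypothetical, and keying detection on the class's `__module__` is an assumption about how the detector works. Only the `'gemini_legacy'` kind and the reject-with-migration-message behavior come from the commit message.

```python
from typing import Optional


def detect_gemini_kind_sketch(module_name: str) -> Optional[str]:
    """Classify a client by the module its class was defined in (assumed approach)."""
    # google-generativeai imports as google.generativeai (deprecated SDK).
    if module_name == "google.generativeai" or module_name.startswith("google.generativeai."):
        return "gemini_legacy"
    # google-genai imports as google.genai (unified Client surface).
    if module_name == "google.genai" or module_name.startswith("google.genai."):
        return "gemini"  # hypothetical kind name
    return None


def wrap_sketch(client: object) -> object:
    """Reject legacy clients instead of silently instrumenting nothing."""
    kind = detect_gemini_kind_sketch(type(client).__module__)
    if kind == "gemini_legacy":
        raise TypeError(
            "google-generativeai is deprecated and is not instrumented; "
            "migrate to google-genai"
        )
    return client  # real instrumentation is omitted in this sketch


# Stand-in classes so the sketch runs without either SDK installed.
LegacyModel = type("GenerativeModel", (), {"__module__": "google.generativeai.generative_models"})
UnifiedClient = type("Client", (), {"__module__": "google.genai.client"})
```

Raising at wrap time turns a silent gap in emitted events into an immediate, actionable error.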
README.md (3 additions & 0 deletions)
@@ -184,6 +184,9 @@ Backed by `contextvars` for safe propagation across `asyncio` tasks.
 - **OpenAI's `reasoning_tokens` is a SUBSET of `output`**: already counted in `completion_tokens`.
 - **Gemini's `thoughts_token_count` is ADDITIVE to `output`**: `candidates + thoughts = total billable output`.
 
+**Semantic note on input breakdowns (avoid double-counting):**
+For both OpenAI and Gemini, `cache_read`, `audio_input`, and `image_input` are **subsets of `input`**, not additive to it: they are a breakdown of tokens already counted in `llm_input_tokens`. For example, OpenAI reports `cached_tokens` under `prompt_tokens_details` *within* `prompt_tokens`, and Gemini's docs state `prompt_token_count` "includes the number of tokens in the cached content". A billable metric that sums `llm_input_tokens + llm_cached_input_tokens` (or `+ llm_audio_input_tokens`, `+ llm_image_input_tokens`) will **double-count**. Bill on `llm_input_tokens` as the total; use the breakdown fields only for cost attribution or discounted-rate tiers (e.g. cached input billed at a lower rate), subtracting them from `input` rather than adding.
+
 OpenAI's Predicted Outputs tokens (`accepted_prediction_tokens`, `rejected_prediction_tokens`) are not surfaced; see the OpenAI adapter docstring for details on this intentional gap.
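The subtract-rather-than-add rule from the input-breakdown note above can be shown with a small arithmetic sketch. The function name `billable_input_cost_sketch` and the rates are hypothetical (arbitrary cost units per token, not real provider prices); the parameter names mirror the `llm_input_tokens` and `llm_cached_input_tokens` metrics named in the note.

```python
def billable_input_cost_sketch(
    llm_input_tokens: int,
    llm_cached_input_tokens: int,
    full_rate: float,
    cached_rate: float,
) -> float:
    """Price input tokens when the cached count is a subset of the input total."""
    if llm_cached_input_tokens > llm_input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")
    # Cached tokens are already inside llm_input_tokens, so carve them out
    # of the total instead of adding them on top.
    uncached = llm_input_tokens - llm_cached_input_tokens
    return uncached * full_rate + llm_cached_input_tokens * cached_rate


# 1000 input tokens, 400 of which were served from cache.
correct_cost = billable_input_cost_sketch(1000, 400, full_rate=1.0, cached_rate=0.5)

# The double-counting mistake treats the breakdown as additive:
double_counted_tokens = 1000 + 400  # counts the 400 cached tokens twice
```

Here the correct split is 600 uncached plus 400 cached tokens, while the additive sum reports 1400 tokens for a request that only consumed 1000.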
RELEASING.md (17 additions & 0 deletions)
@@ -25,6 +25,23 @@ Configure the trusted publisher on PyPI:
 
 Then in this repo: **Settings → Environments → New environment** named `pypi`. (No secrets needed inside it; OIDC handles auth.)
 
+### Environment protection (required before first release)
+
+Trusted publishing is bound to the `pypi` environment, so that environment is the **only** thing standing between a pushed tag and a live PyPI release. A freshly created environment has **no** protection rules by default; until you add them, any successful run publishes immediately. Treat this as a mandatory setup step, not an optional one. Configure it under **Settings → Environments → pypi**:
+
+| Rule | Setting | Why |
+| --- | --- | --- |
+| Required reviewers | Add 1+ maintainers | The publish job pauses for human approval before it can mint the OIDC token and upload, putting a second pair of eyes on every release. |
+| Deployment branches and tags | **Selected** → add a `v*.*.*` tag rule | Only protected version tags can deploy to `pypi`; a random branch push or arbitrary tag can't trigger a publish. |
+
+With these in place, the `test` and `build` jobs still run on any matching tag, but the `publish` job blocks until an approver signs off, and only for `v*.*.*` tags.
+
+The workflow itself is hardened in depth, so a misconfigured environment alone can't publish from the wrong place:
+- Least-privilege `permissions: contents: read` default: only `publish` gets `id-token: write`, only `release` gets `contents: write`.
+- Every third-party action pinned to a full commit SHA so a re-pointed tag can't inject code into the token-minting job (kept fresh by `.github/dependabot.yml`).
+- The `publish` job carries `if: startsWith(github.ref, 'refs/tags/v')`, so even without the environment rule it refuses to run on a non-tag ref.
+- `publish` consumes the exact artifact built and version-checked in the `build` job (it never rebuilds), so the bytes on PyPI match what was tested.