feat(logs): pro-rate billing for rows removed by drop rules #62580
DanielVisca wants to merge 4 commits into
Thanks @greptile-apps — both addressed:
Both look good to me. For P1 specifically: documenting the reconciliation formula in the […] For P2: keeping the divergence cases as separate […] Nothing blocking from my side on these two points. The remaining open question is the billing/pricing visibility sign-off mentioned in the PR description; that's a people/process gate, not a code issue.
Logs are billed on the per-message uncompressed header (whole-batch wire size), but drop rules and rate limits remove rows after that header is set, so dropped volume was still billed. Pro-rate the billed bytes by the dropped content fraction so bytes_ingested reflects only what survived.

- sampling: processBuffer returns bytesTotal (sum of all rows' per-row bytes) alongside bytesDropped.
- ingest: after drop rules run, reduce the team's bytesAllowed by header × bytesDropped/bytesTotal and recordsAllowed by recordsDropped, before usage metrics emit.

bytesReceived stays gross (what was sent); bytes_ingested / records_ingested reflect what was kept.

Approximation by design (Option 1): the shared envelope is spread across rows by content weight instead of kept as a fixed residual, so it only ever credits (never over-charges) and is within ~1% for typical multi-row batches; error grows only for small, resource-heavy, heavily-dropped batches. Exact per-row sizing is a follow-up.

Generated-By: PostHog Code
Task-Id: 50d7af24-7b5c-4322-a593-ea13562dc337
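The pro-rate described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the function name `prorateReduction`, its signature, and the choice to return zero credit when the total is unmeasurable are assumptions made for the example.

```typescript
// Hypothetical sketch of the pro-rate; not the PR's billingByteReductionForDrops.
// headerBytes: whole-message uncompressed size that billing is based on.
// bytesDropped / bytesTotal: per-row weights summed over dropped rows and all rows.
function prorateReduction(headerBytes: number, bytesDropped: number, bytesTotal: number): number {
    // Assumption: if any input is zero, negative, or NaN, give no credit.
    if (!(headerBytes > 0) || !(bytesDropped > 0) || !(bytesTotal > 0)) {
        return 0
    }
    // Cap the fraction at 1 so the credit can never exceed what was billed.
    const fraction = Math.min(bytesDropped / bytesTotal, 1)
    return Math.min(headerBytes, Math.round(headerBytes * fraction))
}

// A 1000-byte message where a quarter of the row weight was dropped.
const header = 1000
const credit = prorateReduction(header, 250, 1000)
const billedBytes = header - credit
console.log({ credit, billedBytes })
```

In this sketch the gross size (what was sent) is left untouched and only the billed figure is reduced, mirroring the bytesReceived versus bytes_ingested split above.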
Surface the billing effect of drop rules and a signal for how trustworthy the
pro-rate is, scraped into VictoriaMetrics like the other logs_ingestion_* metrics.
Tier 1 (impact): logs_ingestion_billing_bytes_credited_total{team_id} and
logs_ingestion_billing_records_credited_total{team_id} — how much billing was
reduced by drops.
Tier 2 (accuracy confidence, no team_id label): compute a second, record-weighted
credit estimate and emit logs_ingestion_billing_prorate_divergence
(|content credit − record credit| / header). High divergence flags a size-skewed batch
where the content-weighted pro-rate is least accurate. Also emit the
logs_ingestion_drop_batch_rows and logs_ingestion_drop_fraction histograms, since
small batches and extreme fractions carry larger error. We can't compute true
prorate-vs-exact error at ingestion (no original wire bytes there); these metrics
bound and flag it cheaply.
Generated-By: PostHog Code
Task-Id: 50d7af24-7b5c-4322-a593-ea13562dc337
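As an illustration of the Tier 2 divergence signal, the sketch below computes both credit estimates and their normalized gap. The function name `prorateDivergence` and its parameters are hypothetical; only the formula |content credit − record credit| / header comes from the commit message.

```typescript
// Hypothetical sketch of the divergence signal; names and signature are assumptions.
function prorateDivergence(
    headerBytes: number,
    bytesDropped: number,
    bytesTotal: number,
    recordsDropped: number,
    recordsTotal: number
): number {
    // Assumption: report zero divergence when nothing can be measured.
    if (!(headerBytes > 0) || !(bytesTotal > 0) || !(recordsTotal > 0)) {
        return 0
    }
    const contentCredit = headerBytes * (bytesDropped / bytesTotal)
    const recordCredit = headerBytes * (recordsDropped / recordsTotal)
    return Math.abs(contentCredit - recordCredit) / headerBytes
}

// Size-skewed batch: 1 of 10 rows is dropped, but it holds 90% of the bytes.
const skewed = prorateDivergence(1000, 900, 1000, 1, 10)
// Uniform batch: 5 of 10 equally sized rows are dropped.
const uniform = prorateDivergence(1000, 500, 1000, 5, 10)
console.log({ skewed, uniform })
```

The header cancels out algebraically, so the value is just the gap between the byte fraction and the record fraction; a uniform batch gives zero and a heavily skewed one approaches 1.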
- parameterize billingByteReductionForDrops tests with it.each
- document that *_allowed_total counters are gross of the drop-rule billing credit (billed net = allowed - *_credited_total)

Generated-By: PostHog Code
Task-Id: 50d7af24-7b5c-4322-a593-ea13562dc337
The pro-rate weights previously used the per-row bytes_uncompressed field, which includes per-row denormalization overhead (resource attributes duplicated onto every row, server-generated uuid, id placeholders). That overhead is near-constant per row, so as a ratio weight it pulled the credit toward record-count weighting instead of share-of-content: under-crediting drops of large rows and over-crediting drops of small ones.

Compute content-only weights (body + attributes + event_name) inline at drop time instead. The per-row field keeps its existing roles (rate-limit cost, bytes_dropped_by_rule) unchanged.

Generated-By: PostHog Code
Task-Id: 7b5b6dce-16dd-45be-8895-e7c933f0a92a
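A small numeric sketch may make the skew concrete. The row sizes, the fixed 2000-byte overhead, and all names below are invented for illustration; they are not measurements from the PR.

```typescript
// Hypothetical rows: sizes and the constant overhead are illustrative assumptions.
interface SketchRow {
    contentBytes: number // body + attributes + event_name
    overheadBytes: number // near-constant per-row denormalization overhead
}

type Weight = (row: SketchRow) => number

// Fraction of total weight carried by the single row at droppedIndex.
function droppedFraction(rows: SketchRow[], droppedIndex: number, weight: Weight): number {
    const total = rows.reduce((sum, row) => sum + weight(row), 0)
    return total > 0 ? weight(rows[droppedIndex]) / total : 0
}

const contentOnly: Weight = (row) => row.contentBytes
const perRowField: Weight = (row) => row.contentBytes + row.overheadBytes

const sketchRows: SketchRow[] = [
    { contentBytes: 10000, overheadBytes: 2000 }, // large row
    { contentBytes: 100, overheadBytes: 2000 }, // small row
]

// Dropping the large row: the overhead-inclusive weight credits less.
const largeByContent = droppedFraction(sketchRows, 0, contentOnly)
const largeByField = droppedFraction(sketchRows, 0, perRowField)
// Dropping the small row: the overhead-inclusive weight credits more.
const smallByContent = droppedFraction(sketchRows, 1, contentOnly)
const smallByField = droppedFraction(sketchRows, 1, perRowField)
console.log({ largeByContent, largeByField, smallByContent, smallByField })
```

With these made-up sizes, both overhead-inclusive fractions move toward 0.5, which is what pure record-count weighting would give for a two-row batch.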
 * "share of what the customer sent".
 */
const contentBytes = (r: LogRecord): number => {
    let total = Buffer.byteLength(r.body ?? '') + Buffer.byteLength(r.event_name ?? '')
Medium: Drop-rule billing credit can include kept payload bytes
contentBytes omits resource_attributes, service_name, severity_text, instrumentation_scope, and other customer-controlled fields even though the credit is later applied against the whole-message bytes_uncompressed header. A sender for a project with drop rules can batch a dropped row whose body dominates this reduced denominator with kept rows that carry large resource attributes, causing billingByteReductionForDrops to credit most of the Kafka message while the kept rows are still produced and ingested. Base the numerator and denominator on the same byte model as the billed header, or at least include all customer-controlled fields that contribute to the ingested row size.
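The scenario in this comment can be illustrated with a toy model. Everything here is hypothetical: the row shape, the sizes, and the assumption that the billed header is simply the sum of counted and uncounted bytes. It is a sketch of the reviewer's argument, not a reproduction against the real pipeline.

```typescript
// Toy model of the review scenario; all names and sizes are hypothetical.
interface ToyRow {
    bodyBytes: number // counted by the content weight
    resourceBytes: number // customer-controlled, but not counted by the weight
    dropped: boolean
}

function toyCredit(rows: ToyRow[]): { header: number; credit: number; keptBytes: number } {
    // Assumption: stand-in for the whole-message size that billing is based on.
    const header = rows.reduce((sum, r) => sum + r.bodyBytes + r.resourceBytes, 0)
    const totalWeight = rows.reduce((sum, r) => sum + r.bodyBytes, 0)
    const droppedWeight = rows.reduce((sum, r) => (r.dropped ? sum + r.bodyBytes : sum), 0)
    const credit = totalWeight > 0 ? Math.round(header * (droppedWeight / totalWeight)) : 0
    const keptBytes = rows.reduce((sum, r) => (r.dropped ? sum : sum + r.bodyBytes + r.resourceBytes), 0)
    return { header, credit, keptBytes }
}

// One dropped row with a large body, two kept rows dominated by uncounted fields.
const scenario = toyCredit([
    { bodyBytes: 9000, resourceBytes: 0, dropped: true },
    { bodyBytes: 10, resourceBytes: 50000, dropped: false },
    { bodyBytes: 10, resourceBytes: 50000, dropped: false },
])
const billedAfterCredit = scenario.header - scenario.credit
console.log({ ...scenario, billedAfterCredit })
```

In this toy batch the dropped row dominates the reduced denominator, so nearly the whole header is credited even though most of the message's bytes belong to rows that are kept.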
PR overview
This PR adds pro-rated log billing behavior for rows removed by drop rules during logs ingestion. The touched sampling logic calculates byte reductions for dropped log rows so billing can account for data excluded by those rules.

There is one open issue in the billing reduction calculation: the byte denominator used for dropped rows excludes several customer-controlled fields while the credit is applied against the full uncompressed message size. A sender in a project using drop rules could craft batches that over-credit dropped rows and reduce billing for kept log data that is still ingested. No issues have been fixed yet, so the PR still carries a concrete billing-integrity risk.

Open issues (1)
Fixed/addressed: 0 · PR risk: 5/10
Problem
Logs/traces are billed on the per-message uncompressed header (bytes_uncompressed = the whole-batch wire size, set at capture). Drop rules and rate limits remove rows after that header is set, in ingestion, so dropped volume is still billed. A customer's drop rule reduces storage/query load but not their bill.

The two byte numbers we have aren't directly subtractable: the header is the whole-message wire size; the per-row bytes_uncompressed field is the row's denormalized content, which carries near-constant per-row overhead (resource attributes duplicated onto every row, server-generated uuid, trace/span-id placeholders). Used as a ratio weight, that overhead skews the pro-rate toward record-count weighting: dropping a few large rows under-credits, dropping many small rows over-credits.

Changes
- Billed bytes become bytes_uncompressed × (1 − dropped_content / total_content) per message. Batches no rule touches keep the fast path (never unwrapped, billed exactly as today). The billing basis stays the payload header, matching how Datadog/Axiom/Sentry/Grafana bill (uncompressed customer-sent bytes); drop rules just credit back the removed share.
- Weights are content-only (body + attributes + event_name per row, computed in the sampling service where the records are already decoded), not the per-row bytes_uncompressed field, for the skew reason above. The per-row field keeps its existing roles unchanged (rate-limit cost, bytes_dropped_by_rule).
- bytes_received stays gross (what was sent); bytes_ingested and records_ingested reflect what survived drops.

How did you test this code?
Agent-assisted rework. Automated tests: logs-billing.test.ts (pure pro-rate function: fraction, caps, unmeasurable cases, rounding), sampling service tests extended for content-weight accounting (including a row with attributes + event_name, and independence from the per-row bytes_uncompressed field), and the consumer suite. 74 tests pass across the three suites. No manual testing.

Automatic notifications
Docs update
🤖 Agent context
Autonomy: Human-driven (agent-assisted)
Reworked with PostHog Code (Claude) after the records-vs-payload header comparison (#63116/#63115) showed per-row byte sums consistently exceed the payload due to per-row denormalization overhead. Team decision: keep billing on customer-sent payload bytes (industry norm) and fix drop-rule accounting by pro-rating with content-only weights computed at unwrap time. Rebased onto master; the weight change is the only semantic difference from the original draft.
Created with PostHog Code