Skip to content

feat(logs): pro-rate billing for rows removed by drop rules#62580

Open
DanielVisca wants to merge 4 commits into
masterfrom
posthog-code/logs-billing-prorate-dropped-bytes
Open

feat(logs): pro-rate billing for rows removed by drop rules#62580
DanielVisca wants to merge 4 commits into
masterfrom
posthog-code/logs-billing-prorate-dropped-bytes

Conversation

@DanielVisca

@DanielVisca DanielVisca commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Problem

Logs/traces are billed on the per-message uncompressed header (bytes_uncompressed = the whole-batch wire size, set at capture). Drop rules and rate limits remove rows after that header is set, in ingestion — so dropped volume is still billed. A customer's drop rule reduces storage/query load but not their bill.

The two byte numbers we have aren't directly subtractable: the header is the whole-message wire size; the per-row bytes_uncompressed field is the row's denormalized content — it carries near-constant per-row overhead (resource attributes duplicated onto every row, server-generated uuid, trace/span-id placeholders). Used as a ratio weight, that overhead skews the pro-rate toward record-count weighting: dropping a few large rows under-credits, dropping many small rows over-credits.

Changes

  • Pro-rate, don't subtract: bill bytes_uncompressed × (1 − dropped_content / total_content) per message. Batches no rule touches keep the fast path (never unwrapped, billed exactly as today). The billing basis stays the payload header — matching how Datadog/Axiom/Sentry/Grafana bill (uncompressed customer-sent bytes); drop rules just credit back the removed share.
  • Weights are customer-content bytes computed inline at drop time (body + attributes + event_name per row, in the sampling service where the records are already decoded) — not the per-row bytes_uncompressed field, for the skew reason above. The per-row field keeps its existing roles unchanged (rate-limit cost, bytes_dropped_by_rule).
  • Credit is capped at the header (a message can never bill above today's gross); unmeasurable cases (no header / no content bytes) credit nothing rather than something wrong.
  • Observability: per-team credited bytes/records counters, plus confidence histograms (content-vs-record-weighted divergence, batch size, drop fraction) so we can see where the pro-rate is least trustworthy.
  • bytes_received stays gross (what was sent); bytes_ingested and records_ingested reflect what survived drops.

How did you test this code?

Agent-assisted rework. Automated tests: logs-billing.test.ts (pure pro-rate function: fraction, caps, unmeasurable cases, rounding), sampling service tests extended for content-weight accounting (including a row with attributes + event_name, and independence from the per-row bytes_uncompressed field), consumer suite — 74 tests pass across the three suites. No manual testing.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Docs update

🤖 Agent context

Autonomy: Human-driven (agent-assisted)

Reworked with PostHog Code (Claude) after the records-vs-payload header comparison (#63116/#63115) showed per-row byte sums consistently exceed the payload due to per-row denormalization overhead. Team decision: keep billing on customer-sent payload bytes (industry norm) and fix drop-rule accounting by pro-rating with content-only weights computed at unwrap time. Rebased onto master; the weight change is the only semantic difference from the original draft.


Created with PostHog Code

@greptile-apps

greptile-apps Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Comments Outside Diff (1)

  1. nodejs/src/logs-ingestion/logs-ingestion-consumer.ts, line 396-397 (link)

    P1 Prometheus counter drifts from billing after drop-rule credit

    logsBytesAllowedCounter and logsRecordsAllowedCounter are incremented here with the full gross bytes/record counts, but usageStats.bytesAllowed and usageStats.recordsAllowed are later reduced by the drop-rule credit in processAndProduceLogMessages. After this PR, logs_ingestion_bytes_allowed_total and logs_ingestion_records_allowed_total will exceed bytes_ingested / records_ingested in app_metrics2 by exactly the credited amount. Engineers debugging billing questions who compare these two sources will see a persistent unexplained gap. Either apply the same credit to the counters after sampling resolves (harder, since Prometheus counters can't be decremented), add a dedicated logs_ingestion_bytes_allowed_after_drops_total counter, or document the intentional divergence explicitly so alerting/dashboards are not misconfigured against it.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: nodejs/src/logs-ingestion/logs-ingestion-consumer.ts
    Line: 396-397
    
    Comment:
    **Prometheus counter drifts from billing after drop-rule credit**
    
    `logsBytesAllowedCounter` and `logsRecordsAllowedCounter` are incremented here with the full gross bytes/record counts, but `usageStats.bytesAllowed` and `usageStats.recordsAllowed` are later reduced by the drop-rule credit in `processAndProduceLogMessages`. After this PR, `logs_ingestion_bytes_allowed_total` and `logs_ingestion_records_allowed_total` will exceed `bytes_ingested` / `records_ingested` in app_metrics2 by exactly the credited amount. Engineers debugging billing questions who compare these two sources will see a persistent unexplained gap. Either apply the same credit to the counters after sampling resolves (harder, since Prometheus counters can't be decremented), add a dedicated `logs_ingestion_bytes_allowed_after_drops_total` counter, or document the intentional divergence explicitly so alerting/dashboards are not misconfigured against it.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
nodejs/src/logs-ingestion/logs-billing.test.ts:3-30
These six test cases all exercise the same function with different inputs and expected outputs. Consolidating them into a single `it.each` table makes it much easier to add cases and keeps the pattern consistent with the codebase's preference for parameterised tests.

```suggestion
describe('billingByteReductionForDrops', () => {
    it.each([
        // label, headerBytes, bytesDropped, bytesTotal, expected
        ['credits the dropped content fraction of the header', 1000, 600, 1000, 600],
        ['credits nothing when no rows were dropped', 1000, 0, 1000, 0],
        ['credits the full header when every row is dropped', 1000, 1000, 1000, 1000],
        ['credits nothing when bytesTotal is zero (unmeasurable)', 1000, 50, 0, 0],
        ['credits nothing when headerBytes is zero (unmeasurable)', 0, 50, 100, 0],
        ['caps the dropped fraction at 1 (defensive)', 1000, 1500, 1000, 1000],
        ['rounds to whole bytes (1000 × 1/3 = 333.33 → 333)', 1000, 1, 3, 333],
    ])('%s', (_label, headerBytes, bytesDropped, bytesTotal, expected) => {
        expect(billingByteReductionForDrops(headerBytes, bytesDropped, bytesTotal)).toBe(expected)
    })
})
```

### Issue 2 of 2
nodejs/src/logs-ingestion/logs-ingestion-consumer.ts:396-397
**Prometheus counter drifts from billing after drop-rule credit**

`logsBytesAllowedCounter` and `logsRecordsAllowedCounter` are incremented here with the full gross bytes/record counts, but `usageStats.bytesAllowed` and `usageStats.recordsAllowed` are later reduced by the drop-rule credit in `processAndProduceLogMessages`. After this PR, `logs_ingestion_bytes_allowed_total` and `logs_ingestion_records_allowed_total` will exceed `bytes_ingested` / `records_ingested` in app_metrics2 by exactly the credited amount. Engineers debugging billing questions who compare these two sources will see a persistent unexplained gap. Either apply the same credit to the counters after sampling resolves (harder, since Prometheus counters can't be decremented), add a dedicated `logs_ingestion_bytes_allowed_after_drops_total` counter, or document the intentional divergence explicitly so alerting/dashboards are not misconfigured against it.

Reviews (1): Last reviewed commit: "feat(logs): pro-rate billing for rows re..." | Re-trigger Greptile

Comment thread nodejs/src/logs-ingestion/logs-billing.test.ts
@DanielVisca DanielVisca force-pushed the posthog-code/logs-billing-prorate-dropped-bytes branch from 3c304db to 6cab882 Compare June 9, 2026 23:42
@DanielVisca

Copy link
Copy Markdown
Contributor Author

Thanks @greptile-apps — both addressed:

  • P2 (parameterize tests): billingByteReductionForDrops cases are now a single it.each table; the two content-vs-record divergence cases stay as separate its since they assert a relationship rather than a fixed input→output.
  • P1 (counter vs billing divergence): intentional, and now documented rather than changed. logs_ingestion_bytes_allowed_total / records_allowed_total stay gross; the billed net is recoverable as allowed − logs_ingestion_billing_*_credited_total (we already emit the credited counter). Added that reconciliation to both counters' help text and a one-line note at the increment site, so anyone comparing the counter to bytes_ingested in app_metrics2 sees why they differ. No decrement / extra counter needed.

@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Both look good to me.

For P1 specifically — documenting the reconciliation formula in the help text (allowed − credited = net billed) is the right call. It's the standard pattern for Prometheus counters (you don't subtract from a counter; you emit the delta and let the consumer do the arithmetic), and making it explicit in the help text means anyone writing a PromQL query or building a dashboard won't have to reverse-engineer it.

For P2 — keeping the divergence cases as separate its rather than folding them into the it.each table is the right instinct. Those tests express a property ("these two diverge / agree"), not a fixed expected value, so they'd be awkward in a data-driven table anyway.

Nothing blocking from my side on these two points. The remaining open question is the billing/pricing visibility sign-off mentioned in the PR description — that's a people/process gate, not a code issue.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

@DanielVisca DanielVisca marked this pull request as ready for review June 10, 2026 00:37
@DanielVisca DanielVisca marked this pull request as draft June 10, 2026 00:40
@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Reviews (2): Last reviewed commit: "chore(logs): address review on billing p..." | Re-trigger Greptile

@DanielVisca DanielVisca marked this pull request as ready for review June 10, 2026 01:04
@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Reviews (3): Last reviewed commit: "chore(logs): address review on billing p..." | Re-trigger Greptile

Logs are billed on the per-message uncompressed header (whole-batch wire size),
but drop rules and rate limits remove rows after that header is set, so dropped
volume was still billed. Pro-rate the billed bytes by the dropped content fraction
so bytes_ingested reflects only what survived.

- sampling: processBuffer returns bytesTotal (sum of all rows' per-row bytes)
  alongside bytesDropped.
- ingest: after drop rules run, reduce the team's bytesAllowed by
  header × bytesDropped/bytesTotal and recordsAllowed by recordsDropped, before
  usage metrics emit. bytesReceived stays gross (what was sent); bytes_ingested /
  records_ingested reflect what was kept.

Approximation by design (Option 1): the shared envelope is spread across rows by
content weight instead of kept as a fixed residual, so it only ever credits (never
over-charges) and is within ~1% for typical multi-row batches; error grows only for
small, resource-heavy, heavily-dropped batches. Exact per-row sizing is a follow-up.

Generated-By: PostHog Code
Task-Id: 50d7af24-7b5c-4322-a593-ea13562dc337
Surface the billing effect of drop rules and a signal for how trustworthy the
pro-rate is, scraped into VictoriaMetrics like the other logs_ingestion_* metrics.

Tier 1 (impact): logs_ingestion_billing_bytes_credited_total{team_id} and
logs_ingestion_billing_records_credited_total{team_id} — how much billing was
reduced by drops.

Tier 2 (accuracy confidence, no team_id label): compute a second, record-weighted
credit estimate and emit logs_ingestion_billing_prorate_divergence
(|content − record credit| / header) — high divergence flags a size-skewed batch
where the content-weighted pro-rate is least accurate — plus logs_ingestion_drop_batch_rows
and logs_ingestion_drop_fraction histograms (small batches / extreme fractions carry
larger error). We can't compute true prorate-vs-exact error at ingestion (no original
wire bytes there); these bound/flag it cheaply.

Generated-By: PostHog Code
Task-Id: 50d7af24-7b5c-4322-a593-ea13562dc337
- parameterize billingByteReductionForDrops tests with it.each
- document that *_allowed_total counters are gross of the drop-rule
  billing credit (billed net = allowed - *_credited_total)

Generated-By: PostHog Code
Task-Id: 50d7af24-7b5c-4322-a593-ea13562dc337
The pro-rate weights previously used the per-row bytes_uncompressed
field, which includes per-row denormalization overhead (resource
attributes duplicated onto every row, server-generated uuid, id
placeholders). That overhead is near-constant per row, so as a ratio
weight it pulled the credit toward record-count weighting instead of
share-of-content — under-crediting drops of large rows and
over-crediting drops of small ones.

Compute content-only weights (body + attributes + event_name) inline at
drop time instead. The per-row field keeps its existing roles
(rate-limit cost, bytes_dropped_by_rule) unchanged.

Generated-By: PostHog Code
Task-Id: 7b5b6dce-16dd-45be-8895-e7c933f0a92a
@DanielVisca DanielVisca force-pushed the posthog-code/logs-billing-prorate-dropped-bytes branch from 4862afb to 04112e9 Compare June 12, 2026 00:53
@DanielVisca DanielVisca requested a review from a team June 12, 2026 00:58
@DanielVisca DanielVisca added the stamphog Request AI review from stamphog label Jun 12, 2026

@stamphog stamphog Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR directly modifies billing accounting — it adjusts bytesAllowed and recordsAllowed in the usageStats billing row based on a pro-rata credit calculation for drop-rule removals. Changes to billing logic require human review and cannot be auto-approved.

@stamphog stamphog Bot removed the stamphog Request AI review from stamphog label Jun 12, 2026
* "share of what the customer sent".
*/
const contentBytes = (r: LogRecord): number => {
let total = Buffer.byteLength(r.body ?? '') + Buffer.byteLength(r.event_name ?? '')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: Drop-rule billing credit can include kept payload bytes

contentBytes omits resource_attributes, service_name, severity_text, instrumentation_scope, and other customer-controlled fields even though the credit is later applied against the whole-message bytes_uncompressed header. A sender for a project with drop rules can batch a dropped row whose body dominates this reduced denominator with kept rows that carry large resource attributes, causing billingByteReductionForDrops to credit most of the Kafka message while the kept rows are still produced and ingested. Base the numerator and denominator on the same byte model as the billed header, or at least include all customer-controlled fields that contribute to the ingested row size.

@veria-ai

veria-ai Bot commented Jun 12, 2026

Copy link
Copy Markdown

PR overview

This PR adds pro-rated log billing behavior for rows removed by drop rules during logs ingestion. The touched sampling logic calculates byte reductions for dropped log rows so billing can account for data excluded by those rules.

There is one open issue in the billing reduction calculation: the byte denominator used for dropped rows excludes several customer-controlled fields while the credit is applied against the full uncompressed message size. A sender in a project using drop rules could craft batches that over-credit dropped rows and reduce billing for kept log data that is still ingested. No issues have been fixed yet, so the PR still carries a concrete billing-integrity risk.

Open issues (1)

Fixed/addressed: 0 · PR risk: 5/10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant