feat(logs): pro-rate billing for rows removed by drop rules by DanielVisca · Pull Request #62580 · PostHog/posthog

DanielVisca · 2026-06-09T23:27:31Z

Problem

Logs/traces are billed on the per-message uncompressed header (bytes_uncompressed = the whole-batch wire size, set at capture). Drop rules and rate limits remove rows after that header is set, in ingestion — so dropped volume is still billed. A customer's drop rule reduces storage/query load but not their bill.

The two byte numbers we have aren't directly subtractable: the header is the whole-message wire size; the per-row bytes_uncompressed field is the row's denormalized content — it carries near-constant per-row overhead (resource attributes duplicated onto every row, server-generated uuid, trace/span-id placeholders). Used as a ratio weight, that overhead skews the pro-rate toward record-count weighting: dropping a few large rows under-credits, dropping many small rows over-credits.

Changes

Pro-rate, don't subtract: bill bytes_uncompressed × (1 − dropped_content / total_content) per message. Batches no rule touches keep the fast path (never unwrapped, billed exactly as today). The billing basis stays the payload header — matching how Datadog/Axiom/Sentry/Grafana bill (uncompressed customer-sent bytes); drop rules just credit back the removed share.
Weights are customer-content bytes computed inline at drop time (body + attributes + event_name per row, in the sampling service where the records are already decoded) — not the per-row bytes_uncompressed field, for the skew reason above. The per-row field keeps its existing roles unchanged (rate-limit cost, bytes_dropped_by_rule).
Credit is capped at the header (a message can never bill above today's gross); unmeasurable cases (no header / no content bytes) credit nothing rather than something wrong.
Observability: per-team credited bytes/records counters, plus confidence histograms (content-vs-record-weighted divergence, batch size, drop fraction) so we can see where the pro-rate is least trustworthy.
bytes_received stays gross (what was sent); bytes_ingested and records_ingested reflect what survived drops.

How did you test this code?

Agent-assisted rework. Automated tests: logs-billing.test.ts (pure pro-rate function: fraction, caps, unmeasurable cases, rounding), sampling service tests extended for content-weight accounting (including a row with attributes + event_name, and independence from the per-row bytes_uncompressed field), consumer suite — 74 tests pass across the three suites. No manual testing.

Automatic notifications

Publish to changelog?
Alert Sales and Marketing teams?

Docs update

🤖 Agent context

Autonomy: Human-driven (agent-assisted)

Reworked with PostHog Code (Claude) after the records-vs-payload header comparison (#63116/#63115) showed per-row byte sums consistently exceed the payload due to per-row denormalization overhead. Team decision: keep billing on customer-sent payload bytes (industry norm) and fix drop-rule accounting by pro-rating with content-only weights computed at unwrap time. Rebased onto master; the weight change is the only semantic difference from the original draft.

Created with PostHog Code

greptile-apps · 2026-06-09T23:30:51Z

Comments Outside Diff (1)

nodejs/src/logs-ingestion/logs-ingestion-consumer.ts, line 396-397 (link)

Prometheus counter drifts from billing after drop-rule credit

logsBytesAllowedCounter and logsRecordsAllowedCounter are incremented here with the full gross bytes/record counts, but usageStats.bytesAllowed and usageStats.recordsAllowed are later reduced by the drop-rule credit in processAndProduceLogMessages. After this PR, logs_ingestion_bytes_allowed_total and logs_ingestion_records_allowed_total will exceed bytes_ingested / records_ingested in app_metrics2 by exactly the credited amount. Engineers debugging billing questions who compare these two sources will see a persistent unexplained gap. Either apply the same credit to the counters after sampling resolves (harder, since Prometheus counters can't be decremented), add a dedicated logs_ingestion_bytes_allowed_after_drops_total counter, or document the intentional divergence explicitly so alerting/dashboards are not misconfigured against it.

Prompt To Fix With AI

This is a comment left during a code review.
Path: nodejs/src/logs-ingestion/logs-ingestion-consumer.ts
Line: 396-397

Comment:
**Prometheus counter drifts from billing after drop-rule credit**

`logsBytesAllowedCounter` and `logsRecordsAllowedCounter` are incremented here with the full gross bytes/record counts, but `usageStats.bytesAllowed` and `usageStats.recordsAllowed` are later reduced by the drop-rule credit in `processAndProduceLogMessages`. After this PR, `logs_ingestion_bytes_allowed_total` and `logs_ingestion_records_allowed_total` will exceed `bytes_ingested` / `records_ingested` in app_metrics2 by exactly the credited amount. Engineers debugging billing questions who compare these two sources will see a persistent unexplained gap. Either apply the same credit to the counters after sampling resolves (harder, since Prometheus counters can't be decremented), add a dedicated `logs_ingestion_bytes_allowed_after_drops_total` counter, or document the intentional divergence explicitly so alerting/dashboards are not misconfigured against it.

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
nodejs/src/logs-ingestion/logs-billing.test.ts:3-30
These six test cases all exercise the same function with different inputs and expected outputs. Consolidating them into a single `it.each` table makes it much easier to add cases and keeps the pattern consistent with the codebase's preference for parameterised tests.

```suggestion
describe('billingByteReductionForDrops', () => {
    it.each([
        // label, headerBytes, bytesDropped, bytesTotal, expected
        ['credits the dropped content fraction of the header', 1000, 600, 1000, 600],
        ['credits nothing when no rows were dropped', 1000, 0, 1000, 0],
        ['credits the full header when every row is dropped', 1000, 1000, 1000, 1000],
        ['credits nothing when bytesTotal is zero (unmeasurable)', 1000, 50, 0, 0],
        ['credits nothing when headerBytes is zero (unmeasurable)', 0, 50, 100, 0],
        ['caps the dropped fraction at 1 (defensive)', 1000, 1500, 1000, 1000],
        ['rounds to whole bytes (1000 × 1/3 = 333.33 → 333)', 1000, 1, 3, 333],
    ])('%s', (_label, headerBytes, bytesDropped, bytesTotal, expected) => {
        expect(billingByteReductionForDrops(headerBytes, bytesDropped, bytesTotal)).toBe(expected)
    })
})
```

### Issue 2 of 2
nodejs/src/logs-ingestion/logs-ingestion-consumer.ts:396-397
**Prometheus counter drifts from billing after drop-rule credit**

`logsBytesAllowedCounter` and `logsRecordsAllowedCounter` are incremented here with the full gross bytes/record counts, but `usageStats.bytesAllowed` and `usageStats.recordsAllowed` are later reduced by the drop-rule credit in `processAndProduceLogMessages`. After this PR, `logs_ingestion_bytes_allowed_total` and `logs_ingestion_records_allowed_total` will exceed `bytes_ingested` / `records_ingested` in app_metrics2 by exactly the credited amount. Engineers debugging billing questions who compare these two sources will see a persistent unexplained gap. Either apply the same credit to the counters after sampling resolves (harder, since Prometheus counters can't be decremented), add a dedicated `logs_ingestion_bytes_allowed_after_drops_total` counter, or document the intentional divergence explicitly so alerting/dashboards are not misconfigured against it.

_{Reviews (1): Last reviewed commit: "feat(logs): pro-rate billing for rows re..." | Re-trigger Greptile}

DanielVisca · 2026-06-10T00:33:22Z

Thanks @greptile-apps — both addressed:

P2 (parameterize tests): billingByteReductionForDrops cases are now a single it.each table; the two content-vs-record divergence cases stay as separate its since they assert a relationship rather than a fixed input→output.
P1 (counter vs billing divergence): intentional, and now documented rather than changed. logs_ingestion_bytes_allowed_total / records_allowed_total stay gross; the billed net is recoverable as allowed − logs_ingestion_billing_*_credited_total (we already emit the credited counter). Added that reconciliation to both counters' help text and a one-line note at the increment site, so anyone comparing the counter to bytes_ingested in app_metrics2 sees why they differ. No decrement / extra counter needed.

greptile-apps · 2026-06-10T00:33:55Z

Both look good to me.

For P1 specifically — documenting the reconciliation formula in the help text (allowed − credited = net billed) is the right call. It's the standard pattern for Prometheus counters (you don't subtract from a counter; you emit the delta and let the consumer do the arithmetic), and making it explicit in the help text means anyone writing a PromQL query or building a dashboard won't have to reverse-engineer it.

For P2 — keeping the divergence cases as separate its rather than folding them into the it.each table is the right instinct. Those tests express a property ("these two diverge / agree"), not a fixed expected value, so they'd be awkward in a data-driven table anyway.

Nothing blocking from my side on these two points. The remaining open question is the billing/pricing visibility sign-off mentioned in the PR description — that's a people/process gate, not a code issue.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

greptile-apps · 2026-06-10T00:47:16Z

_{Reviews (2): Last reviewed commit: "chore(logs): address review on billing p..." | Re-trigger Greptile}

greptile-apps · 2026-06-10T01:09:08Z

_{Reviews (3): Last reviewed commit: "chore(logs): address review on billing p..." | Re-trigger Greptile}

Logs are billed on the per-message uncompressed header (whole-batch wire size), but drop rules and rate limits remove rows after that header is set, so dropped volume was still billed. Pro-rate the billed bytes by the dropped content fraction so bytes_ingested reflects only what survived. - sampling: processBuffer returns bytesTotal (sum of all rows' per-row bytes) alongside bytesDropped. - ingest: after drop rules run, reduce the team's bytesAllowed by header × bytesDropped/bytesTotal and recordsAllowed by recordsDropped, before usage metrics emit. bytesReceived stays gross (what was sent); bytes_ingested / records_ingested reflect what was kept. Approximation by design (Option 1): the shared envelope is spread across rows by content weight instead of kept as a fixed residual, so it only ever credits (never over-charges) and is within ~1% for typical multi-row batches; error grows only for small, resource-heavy, heavily-dropped batches. Exact per-row sizing is a follow-up. Generated-By: PostHog Code Task-Id: 50d7af24-7b5c-4322-a593-ea13562dc337

Surface the billing effect of drop rules and a signal for how trustworthy the pro-rate is, scraped into VictoriaMetrics like the other logs_ingestion_* metrics. Tier 1 (impact): logs_ingestion_billing_bytes_credited_total{team_id} and logs_ingestion_billing_records_credited_total{team_id} — how much billing was reduced by drops. Tier 2 (accuracy confidence, no team_id label): compute a second, record-weighted credit estimate and emit logs_ingestion_billing_prorate_divergence (|content − record credit| / header) — high divergence flags a size-skewed batch where the content-weighted pro-rate is least accurate — plus logs_ingestion_drop_batch_rows and logs_ingestion_drop_fraction histograms (small batches / extreme fractions carry larger error). We can't compute true prorate-vs-exact error at ingestion (no original wire bytes there); these bound/flag it cheaply. Generated-By: PostHog Code Task-Id: 50d7af24-7b5c-4322-a593-ea13562dc337

- parameterize billingByteReductionForDrops tests with it.each - document that *_allowed_total counters are gross of the drop-rule billing credit (billed net = allowed - *_credited_total) Generated-By: PostHog Code Task-Id: 50d7af24-7b5c-4322-a593-ea13562dc337

The pro-rate weights previously used the per-row bytes_uncompressed field, which includes per-row denormalization overhead (resource attributes duplicated onto every row, server-generated uuid, id placeholders). That overhead is near-constant per row, so as a ratio weight it pulled the credit toward record-count weighting instead of share-of-content — under-crediting drops of large rows and over-crediting drops of small ones. Compute content-only weights (body + attributes + event_name) inline at drop time instead. The per-row field keeps its existing roles (rate-limit cost, bytes_dropped_by_rule) unchanged. Generated-By: PostHog Code Task-Id: 7b5b6dce-16dd-45be-8895-e7c933f0a92a

stamphog

This PR directly modifies billing accounting — it adjusts bytesAllowed and recordsAllowed in the usageStats billing row based on a pro-rata credit calculation for drop-rule removals. Changes to billing logic require human review and cannot be auto-approved.

veria-ai · 2026-06-12T01:02:47Z

+ * "share of what the customer sent".
+ */
+const contentBytes = (r: LogRecord): number => {
+    let total = Buffer.byteLength(r.body ?? '') + Buffer.byteLength(r.event_name ?? '')


Medium: Drop-rule billing credit can include kept payload bytes

contentBytes omits resource_attributes, service_name, severity_text, instrumentation_scope, and other customer-controlled fields even though the credit is later applied against the whole-message bytes_uncompressed header. A sender for a project with drop rules can batch a dropped row whose body dominates this reduced denominator with kept rows that carry large resource attributes, causing billingByteReductionForDrops to credit most of the Kafka message while the kept rows are still produced and ingested. Base the numerator and denominator on the same byte model as the billed header, or at least include all customer-controlled fields that contribute to the ingested row size.

veria-ai · 2026-06-12T01:02:57Z

PR overview

This PR adds pro-rated log billing behavior for rows removed by drop rules during logs ingestion. The touched sampling logic calculates byte reductions for dropped log rows so billing can account for data excluded by those rules.

There is one open issue in the billing reduction calculation: the byte denominator used for dropped rows excludes several customer-controlled fields while the credit is applied against the full uncompressed message size. A sender in a project using drop rules could craft batches that over-credit dropped rows and reduce billing for kept log data that is still ingested. No issues have been fixed yet, so the PR still carries a concrete billing-integrity risk.

Open issues (1)

Medium: Drop-rule billing credit can include kept payload bytes — nodejs/src/logs-ingestion/sampling/logs-sampling.service.ts:90

Fixed/addressed: 0 · PR risk: 5/10

assign-reviewers-posthog Bot assigned DanielVisca Jun 9, 2026

greptile-apps Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread nodejs/src/logs-ingestion/logs-billing.test.ts

DanielVisca force-pushed the posthog-code/logs-billing-prorate-dropped-bytes branch from 3c304db to 6cab882 Compare June 9, 2026 23:42

DanielVisca marked this pull request as ready for review June 10, 2026 00:37

DanielVisca marked this pull request as draft June 10, 2026 00:40

DanielVisca marked this pull request as ready for review June 10, 2026 01:04

DanielVisca added 4 commits June 11, 2026 17:48

DanielVisca force-pushed the posthog-code/logs-billing-prorate-dropped-bytes branch from 4862afb to 04112e9 Compare June 12, 2026 00:53

DanielVisca requested a review from a team June 12, 2026 00:58

DanielVisca added the stamphog Request AI review from stamphog label Jun 12, 2026

stamphog Bot reviewed Jun 12, 2026

View reviewed changes

stamphog Bot removed the stamphog Request AI review from stamphog label Jun 12, 2026

veria-ai Bot reviewed Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(logs): pro-rate billing for rows removed by drop rules#62580

feat(logs): pro-rate billing for rows removed by drop rules#62580
DanielVisca wants to merge 4 commits into
masterfrom
posthog-code/logs-billing-prorate-dropped-bytes

DanielVisca commented Jun 9, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 9, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

Uh oh!

DanielVisca commented Jun 10, 2026

Uh oh!

greptile-apps Bot commented Jun 10, 2026

Uh oh!

greptile-apps Bot commented Jun 10, 2026

Uh oh!

greptile-apps Bot commented Jun 10, 2026

Uh oh!

stamphog Bot left a comment

Uh oh!

veria-ai Bot Jun 12, 2026

Uh oh!

veria-ai Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DanielVisca commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

How did you test this code?

Automatic notifications

Docs update

🤖 Agent context

Uh oh!

greptile-apps Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comments Outside Diff (1)

Uh oh!

Uh oh!

DanielVisca commented Jun 10, 2026

Uh oh!

greptile-apps Bot commented Jun 10, 2026

Uh oh!

greptile-apps Bot commented Jun 10, 2026

Uh oh!

greptile-apps Bot commented Jun 10, 2026

Uh oh!

stamphog Bot left a comment

Choose a reason for hiding this comment

Uh oh!

veria-ai Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

veria-ai Bot commented Jun 12, 2026

PR overview

Open issues (1)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DanielVisca commented Jun 9, 2026 •

edited

Loading

greptile-apps Bot commented Jun 9, 2026 •

edited

Loading