feat(metrics): pipe our logs services into metrics1 locally + verify script#63123
feat(metrics): pipe our logs services into metrics1 locally + verify script#63123DanielVisca wants to merge 1 commit into
Conversation
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
bin/verify-metrics-pipe:33-41
**`set -e` silently exits before the troubleshooting message**
With `set -euo pipefail` active, if `ch()` returns a non-zero exit code (e.g., ClickHouse is not running or the container isn't up), the script exits immediately at the `ROWS=$(ch "...")` assignment. The user-friendly `✗ No PostHog service metrics found…` block and the numbered troubleshooting steps are never printed. Wrapping the `ch` call with `|| true` — or using a temporary variable with `|| ROWS=""` — lets the script fall through to the helpful message for connection errors too.
### Issue 2 of 2
bin/verify-metrics-pipe:36-37
**Unvalidated env var interpolated into SQL**
`LOOKBACK` is injected verbatim from the environment into both ClickHouse queries (`INTERVAL ${LOOKBACK}`). A value like `1; DROP TABLE posthog.metrics` would be passed straight to the ClickHouse client. For a local dev script this is low risk, but the same pattern appears twice (lines 36 and 67). Accepting only a numeric-unit pair (e.g., `LOOKBACK_MINUTES`) or at minimum quoting/validating against an allowlist keeps the helper safe even when run by scripts that forward arbitrary env vars.
Reviews (1): Last reviewed commit: "feat(metrics): pipe our logs services in..." | Re-trigger Greptile |
| WHERE timestamp > now() - INTERVAL ${LOOKBACK} | ||
| AND service_name IN ('logs-ingestion', 'metrics-ingestion', 'nodejs') |
There was a problem hiding this comment.
Unvalidated env var interpolated into SQL
LOOKBACK is injected verbatim from the environment into both ClickHouse queries (INTERVAL ${LOOKBACK}). A value like 1; DROP TABLE posthog.metrics would be passed straight to the ClickHouse client. For a local dev script this is low risk, but the same pattern appears twice (lines 36 and 67). Accepting only a numeric-unit pair (e.g., LOOKBACK_MINUTES) or at minimum quoting/validating against an allowlist keeps the helper safe even when run by scripts that forward arbitrary env vars.
Prompt To Fix With AI
This is a comment left during a code review.
Path: bin/verify-metrics-pipe
Line: 36-37
Comment:
**Unvalidated env var interpolated into SQL**
`LOOKBACK` is injected verbatim from the environment into both ClickHouse queries (`INTERVAL ${LOOKBACK}`). A value like `1; DROP TABLE posthog.metrics` would be passed straight to the ClickHouse client. For a local dev script this is low risk, but the same pattern appears twice (lines 36 and 67). Accepting only a numeric-unit pair (e.g., `LOOKBACK_MINUTES`) or at minimum quoting/validating against an allowlist keeps the helper safe even when run by scripts that forward arbitrary env vars.
How can I resolve this? If you propose a fix, please make it concise.|
Size Change: 0 B Total Size: 72.4 MB ℹ️ View Unchanged
|
There was a problem hiding this comment.
Two bot comments on the current head remain unaddressed in the diff: (1) the set -euo pipefail combined with an unguarded ch() call means ClickHouse errors silently exit before printing the troubleshooting steps, and (2) LOOKBACK is interpolated verbatim into SQL queries, a valid concern even in a dev-only script. Additionally, the author is not on @PostHog/team-devex, which owns four of the changed files (otel config, hogli.yaml, dev tooling scripts). Get a review from a team-devex member before re-requesting.
There was a problem hiding this comment.
The author is not on @PostHog/team-devex, which owns four of the changed files (otel collector config, hogli.yaml, and the new dev scripts). These are behavioral changes to the shared dev environment, not cosmetic fixes, so ownership matters. Additionally, two substantive bot review comments on the current head remain unaddressed: the set -euo pipefail silent-exit bug that swallows the user-friendly troubleshooting output, and the LOOKBACK env var being interpolated verbatim into ClickHouse queries.
ce78417 to
e5ca206
Compare
c55c893 to
fccb927
Compare
🕸️ Eager graphHow much code each root forces the browser to download and decode through static imports — the regression class total bundle size can't see.
✅ Largest files eagerly reachable from
|
| Size | File |
|---|---|
| 878.2 KiB | src/styles/global.scss |
| 609.0 KiB | public/hedgehog/burning-money-hog.png |
| 541.9 KiB | public/hedgehog/waving-hog.png |
| 448.2 KiB | public/hedgehog/stop-sign-hog.png |
| 362.0 KiB | public/hedgehog/phone-pair-hogs.png |
| 354.8 KiB | ../node_modules/.pnpm/@posthog+icons@0.36.6_react-dom@18.3.1_react@18.3.1__react@18.3.1/node_modules/@posthog/icons/dist/posthog-icons.es.js |
| 335.6 KiB | public/hedgehog/desk-hog.png |
| 323.2 KiB | public/hedgehog/3-bears-hogs.png |
| 297.4 KiB | src/taxonomy/core-filter-definitions-by-group.json |
| 286.4 KiB | src/lib/api.ts |
Largest files eagerly reachable from src/scenes/AuthenticatedShell.tsx
| Size | File |
|---|---|
| 878.2 KiB | src/styles/global.scss |
| 760.0 KiB | src/queries/validators.js |
| 609.0 KiB | public/hedgehog/burning-money-hog.png |
| 541.9 KiB | public/hedgehog/waving-hog.png |
| 448.2 KiB | public/hedgehog/stop-sign-hog.png |
| 398.7 KiB | ../node_modules/.pnpm/chart.js@4.5.1/node_modules/chart.js/dist/chart.js |
| 362.0 KiB | public/hedgehog/phone-pair-hogs.png |
| 354.8 KiB | ../node_modules/.pnpm/@posthog+icons@0.36.6_react-dom@18.3.1_react@18.3.1__react@18.3.1/node_modules/@posthog/icons/dist/posthog-icons.es.js |
| 335.6 KiB | public/hedgehog/desk-hog.png |
| 323.2 KiB | public/hedgehog/3-bears-hogs.png |
Posted automatically by check-eager-graph · sizes are input-source bytes from the esbuild metafile · part of #32479
…script Local validation path (the alpha priority): get real PostHog logs services flowing through the metrics product on a laptop. - otel-collector-config.dev.yaml: extend the dev collector's prometheus receiver to also scrape logs-ingestion (:6743) and metrics-ingestion (:6744) /_metrics, not just the main plugin server (:6738). Their prom-client counters (logs_ingestion_*, metrics_ingestion_*, lag, batch utilization) are exactly the Grafana logs-dashboard metrics, so this pipes a real service through metrics1 with zero manual setup — the collector already exports OTLP to capture-logs (phc_local -> team 1). - bin/verify-metrics-pipe: checks metrics1 for the piped service metrics and prints what arrived, with ordered troubleshooting on miss. - deployment-layout.md: local runbook updated — services pipe through automatically; validate with the script / SQL tab / Viewer. Validated: collector YAML parses; scrape block structure matches the existing working nodejs job. Live end-to-end run pending a running docker daemon + metrics intent (not available in this session). Generated-By: PostHog Code Task-Id: 97261763-b5cd-4739-98c0-ba2da8637abe
|
Both greptile findings addressed on the new head:
On ownership: the otel-collector config / hogli.yaml / dev-script changes touch the shared dev environment, so requesting @PostHog/team-devex review for those files. |
e5ca206 to
2788464
Compare
2788464 to
dc23e07
Compare
fccb927 to
7452409
Compare
dc23e07 to
ce78417
Compare
7452409 to
fc7da94
Compare
|
Docs from this PR will be published at posthog.com
Preview will be ready in ~10 minutes. Click Preview link above to access docs at |
|
🎭 Playwright report
These issues are not necessarily caused by your changes. |

Problem
Until now nothing in the local dev stack actually flowed into the metrics product —
metrics1sat empty, so every query-layer change was tested only against synthetic fixtures. We want PostHog's own logs-stack services observable in PostHog metrics locally (the same dogfooding the dev/prod vmagent bridge will do), and a one-command way to prove the pipe is alive.Changes
otel-collector-config.dev.yaml: the dev collector now scrapes our own services'/_metricsendpoints —logs-ingestion:6743,metrics-ingestion:6744 (plugin-server :6738 was already there) — and ships them through capture-logs → Kafka → ClickHouse. ~160 real prom-client metrics per service (consumer lag, bytes received, batch sizes…). Note: the scrapejob_namebecomes the row'sservice_name; static labels can't override it.bin/clickhouse-metrics.sql: adds read aliases in theposthogdatabase. The product connection resolvesmetrics/metric_attributes/metrics_kafka_metricsthere, but this script's tables land indefault(docker clickhouse-client default DB) — without the aliases the product reads empty same-named tables while data accumulates elsewhere. Local-dev only; prod has everything inposthogand is unaffected.bin/verify-metrics-pipe(+hogli verify:metrics:pipe): asserts service metrics landed inposthog.metrics(database-qualified — an earlier version passed against the wrong DB) and prints the ingestion-lag metrics.bin/send-dev-metrics.sh(+hogli send:dev:metrics): synthetic OTLP counter/gauge/histogram for testing the push path.How did you test this code?
I'm an agent. With the stack running (
hogli start, metrics intent):All three ingestion modes verified live: scrape, push direct to capture-logs :4320, push via collector OTLP :4318.
🤖 Agent context
Autonomy: Human-driven (agent-assisted) — directed by @DanielVisca
Found and fixed the default-vs-posthog database split-brain while validating:
posthog.metricsalready pointed atdefault.metrics1(so sample reads worked) butmetric_attributes/metrics_kafka_metricsdid not. The verify script was rewritten to query exactly what the product queries so it can't pass against the wrong database again.