Hogflare is a Cloudflare Workers ingestion layer for PostHog SDKs. It supports PostHog-style ingestion, stateful persons/groups, and SDK feature flags, then streams events and person snapshots into Cloudflare Pipelines so data lands in R2 as Iceberg/Parquet.

```mermaid
flowchart LR
  EventsPipeline --> EventsR2["R2 Data Catalog<br/>events table"]
  PersonsPipeline --> PersonsR2["R2 Data Catalog<br/>persons table"]
```

## Why?
## Quick start (Cloudflare)

1. Create R2 Data Catalog-backed Pipelines resources.
2. Copy `wrangler.toml.example` to `wrangler.toml` and set the stream endpoints.
3. Set Wrangler secrets.
4. Build and deploy the Worker.
5. Send a capture/identify verification flow and query the Iceberg tables.

The examples below use stable table names for a fresh deployment: `default.hogflare_events` and `default.hogflare_persons`. If you use versioned names during a migration, substitute them consistently in the sink commands and queries.

`R2_CATALOG_TOKEN` is the token used by R2 Data Catalog and R2 SQL clients such as DuckDB or PyIceberg. The bucket must have R2 Data Catalog enabled before you create `r2-data-catalog` sinks.

| Variable | Required | Description |
| --- | --- | --- |
| `CLOUDFLARE_PIPELINE_AUTH_TOKEN` | Yes, for authenticated streams | Bearer token for HTTP ingest into the events stream. |
| `CLOUDFLARE_PERSONS_PIPELINE_ENDPOINT` | No | Persons stream endpoint. Set this to write person snapshots to Iceberg. |
| `CLOUDFLARE_PERSONS_PIPELINE_AUTH_TOKEN` | No | Falls back to `CLOUDFLARE_PIPELINE_AUTH_TOKEN` when omitted. |
| `CLOUDFLARE_PIPELINE_TIMEOUT_SECS` | No | Defaults to 10 seconds. |
| `POSTHOG_API_KEY` | No | Default project token returned by `/decide` when neither the request nor a header supplies one. |
| `POSTHOG_TEAM_ID` | No | Optional team ID attached to event and person rows. |
| `POSTHOG_GROUP_TYPE_0..4` | No | Maps PostHog group types to `group0..group4`; for example, set `POSTHOG_GROUP_TYPE_0=company` to populate `group0` for company groups. |
| `POSTHOG_SESSION_RECORDING_ENDPOINT` | No | Returned in the `/decide` session recording config. |
| `POSTHOG_SIGNING_SECRET` | No | Enables HMAC request signature checks. |
| `PERSON_DEBUG_TOKEN` | No | Enables `/__debug/person/:id` for deployment verification. |
| `HOGFLARE_FEATURE_FLAGS` | No | JSON flag config used by `/decide` and `/flags`. |
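
The fallback and mapping rules in the table can be mimicked in shell, which may help when checking a `wrangler.toml` or CI environment before deploying. This is a sketch, not Hogflare's own code: the helper names `persons_auth_token`, `pipeline_timeout_secs` and `group_slot` are hypothetical, and it assumes an empty variable is treated the same as an unset one.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that mirror the documented settings; not Hogflare's code.
# Assumption: an empty variable is treated the same as an unset one.

# CLOUDFLARE_PERSONS_PIPELINE_AUTH_TOKEN falls back to CLOUDFLARE_PIPELINE_AUTH_TOKEN.
persons_auth_token() {
  printf '%s\n' "${CLOUDFLARE_PERSONS_PIPELINE_AUTH_TOKEN:-${CLOUDFLARE_PIPELINE_AUTH_TOKEN:-}}"
}

# CLOUDFLARE_PIPELINE_TIMEOUT_SECS defaults to 10 seconds.
pipeline_timeout_secs() {
  printf '%s\n' "${CLOUDFLARE_PIPELINE_TIMEOUT_SECS:-10}"
}

# Prints the groupN column a PostHog group type maps to via
# POSTHOG_GROUP_TYPE_0..4, or returns 1 if no slot is configured for it.
group_slot() {
  local group_type="$1" i var
  for i in 0 1 2 3 4; do
    var="POSTHOG_GROUP_TYPE_${i}"
    # ${!var} reads the variable whose name is stored in var.
    if [ -n "${!var:-}" ] && [ "${!var}" = "$group_type" ]; then
      printf 'group%s\n' "$i"
      return 0
    fi
  done
  return 1
}
```

For example, with `POSTHOG_GROUP_TYPE_0=company` set, `group_slot company` prints `group0`, matching the mapping described for `POSTHOG_GROUP_TYPE_0..4`.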

### Secrets

Use a Cloudflare API token that can write to Pipelines for `CLOUDFLARE_PIPELINE_AUTH_TOKEN`. The same token can usually be reused for the persons stream.

```bash
bunx wrangler secret put CLOUDFLARE_PIPELINE_AUTH_TOKEN
# Optional. If omitted, the persons pipeline uses CLOUDFLARE_PIPELINE_AUTH_TOKEN.
bunx wrangler secret put CLOUDFLARE_PERSONS_PIPELINE_AUTH_TOKEN

# Optional.
bunx wrangler secret put POSTHOG_SIGNING_SECRET
bunx wrangler secret put PERSON_DEBUG_TOKEN
bunx wrangler secret put HOGFLARE_FEATURE_FLAGS
```

### Deploy

```bash
worker-build --release
bunx wrangler deploy
```

## Verify deployment

Send a capture, an identify, and a second capture through the Worker for the same user (anonymous ID first, identified ID afterwards), then query the events and persons tables.

Expected result: all three event rows share one `person_id`, and the persons table has `capture`, `identify`, `capture` snapshots. After the identify, `distinct_ids` includes both the anonymous and the identified ID.
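
A minimal sketch of that flow with `curl`, assuming the Worker accepts PostHog-style JSON bodies on `/capture` (as PostHog's capture API does) and that the identify step can be sent as a `$identify` event carrying `$anon_distinct_id`. The function names, event names and IDs are placeholders, and values are interpolated without JSON escaping, so keep them to plain alphanumeric IDs.

```bash
#!/usr/bin/env bash
# Sketch of the capture -> identify -> capture verification flow.
# Assumptions: PostHog-style JSON on /capture; hypothetical helper and event names.

# Body for a plain capture event: $1=api key, $2=distinct id, $3=event name.
capture_body() {
  printf '{"api_key":"%s","event":"%s","distinct_id":"%s","properties":{}}' "$1" "$3" "$2"
}

# Body for a $identify event linking $3 (anonymous id) to $2 (identified id).
identify_body() {
  printf '{"api_key":"%s","event":"$identify","distinct_id":"%s","properties":{"$anon_distinct_id":"%s"}}' "$1" "$2" "$3"
}

# POST one JSON body to the Worker. WORKER_URL must be set,
# e.g. https://<your-worker>.workers.dev
send() {
  curl -fsS -X POST "${WORKER_URL:?set WORKER_URL}/capture" \
    -H 'Content-Type: application/json' \
    --data "$1"
}

# $1=api key, $2=anonymous id, $3=identified id
run_verification_flow() {
  send "$(capture_body "$1" "$2" verify_before_identify)"
  send "$(identify_body "$1" "$3" "$2")"
  send "$(capture_body "$1" "$3" verify_after_identify)"
}
```

Usage: `WORKER_URL=https://<your-worker>.workers.dev run_verification_flow "$POSTHOG_API_KEY" anon-123 user-123`, then query `default.hogflare_events` and `default.hogflare_persons` for the two IDs.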
## HMAC signing (optional)
If `POSTHOG_SIGNING_SECRET` is set, requests must include a valid signature.
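
As a sketch, assuming the signature is a hex-encoded HMAC-SHA256 of the raw request body keyed with `POSTHOG_SIGNING_SECRET`, a client could compute it with `openssl`. The helper name `sign_body` is hypothetical, and the algorithm, encoding and the header that carries the value are assumptions to confirm against the Worker's signature check before relying on this.

```bash
#!/usr/bin/env bash
# Hypothetical helper: hex-encoded HMAC-SHA256 of a request body.
# Assumption: this matches the scheme Hogflare checks; confirm the algorithm,
# encoding and header name against the Worker's code before use.
sign_body() {
  local secret="$1" body="$2"
  # openssl prints "<label>= <hex>"; keep only the hex digest (last field).
  printf '%s' "$body" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}'
}
```

Sign the exact bytes you send, e.g. `sig="$(sign_body "$POSTHOG_SIGNING_SECRET" "$body")"` followed by `curl --data "$body"`; re-serializing the JSON after signing would change the digest.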

- `person_created_at`
- `person_properties`

The Durable Object is the source of truth for the current person record. When `CLOUDFLARE_PERSONS_PIPELINE_ENDPOINT` is configured, Hogflare also writes append-only person snapshots to the persons pipeline so the state is queryable in Iceberg.
### Groups

Each row is a `PipelineEvent` with these columns:

| Column | Type |
| --- | --- |
| `group_properties` | JSON (by group type) |
| `api_key` | string |
| `extra` | JSON |

## Person shape in R2

Each row is a `PersonPipelineRecord` snapshot with these columns: