claude: implement webhook capture development skills

nicolaslazo · nicolaslazo · commit 6221eeae5fae · 2026-04-17T12:37:15.000-03:00
Signed-off-by: nicolaslazo <45973144+nicolaslazo@users.noreply.github.com>
diff --git a/.claude/skills/classify-stream-types/SKILL.md b/.claude/skills/classify-stream-types/SKILL.md
@@ -0,0 +1,103 @@
+---
+name: classify-stream-types
+description: Classify API endpoints into stream types (webhook, incremental, backfill, snapshot) and generate resource definitions for estuary-cdk connectors. Use after scaffolding a connector or when adding new streams.
+argument-hint: "[connector-name]"
+allowed-tools: Bash Read Write Edit Glob Grep WebFetch WebSearch
+---
+
+Classify each API endpoint of `source-$ARGUMENTS` into the appropriate stream type and generate resource definitions. Read the connector's `api.py` and `resources.py` first, then research the provider's API docs.
+
+## Decision flowchart
+
+Evaluate each endpoint in order:
+
+1. **Does the provider push events via HTTP?** → **Webhook** stream (via `WebhookCaptureSpec`). See `/create-webhook-connector` for setup.
+2. **Does the endpoint support date-range or cursor-based filtering with a cursor that persists over time?** (e.g., `updated_at`, monotonic ID, event timestamp, sequence number — not just `created_at`) → **Incremental + Backfill**. Prefer endpoints that also support sorting, but filtering without document sorting is fine.
+3. **No filtering, but supports sorting in reverse chronological order?** → **Incremental only**. Use `fetch_changes` to walk from the latest document backward to the cursor (initially `start_date`). Only yield a cursor checkpoint once the walk reaches the cursor — if interrupted mid-walk, the next invocation restarts from the top. After the initial catch-up, subsequent invocations only walk back to the last checkpoint.
+4. **Is the dataset small with no change tracking?** → **Snapshot**.
+5. **Large dataset, no filtering or sorting?** → Look for a different endpoint or ask the user.
+
+**Always ask the user for confirmation before committing to an incremental-only approach (no backfill).** If there's any usable cursor, prefer incremental + backfill. Only fall back to incremental-only if the user explicitly confirms backfill isn't needed.
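+
+The flowchart is an ordered chain of checks, and the first match wins. Here is a minimal sketch that assumes hypothetical `EndpointCapabilities` flags, filled in from the provider's docs. `StreamType` and `classify` are illustrative names, not estuary-cdk APIs.
+
+```python
+from dataclasses import dataclass
+from enum import Enum
+
+
+class StreamType(Enum):
+    WEBHOOK = "webhook"
+    INCREMENTAL_BACKFILL = "incremental+backfill"
+    INCREMENTAL_ONLY = "incremental-only"
+    SNAPSHOT = "snapshot"
+    UNRESOLVED = "unresolved"  # step 5: find another endpoint or ask the user
+
+
+@dataclass(frozen=True)
+class EndpointCapabilities:
+    # Hypothetical flags gathered while reading the provider's API docs.
+    pushes_events: bool = False
+    filters_on_persistent_cursor: bool = False  # e.g. updated_at, monotonic ID
+    sorts_reverse_chronological: bool = False
+    small_dataset: bool = False
+
+
+def classify(caps: EndpointCapabilities) -> StreamType:
+    # Checks run in flowchart order; the first one that holds decides.
+    if caps.pushes_events:
+        return StreamType.WEBHOOK
+    if caps.filters_on_persistent_cursor:
+        return StreamType.INCREMENTAL_BACKFILL
+    if caps.sorts_reverse_chronological:
+        # Still requires user confirmation before committing to this.
+        return StreamType.INCREMENTAL_ONLY
+    if caps.small_dataset:
+        return StreamType.SNAPSHOT
+    return StreamType.UNRESOLVED
+```
+
+Because the checks are ordered, an endpoint that supports both cursor filtering and reverse sorting is classified as incremental + backfill, never incremental-only.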
+
+## Code patterns
+
+### Incremental + Backfill
+
+Reference: `source-sentry/source_sentry/resources.py` — `open_issue_binding` function.
+
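+```python
+import functools
+from datetime import UTC, datetime
+
+# `http` comes from the enclosing scope that builds the resources.
+def open_binding_fn(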
+    binding: CaptureBinding[ResourceConfig],
+    binding_index: int,
+    state: ResourceState,
+    task: Task,
+    all_bindings,
+):
+    common.open_binding(
+        binding,
+        binding_index,
+        state,
+        task,
+        fetch_changes=functools.partial(fetch_incremental, http, ...),
+        fetch_page=functools.partial(backfill_historical, http, ...),
+    )
+
+# Initial state must set both inc and backfill:
+# Convention: next_page=None signals "beginning of backfill".
+# The fetch_page function handles None by falling back to start_date.
+cutoff = datetime.now(tz=UTC)
+initial_state = ResourceState(
+    inc=ResourceState.Incremental(cursor=cutoff),
+    backfill=ResourceState.Backfill(cutoff=cutoff, next_page=None),
+)
+```
+
+### Incremental only
+
+Reference: `source-front/source_front/resources.py` — `incremental_resources_with_cursor_fields` function.
+
+```python
+common.open_binding(
+    binding, binding_index, state, task,
+    fetch_changes=functools.partial(fetch_changes_fn, http, ...),
+)
+
+initial_state = ResourceState(
+    inc=ResourceState.Incremental(cursor=config.start_date),
+)
+```
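+
+The incremental-only walk from step 3 of the flowchart can be sketched as a plain generator with no CDK types. This is a sketch under assumptions. `FetchPage`, `walk_back_to_cursor`, and the `updated_at` field are hypothetical. A real `fetch_changes` would typically be an async generator that yields CDK documents and a `LogCursor`.
+
+```python
+from datetime import datetime
+from typing import Callable, Iterator
+
+# Hypothetical page fetcher: given a page token (None = newest page), returns
+# documents sorted newest-first plus the token of the next (older) page.
+FetchPage = Callable[[str | None], tuple[list[dict], str | None]]
+
+
+def walk_back_to_cursor(fetch: FetchPage, cursor: datetime) -> Iterator[dict | datetime]:
+    """Yield documents newer than `cursor` (newest first), then the new cursor.
+
+    The new cursor is yielded only after the walk reaches `cursor` or runs out
+    of pages, so an interrupted walk checkpoints nothing and the next
+    invocation restarts from the newest page.
+    """
+    newest = cursor
+    token: str | None = None
+    reached = False
+    while not reached:
+        docs, token = fetch(token)
+        for doc in docs:
+            if doc["updated_at"] <= cursor:
+                reached = True  # everything from here on was already captured
+                break
+            newest = max(newest, doc["updated_at"])
+            yield doc
+        if token is None:
+            reached = True  # no older pages left
+    if newest > cursor:
+        yield newest  # checkpoint: the next walk stops here
+```
+
+If nothing is newer than the cursor, the walk yields nothing and the existing checkpoint stays in place.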
+
+### Snapshot
+
+Reference: `source-sentry/source_sentry/resources.py` — `open_full_refresh_bindings` function, and `source-front/source_front/resources.py` — `full_refresh_resources` function.
+
+```python
+common.open_binding(
+    binding, binding_index, state, task,
+    fetch_snapshot=functools.partial(list_all, http, ...),
+    tombstone=MyDocument(_meta=MyDocument.Meta(op="d")),
+)
+
+initial_state = ResourceState()  # No cursor needed
+```
+
+### Webhook
+
+Defer to `/create-webhook-connector` skill for webhook stream setup.
+
+## Backfill design guidance
+
+- `fetch_page(log, page_cursor, cutoff)` walks historical data from oldest to cutoff
+- Page cursor tracks progress across the CDK's 24-hour periodic restart
+- Each invocation fetches one page or time window, then yields a `PageCursor` for the next
+- **Checkpoint every N wall-clock minutes** (e.g., 5 min): track elapsed time within `fetch_page` and yield a `PageCursor` checkpoint when the time budget is reached, even mid-sequence. This ensures progress is saved if the connector restarts.
+- Return without yielding a `PageCursor` to signal completion
+- The `cutoff` LogCursor marks where incremental replication takes over — suppress documents at or after the cutoff
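+
+The guidance above can be combined into one time-budgeted walk. Here is a minimal synchronous sketch that assumes a hypothetical `fetch_window(start, end)` call and a `ts` timestamp field. It uses a plain `datetime` in place of the CDK's `PageCursor`. `backfill_windows` and its parameters are illustrative. The `clock` argument is injectable only so that the budget can be tested.
+
+```python
+import time
+from datetime import datetime, timedelta
+from typing import Callable, Iterator
+
+# Hypothetical fetcher: returns documents whose "ts" falls in [start, end).
+FetchWindow = Callable[[datetime, datetime], list[dict]]
+
+
+def backfill_windows(
+    fetch_window: FetchWindow,
+    page_cursor: datetime | None,
+    cutoff: datetime,
+    start_date: datetime,
+    window: timedelta = timedelta(days=1),
+    budget_seconds: float = 300.0,
+    clock: Callable[[], float] = time.monotonic,
+) -> Iterator[dict | datetime]:
+    """Walk [page_cursor, cutoff) oldest-first in fixed windows.
+
+    Yields documents; once `budget_seconds` of elapsed time is spent, yields
+    the next window start as a checkpoint and returns. Returning without a
+    checkpoint means the backfill is complete.
+    """
+    # page_cursor=None means "beginning of backfill": fall back to start_date.
+    start = page_cursor if page_cursor is not None else start_date
+    began = clock()
+    while start < cutoff:
+        end = min(start + window, cutoff)
+        for doc in fetch_window(start, end):
+            # Defensive guard: incremental replication owns ts >= cutoff.
+            if doc["ts"] < cutoff:
+                yield doc
+        start = end
+        if start < cutoff and clock() - began >= budget_seconds:
+            yield start  # checkpoint; the next invocation resumes here
+            return
+```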
+
+## Workflow
+
+1. Read the connector's `api.py` and `resources.py`
+2. Research the provider's API docs (via WebFetch/WebSearch) to understand each endpoint's filtering, pagination, and cursor capabilities
+3. For each endpoint, apply the decision flowchart above
+4. Present the classification to the user for confirmation before generating code
+5. Generate or modify resource definitions in `resources.py` and fetch functions in `api.py`
diff --git a/.claude/skills/create-webhook-connector/SKILL.md b/.claude/skills/create-webhook-connector/SKILL.md
@@ -0,0 +1,110 @@
+---
+name: create-webhook-connector
+description: Scaffold a new webhook-based capture connector using the estuary-cdk webhook framework. Use when the user wants to create a new source connector that receives webhook events.
+argument-hint: "[provider-name]"
+allowed-tools: Bash Read Write Edit Glob Grep
+---
+
+Create a new webhook-based capture connector named `source-$ARGUMENTS` using the estuary-cdk webhook framework. Read `source-appsflyer/` as the canonical reference before scaffolding. See `estuary-cdk/CLAUDE.md` for the standard connector layout.
+
+## Supported discriminators
+
+Choose ONE based on how the provider identifies event types:
+
+**HeaderDiscriminator** — event type is in an HTTP header (highest routing priority):
+
+```python
+from estuary_cdk.capture.webhook.match import HeaderDiscriminator
+HeaderDiscriminator(key="X-Event-Type", known_values=event_types)
+```
+
+**BodyDiscriminator** — event type is in the JSON body (supports dot-paths like `event.type`):
+
+```python
+from estuary_cdk.capture.webhook.match import BodyDiscriminator
+BodyDiscriminator(key="event.type", known_values=event_types)
+```
+
+**UrlDiscriminator** — different events hit different URL paths:
+
+```python
+from estuary_cdk.capture.webhook.match import UrlDiscriminator
+UrlDiscriminator(known_values={"/installs", "/events/{type}"})
+```
+
+No discriminator (or empty `UrlDiscriminator`) creates a single catch-all collection.
+
+**Routing priority:** Header > Body > URL (more literal segments first) > URL wildcard `*`
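+
+The URL tier of that ordering can be modeled as follows, to reason about which pattern claims a request. This is an illustrative model only, not the CDK's implementation in `match.py`. The names `url_priority`, `matches`, and `route_url` are hypothetical. So is the handling of `{name}` placeholders and of a bare `"*"` pattern.
+
+```python
+def _is_placeholder(segment: str) -> bool:
+    return segment.startswith("{") and segment.endswith("}")
+
+
+def url_priority(pattern: str) -> tuple[int, int]:
+    """Sort key: the bare wildcard "*" last, otherwise more literal segments first."""
+    if pattern.strip("/") == "*":
+        return (1, 0)
+    segments = pattern.strip("/").split("/")
+    literals = sum(1 for s in segments if not _is_placeholder(s))
+    return (0, -literals)
+
+
+def matches(pattern: str, path: str) -> bool:
+    """True if `path` fits `pattern`; "{name}" segments match any one segment."""
+    if pattern.strip("/") == "*":
+        return True
+    p_segs = pattern.strip("/").split("/")
+    segs = path.strip("/").split("/")
+    return len(p_segs) == len(segs) and all(
+        _is_placeholder(p) or p == s for p, s in zip(p_segs, segs)
+    )
+
+
+def route_url(patterns: set[str], path: str) -> str | None:
+    """Return the highest-priority pattern that matches `path`, if any."""
+    for pattern in sorted(patterns, key=url_priority):
+        if matches(pattern, path):
+            return pattern
+    return None
+```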
+
+## Event type discovery
+
+There are two patterns for populating `known_values`:
+
+### Pattern 1: API-discovered (preferred when an endpoint exists)
+
+If the provider has an API to list available event types, fetch them in `all_resources()` and pass directly as `known_values`. See `source-appsflyer` for this pattern — it fetches from the AppsFlyer Push API and uses the response as-is.
+
+```python
+async def all_resources(log, http, config):
+    http.token_source = TokenSource(oauth_spec=None, credentials=config.credentials)
+    event_types = await fetch_event_types(log, http)
+    return WebhookCaptureSpec(
+        discriminator=HeaderDiscriminator(key="X-Event-Type", known_values=event_types),
+    ).create_resources()
+```
+
+### Pattern 2: Static with user override (when no discovery endpoint exists)
+
+Define a static set of known event types and expose it in `EndpointConfig.Advanced` so users can modify it:
+
+```python
+from typing import Annotated
+
+from pydantic import BaseModel, Field
+
+KNOWN_EVENT_TYPES = {"install", "uninstall", "in-app-event", "re-engagement"}
+
+class EndpointConfig(BaseModel):
+    class Advanced(BaseModel):
+        event_types: set[str] = Field(
+            title="Event types",
+            description="Event types to accept as valid resources",
+            default=KNOWN_EVENT_TYPES,
+        )
+
+    advanced: Annotated[
+        Advanced,
+        Field(
+            title="Advanced Config",
+            description="Advanced settings for the connector.",
+            default_factory=Advanced,
+            json_schema_extra={"advanced": True, "order": 1},
+        ),
+    ]
+```
+
+Pass whatever the user configured directly as `known_values`:
+
+```python
+async def all_resources(log, http, config):
+    return WebhookCaptureSpec(
+        discriminator=HeaderDiscriminator(
+            key="X-Event-Type",
+            known_values=config.advanced.event_types,
+        ),
+    ).create_resources()
+```
+
+## Constraints
+
+- `known_values` must be non-empty for HeaderDiscriminator and BodyDiscriminator
+- The webhook endpoint accepts both single JSON objects and arrays of objects. If ANY document in an array fails to match, the entire request returns 404
+- Schema inference is enabled by default — do not define rigid output schemas
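+
+The all-or-nothing array rule can be sketched as follows, to make its consequence concrete: one bad document means nothing from that request is captured. This assumes a hypothetical per-document `classify` callback. `route_request` is illustrative, and the 200 success status is an assumption. Only the 404 comes from the constraint above.
+
+```python
+from typing import Any, Callable
+
+# Hypothetical classifier: returns the resource name for one document, or None.
+Classifier = Callable[[dict[str, Any]], str | None]
+
+
+def route_request(payload: Any, classify: Classifier) -> tuple[int, list[tuple[str, dict]]]:
+    """Route a webhook payload with all-or-nothing semantics for arrays."""
+    docs = payload if isinstance(payload, list) else [payload]
+    routed: list[tuple[str, dict]] = []
+    for doc in docs:
+        resource = classify(doc) if isinstance(doc, dict) else None
+        if resource is None:
+            # A single unmatched document rejects the whole request.
+            return 404, []
+        routed.append((resource, doc))
+    return 200, routed  # success status code is an assumption
+```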
+
+## Key CDK files
+
+- `estuary-cdk/estuary_cdk/capture/webhook/server.py` — WebhookCaptureSpec, WebhookResourceConfig
+- `estuary-cdk/estuary_cdk/capture/webhook/match.py` — discriminators and match rules
+- `estuary-cdk/estuary_cdk/capture/common.py` — WebhookDocument, Resource
+
+## After scaffolding
+
+Ask the user which discriminator strategy fits their webhook provider and whether event types can be discovered via API or need to be statically defined.
+
+If the provider also has pull API endpoints (not just webhooks), use `/classify-stream-types` to determine the appropriate stream type for each endpoint.
diff --git a/estuary-cdk/CLAUDE.md b/estuary-cdk/CLAUDE.md
@@ -55,6 +55,8 @@ Study these for common patterns:
 | `source-salesforce-native/`    | OAuth authentication, complex resources  |
 | `source-google-sheets-native/` | Google OAuth, service accounts           |
 | `source-airtable-native/`      | Pagination, nested resources             |
+| `source-appsflyer/`            | Webhook + pull API (incremental)         |
+| `source-sentry/`               | Incremental + backfill, snapshot         |
 
 Note: The `-native` suffix indicates a first-party CDK connector that replaces an existing third-party connector with the same base name. Connectors without the suffix may also be first-party CDK connectors if there was no naming conflict.