Commit 6221eea

claude: implement webhook capture development skills
Signed-off-by: nicolaslazo <45973144+nicolaslazo@users.noreply.github.com>
1 parent 61a31a9 commit 6221eea

3 files changed: +215 -0 lines changed
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
---
name: classify-stream-types
description: Classify API endpoints into stream types (webhook, incremental, backfill, snapshot) and generate resource definitions for estuary-cdk connectors. Use after scaffolding a connector or when adding new streams.
argument-hint: "[connector-name]"
allowed-tools: Bash Read Write Edit Glob Grep WebFetch WebSearch
---

Classify each API endpoint of `source-$ARGUMENTS` into the appropriate stream type and generate resource definitions. Read the connector's `api.py` and `resources.py` first, then research the provider's API docs.

## Decision flowchart

Evaluate each endpoint in order:

1. **Does the provider push events via HTTP?** → **Webhook** stream (via `WebhookCaptureSpec`). See `/create-webhook-connector` for setup.
2. **Does the endpoint support date-range or cursor-based filtering with a cursor that persists over time?** (e.g., `updated_at`, monotonic ID, event timestamp, sequence number; `created_at` alone is not enough) → **Incremental + Backfill**. Prefer endpoints that also support sorting, but filtering without document sorting is fine.
3. **No filtering, but supports sorting in reverse chronological order?** → **Incremental only**. Use `fetch_changes` to walk from the latest document backward to the cursor (initially `start_date`). Only yield a cursor checkpoint once the walk reaches the cursor; if interrupted mid-walk, the next invocation restarts from the top. After the initial catch-up, subsequent invocations only walk back to the last checkpoint.
4. **Is the dataset small with no change tracking?** → **Snapshot**.
5. **Large dataset, no filtering or sorting?** → Look for a different endpoint or ask the user.

**Always ask the user for confirmation before committing to an incremental-only approach (no backfill).** If there's any usable cursor, prefer incremental + backfill. Only fall back to incremental-only if the user explicitly confirms backfill isn't needed.

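The ordering of the checks above matters, since the first matching question decides the stream type. A minimal sketch of that ordering in plain Python follows; `EndpointTraits`, `StreamType`, and `classify` are hypothetical names used only for illustration and are not part of estuary-cdk.

```python
from dataclasses import dataclass
from enum import Enum


class StreamType(Enum):
    WEBHOOK = "webhook"
    INCREMENTAL_AND_BACKFILL = "incremental + backfill"
    INCREMENTAL_ONLY = "incremental only"
    SNAPSHOT = "snapshot"
    NEEDS_REVIEW = "look for another endpoint or ask the user"


@dataclass
class EndpointTraits:
    """Hypothetical summary of what the provider's docs say about one endpoint."""

    pushes_events_over_http: bool = False
    filters_on_persistent_cursor: bool = False  # e.g. updated_at or a monotonic ID
    sorts_newest_first: bool = False
    small_dataset: bool = False


def classify(traits: EndpointTraits) -> StreamType:
    # Checks run in the same order as the numbered list; the first match wins.
    if traits.pushes_events_over_http:
        return StreamType.WEBHOOK
    if traits.filters_on_persistent_cursor:
        return StreamType.INCREMENTAL_AND_BACKFILL
    if traits.sorts_newest_first:
        # Still requires explicit user confirmation that no backfill is needed.
        return StreamType.INCREMENTAL_ONLY
    if traits.small_dataset:
        return StreamType.SNAPSHOT
    return StreamType.NEEDS_REVIEW
```

An endpoint that supports both cursor filtering and newest-first sorting lands on incremental + backfill, because the filtering check runs first.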
## Code patterns

### Incremental + Backfill

Reference: `source-sentry/source_sentry/resources.py` → `open_issue_binding` function.

```python
import functools
from datetime import UTC, datetime

# estuary_cdk imports (CaptureBinding, ResourceConfig, ResourceState, Task, common) omitted.


def open_binding_fn(
    binding: CaptureBinding[ResourceConfig],
    binding_index: int,
    state: ResourceState,
    task: Task,
    all_bindings,
):
    common.open_binding(
        binding,
        binding_index,
        state,
        task,
        fetch_changes=functools.partial(fetch_incremental, http, ...),
        fetch_page=functools.partial(backfill_historical, http, ...),
    )

# Initial state must set both inc and backfill:
# Convention: next_page=None signals "beginning of backfill".
# The fetch_page function handles None by falling back to start_date.
cutoff = datetime.now(tz=UTC)
initial_state = ResourceState(
    inc=ResourceState.Incremental(cursor=cutoff),
    backfill=ResourceState.Backfill(cutoff=cutoff, next_page=None),
)
```

### Incremental only

Reference: `source-front/source_front/resources.py` → `incremental_resources_with_cursor_fields` function.

```python
common.open_binding(
    binding, binding_index, state, task,
    fetch_changes=functools.partial(fetch_changes_fn, http, ...),
)

initial_state = ResourceState(
    inc=ResourceState.Incremental(cursor=config.start_date),
)
```

### Snapshot

Reference: `source-sentry/source_sentry/resources.py` → `open_full_refresh_bindings` function, and `source-front/source_front/resources.py` → `full_refresh_resources` function.

```python
common.open_binding(
    binding, binding_index, state, task,
    fetch_snapshot=functools.partial(list_all, http, ...),
    tombstone=MyDocument(_meta=MyDocument.Meta(op="d")),
)

initial_state = ResourceState()  # No cursor needed
```

### Webhook

Defer to the `/create-webhook-connector` skill for webhook stream setup.

## Backfill design guidance

- `fetch_page(log, page_cursor, cutoff)` walks historical data from the oldest document up to the cutoff
- The page cursor tracks progress across the CDK's 24-hour periodic restart
- Each invocation fetches one page or time window, then yields a `PageCursor` for the next
- **Checkpoint every N wall-clock minutes** (e.g., 5 min): track elapsed time within `fetch_page` and yield a `PageCursor` checkpoint when the time budget is reached, even mid-sequence. This ensures progress is saved if the connector restarts.
- Return without yielding a `PageCursor` to signal completion
- The `cutoff` LogCursor marks where incremental replication takes over; suppress documents at or after the cutoff

## Workflow

1. Read the connector's `api.py` and `resources.py`
2. Research the provider's API docs (via WebFetch/WebSearch) to understand each endpoint's filtering, pagination, and cursor capabilities
3. For each endpoint, apply the decision flowchart above
4. Present the classification to the user for confirmation before generating code
5. Generate or modify resource definitions in `resources.py` and fetch functions in `api.py`
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
---
name: create-webhook-connector
description: Scaffold a new webhook-based capture connector using the estuary-cdk webhook framework. Use when the user wants to create a new source connector that receives webhook events.
argument-hint: "[provider-name]"
allowed-tools: Bash Read Write Edit Glob Grep
---

Create a new webhook-based capture connector named `source-$ARGUMENTS` using the estuary-cdk webhook framework. Read `source-appsflyer/` as the canonical reference before scaffolding. See `estuary-cdk/CLAUDE.md` for the standard connector layout.

## Supported discriminators

Choose ONE based on how the provider identifies event types:

**HeaderDiscriminator**: the event type is in an HTTP header (highest routing priority).

```python
from estuary_cdk.capture.webhook.match import HeaderDiscriminator
HeaderDiscriminator(key="X-Event-Type", known_values=event_types)
```

**BodyDiscriminator**: the event type is in the JSON body (supports dot-paths like `event.type`).

```python
from estuary_cdk.capture.webhook.match import BodyDiscriminator
BodyDiscriminator(key="event.type", known_values=event_types)
```

**UrlDiscriminator**: different events hit different URL paths.

```python
from estuary_cdk.capture.webhook.match import UrlDiscriminator
UrlDiscriminator(known_values={"/installs", "/events/{type}"})
```

No discriminator (or empty `UrlDiscriminator`) creates a single catch-all collection.

**Routing priority:** Header > Body > URL (more literal segments first) > URL wildcard `*`

## Event type discovery

There are two patterns for populating `known_values`:

### Pattern 1: API-discovered (preferred when an endpoint exists)

If the provider has an API to list available event types, fetch them in `all_resources()` and pass them directly as `known_values`. See `source-appsflyer` for this pattern; it fetches from the AppsFlyer Push API and uses the response as-is.

```python
async def all_resources(log, http, config):
    http.token_source = TokenSource(oauth_spec=None, credentials=config.credentials)
    event_types = await fetch_event_types(log, http)
    return WebhookCaptureSpec(
        discriminator=HeaderDiscriminator(key="X-Event-Type", known_values=event_types),
    ).create_resources()
```

### Pattern 2: Static with user override (when no discovery endpoint exists)

Define a predefined set of known event types and expose it in `EndpointConfig.Advanced` so users can modify it:

```python
from typing import Annotated

from pydantic import BaseModel, Field

KNOWN_EVENT_TYPES = {"install", "uninstall", "in-app-event", "re-engagement"}

class EndpointConfig(BaseModel):
    class Advanced(BaseModel):
        event_types: set[str] = Field(
            title="Event types",
            description="Event types to accept as valid resources",
            default=KNOWN_EVENT_TYPES,
        )

    advanced: Annotated[
        Advanced,
        Field(
            title="Advanced Config",
            description="Advanced settings for the connector.",
            default_factory=Advanced,
            json_schema_extra={"advanced": True, "order": 1},
        ),
    ]
```

Pass whatever the user configured directly as `known_values`:

```python
async def all_resources(log, http, config):
    return WebhookCaptureSpec(
        discriminator=HeaderDiscriminator(
            key="X-Event-Type",
            known_values=config.advanced.event_types,
        ),
    ).create_resources()
```

## Constraints

- `known_values` must be non-empty for HeaderDiscriminator and BodyDiscriminator
- Requests may carry a single JSON object or an array of objects; if ANY document in an array fails to match, the entire request returns 404
- Schema inference is enabled by default, so do not define rigid output schemas

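The all-or-nothing rule for arrays can be illustrated with a small sketch. It assumes a body discriminator on a top-level key, a 200 status on success, and a 404 for an empty array; only the 404 on a partial mismatch comes from the constraints above, and `status_for` is a hypothetical function, not the CDK's request handler.

```python
from typing import Any


def status_for(payload: Any, key: str, known_values: frozenset[str]) -> int:
    # Treat a single object as a one-element batch.
    docs = payload if isinstance(payload, list) else [payload]

    def matches(doc: Any) -> bool:
        return (
            isinstance(doc, dict)
            and isinstance(doc.get(key), str)
            and doc[key] in known_values
        )

    # One unmatched document rejects the whole request.
    # The 200 value and the empty-array behaviour are assumptions of this sketch.
    return 200 if docs and all(matches(d) for d in docs) else 404
```

A practical consequence is that a provider batching mixed event types needs every one of those types listed in `known_values`, or whole batches will be rejected.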
## Key CDK files

- `estuary-cdk/estuary_cdk/capture/webhook/server.py`: WebhookCaptureSpec, WebhookResourceConfig
- `estuary-cdk/estuary_cdk/capture/webhook/match.py`: discriminators and match rules
- `estuary-cdk/estuary_cdk/capture/common.py`: WebhookDocument, Resource

## After scaffolding

Ask the user which discriminator strategy fits their webhook provider and whether event types can be discovered via API or need to be statically defined.

If the provider also has pull API endpoints (not just webhooks), use `/classify-stream-types` to determine the appropriate stream type for each endpoint.

estuary-cdk/CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -55,6 +55,8 @@ Study these for common patterns:
 | `source-salesforce-native/` | OAuth authentication, complex resources |
 | `source-google-sheets-native/` | Google OAuth, service accounts |
 | `source-airtable-native/` | Pagination, nested resources |
+| `source-appsflyer/` | Webhook + pull API (incremental) |
+| `source-sentry/` | Incremental + backfill, snapshot |

 Note: The `-native` suffix indicates a first-party CDK connector that replaces an existing third-party connector with the same base name. Connectors without the suffix may also be first-party CDK connectors if there was no naming conflict.