Commit 7502633

sfc-gh-pvillard and Cortex Code committed
Update wiki documentation for current repo state
- Rewrite Flow Deploy section: ephemeral CI runtimes, test YAML config, provision/teardown lifecycle, Check Run blocking
- Add Postgres CDC Demo flow page
- Update Flows index and sidebar with data-generator bucket

.... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code)

Co-Authored-By: Cortex Code <noreply@snowflake.com>
1 parent 544076a commit 7502633

4 files changed

Lines changed: 95 additions & 13 deletions

wiki/Flows--Postgres-CDC-Demo.md

Lines changed: 50 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,50 @@
1+
# Postgres CDC Demo — Data Generator
2+
3+
**Bucket:** `data-generator`
4+
**File:** [`flows/data-generator/postgres-cdc-demo.json`](../flows/data-generator/postgres-cdc-demo.json)
5+
6+
Simulates random data generation with INSERTs, UPDATEs, and DELETEs across three tables (`customers`, `orders`, `order_items`). Designed to be used in combination with the CDC PostgreSQL Connector to demonstrate change data capture pipelines.
7+
8+
---
9+
10+
## Purpose
11+
12+
This flow provides a continuous stream of realistic database mutations against a PostgreSQL instance. It automatically creates the required schema, tables, and publication on first run — no manual SQL setup is needed.
13+
14+
## Components
15+
16+
| Component | Type | Description |
17+
|-----------|------|-------------|
18+
| ExecuteScript (init) | Processor | Creates schema, tables, and publication on first run |
19+
| GenerateFlowFile | Processor | Triggers periodic data generation |
20+
| ExecuteSQL | Processor | Runs randomised INSERT/UPDATE/DELETE statements |
21+
22+
## Required NARs
23+
24+
- `org.apache.nifi:nifi-standard-nar:2.8.0` — included with standard NiFi installations
25+
- PostgreSQL JDBC driver (`postgresql-42.7.10.jar`) — uploaded as a Parameter Context Asset
26+
27+
## Parameters
28+
29+
| Parameter | Description |
30+
|-----------|-------------|
31+
| `Database Connection URL` | JDBC URL to the Postgres instance |
32+
| `Database Name` | Postgres database name |
33+
| `Database User` | Postgres username |
34+
| `Database Password` | Postgres password (sensitive — use parameter provider) |
35+
| `Schema Name` | Schema for the generated tables |
36+
| `Database Driver` | JDBC driver asset (bound to the uploaded JAR) |
37+
38+
## Configuration
39+
40+
Deploy via the Environment CD pipeline using an `environments/<env>/config.yaml` entry, or test in CI using the test YAML at `flows/data-generator/tests/test_postgres_cdc_demo.yaml`.
41+
42+
For secrets (`Database Connection URL`, `Database Password`), use the auto-provisioned Snowflake Parameter Provider with `#{PARAM_NAME}` references.
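As an illustration of the `#{PARAM_NAME}` reference form mentioned above, the following minimal Python sketch extracts parameter names from a property value. The helper name `parameter_references` and the character set accepted in names are assumptions made for this example. Parsing inside the runtime is done by NiFi itself, not by code like this.

```python
import re

# Assumption: names consist of letters, digits, spaces, dots, underscores, hyphens.
# The negative lookbehind skips the escaped form "##{...}".
PARAM_REF = re.compile(r"(?<!#)#\{([A-Za-z0-9 ._-]+)\}")


def parameter_references(value: str) -> list[str]:
    """Return the parameter names referenced in a property value, in order."""
    return PARAM_REF.findall(value)


if __name__ == "__main__":
    print(parameter_references("#{Database Connection URL}"))
```

A check like this can help spot a secret that was pasted as a literal value instead of being referenced by name.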
43+
44+
## Expected Behaviour
45+
46+
Once started, the flow continuously generates random INSERT, UPDATE, and DELETE operations against the three tables. The PostgreSQL publication enables downstream CDC connectors to capture these changes in real time.
47+
48+
## Validation Tests
49+
50+
The test file at [`flows/data-generator/tests/test_postgres_cdc_demo.py`](../flows/data-generator/tests/test_postgres_cdc_demo.py) validates that the flow deploys correctly and processes data against a live PostgreSQL instance provisioned via the ephemeral CI runtime.
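
To make the workload described on this page concrete, here is a minimal Python sketch that picks a random INSERT, UPDATE, or DELETE against one of the three tables. It is not the flow's actual logic, which lives in the flow JSON. The `id` and `updated_at` columns, the `demo` schema default, and the function name `random_mutation` are hypothetical.

```python
import random

TABLES = ("customers", "orders", "order_items")  # table names from the flow description
OPERATIONS = ("INSERT", "UPDATE", "DELETE")


def random_mutation(rng: random.Random, schema: str = "demo") -> str:
    """Return one illustrative SQL statement for a random table and operation.

    The column names used here are placeholders; the real flow defines its own.
    """
    table = f"{schema}.{rng.choice(TABLES)}"
    operation = rng.choice(OPERATIONS)
    row_id = rng.randint(1, 1000)
    if operation == "INSERT":
        return f"INSERT INTO {table} DEFAULT VALUES"
    if operation == "UPDATE":
        return f"UPDATE {table} SET updated_at = now() WHERE id = {row_id}"
    return f"DELETE FROM {table} WHERE id = {row_id}"


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs print the same statements
    for _ in range(5):
        print(random_mutation(rng))
```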

wiki/Flows.md

Lines changed: 6 additions & 0 deletions
@@ -14,6 +14,12 @@ See [How to Use This Repo](How-to-Use-This-Repo) for instructions on importing o
1414
|------|-------------|
1515
| [Hello World](Flows--Hello-World) | Minimal example demonstrating the NiFi Hub flow structure. Generates a FlowFile and logs its attributes. |
1616

17+
### Data Generator
18+
19+
| Flow | Description |
20+
|------|-------------|
21+
| [Postgres CDC Demo](Flows--Postgres-CDC-Demo) | Simulates random data generation (INSERTs, UPDATEs, DELETEs) across multiple tables for use with the CDC PostgreSQL Connector. |
22+
1723
---
1824

1925
## Flow Structure

wiki/Introduction-and-Concepts--CD.md

Lines changed: 38 additions & 13 deletions
@@ -1,6 +1,6 @@
11
# CD Pipeline
22

3-
NiFi Hub has two CD mechanisms: **Flow Deploy** for testing flows against a live Snowflake runtime during PR review, and **Environment CD** for managing Openflow infrastructure declaratively as code.
3+
NiFi Hub has two CD mechanisms: **Flow Deploy** for testing flows against ephemeral Snowflake runtimes during PR review, and **Environment CD** for managing Openflow infrastructure declaratively as code.
44

55
---
66

@@ -9,22 +9,47 @@ NiFi Hub has two CD mechanisms: **Flow Deploy** for testing flows against a live
99
**Workflow:** `flow-deploy.yml`
1010
**Trigger:** A maintainer with admin or maintain permission comments `deploy this flow` on a PR
1111

12-
This workflow lets maintainers test a flow against a real Snowflake Openflow runtime before merging. It is intended as a validation step during PR review, not for production deployment.
12+
This workflow lets maintainers test a flow against a real Snowflake Openflow runtime before merging. It provisions an **ephemeral runtime**, deploys the flow, runs tests, and tears everything down automatically.
1313

1414
### What Happens
1515

1616
1. The workflow identifies flow JSON files changed in the PR
17-
2. **Builds all extension bundles** and uploads the resulting NARs to the target runtime
18-
3. **Deploys each changed flow** as a process group on the runtime, using the runtime's REST API
19-
4. **Runs the flow's validation tests** (`flows/<bucket>/tests/test_<flow-name>.py`) against the deployed process group
20-
5. **Cleans up** the deployed process group and uploaded NARs (unless the comment includes "do not clean")
21-
6. **Posts a comment** on the PR with deployment details, processor/controller service summary, and per-test results
22-
23-
### Configuration
24-
25-
The target runtime is configured via a GitHub Environment named `snowflake-runtime-ci`, which provides:
26-
- `SNOWFLAKE_RUNTIME_URL` — the Openflow runtime endpoint
27-
- `SNOWFLAKE_RUNTIME_PAT` — a PAT with permission to deploy to the runtime
17+
2. For each changed flow that has a test YAML (`flows/<bucket>/tests/test_<flow-name>.yaml`):
18+
   - **Provisions an ephemeral runtime** named `CI_<FLOW>_<PR>_<RUN_ID>` with the configuration from the test YAML (node type, network rules, registries, etc.)
19+
   - **Uploads custom NARs** from GitHub Releases if specified in the test YAML's `nars` field
20+
   - **Deploys the flow** from the PR branch as a process group on the runtime
21+
   - **Applies parameters and assets** from the test YAML's `flow` section
22+
   - **Fetches the auto-provisioned Snowflake Parameter Provider** to inject secrets from Snowflake
23+
   - **Waits** for the flow to process data (60 seconds)
24+
   - **Runs validation tests** (`flows/<bucket>/tests/test_<flow-name>.py`) against the deployed process group
25+
   - **Tears down** the ephemeral runtime (unless the comment includes "do not clean")
26+
3. **Posts a comment** on the PR with per-test results and failure details
27+
4. **Creates a Check Run** that blocks the PR merge if tests fail
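
The naming conventions in the steps above can be sketched in Python. The hyphen-to-underscore mapping is inferred from the single example in this commit (`postgres-cdc-demo.json` paired with `test_postgres_cdc_demo.yaml`). How the workflow normalizes `<FLOW>` in the runtime name is an assumption, and both function names are hypothetical.

```python
import re
from pathlib import PurePosixPath


def derive_test_paths(flow_json: str) -> tuple[str, str]:
    """Map flows/<bucket>/<flow-name>.json to its test YAML and test module paths."""
    path = PurePosixPath(flow_json)
    stem = path.stem.replace("-", "_")  # postgres-cdc-demo -> postgres_cdc_demo
    tests_dir = path.parent / "tests"
    return str(tests_dir / f"test_{stem}.yaml"), str(tests_dir / f"test_{stem}.py")


def ephemeral_runtime_name(flow_json: str, pr_number: int, run_id: int) -> str:
    """Build a CI_<FLOW>_<PR>_<RUN_ID> name.

    Assumption: <FLOW> is the file stem, upper-cased, with runs of
    non-alphanumeric characters replaced by underscores.
    """
    flow = re.sub(r"[^A-Za-z0-9]+", "_", PurePosixPath(flow_json).stem).upper()
    return f"CI_{flow}_{pr_number}_{run_id}"


if __name__ == "__main__":
    flow = "flows/data-generator/postgres-cdc-demo.json"
    print(derive_test_paths(flow))
    print(ephemeral_runtime_name(flow, pr_number=12, run_id=345))
```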
28+
29+
### Test YAML Configuration
30+
31+
Each flow that needs CI testing has a `flows/<bucket>/tests/test_<flow-name>.yaml` file following the [CI runtime schema](../scripts/ci/ci-runtime-schema.json). Key fields:
32+
33+
| Field | Description |
34+
|-------|-------------|
35+
| `github_environment` | GitHub Environment holding Snowflake secrets |
36+
| `deployment` | Existing Openflow deployment to create the ephemeral runtime in |
37+
| `database` / `schema` | Where the runtime is scoped |
38+
| `node_type` | Runtime size (`SMALL`, `MEDIUM`, `LARGE`) |
39+
| `execute_as_role` | Snowflake role for runtime data access |
40+
| `network_rules` | Egress rules needed by the flow under test |
41+
| `flow_registries` | Git registry clients to configure |
42+
| `nars` | Custom NARs to upload (URLs or `${GITHUB_RELEASES}/` references) |
43+
| `sensitive_param_pattern` | Regex for classifying auto-provisioned provider parameters as sensitive |
44+
| `flow.parameters` | Parameter values to apply after deployment |
45+
| `flow.assets` | Files to download and bind as parameter context assets |
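
A rough Python sketch of a pre-flight check over the fields in the table above, operating on an already-parsed config dictionary. Which fields are mandatory is decided by the CI runtime schema, so the `ASSUMED_REQUIRED` tuple is a guess for illustration only. The example values are placeholders and `check_test_config` is a hypothetical helper, not part of the repository.

```python
ALLOWED_NODE_TYPES = {"SMALL", "MEDIUM", "LARGE"}  # sizes listed in the table

# Assumption: treated as required here; the real schema is authoritative.
ASSUMED_REQUIRED = (
    "github_environment",
    "deployment",
    "database",
    "schema",
    "node_type",
    "execute_as_role",
)

# Placeholder values, not taken from any real test YAML.
EXAMPLE_CONFIG = {
    "github_environment": "example-environment",
    "deployment": "EXAMPLE_DEPLOYMENT",
    "database": "EXAMPLE_DB",
    "schema": "EXAMPLE_SCHEMA",
    "node_type": "SMALL",
    "execute_as_role": "EXAMPLE_ROLE",
}


def check_test_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = [f"missing field: {key}" for key in ASSUMED_REQUIRED if key not in config]
    node_type = config.get("node_type")
    if node_type is not None and node_type not in ALLOWED_NODE_TYPES:
        problems.append(f"unsupported node_type: {node_type}")
    return problems


if __name__ == "__main__":
    print(check_test_config(EXAMPLE_CONFIG))
```

Validating against the JSON schema linked above remains the authoritative check; a sketch like this only catches the most obvious omissions early.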
46+
47+
### GitHub Environment Secrets
48+
49+
The GitHub Environment referenced by `github_environment` must provide:
50+
- `SNOWFLAKE_PAT` — Snowflake PAT for provisioning the ephemeral runtime
51+
- `NIFI_RUNTIME_PAT` — PAT for NiFi REST API operations
52+
- `NIFIHUB_REGISTRY_PAT` — GitHub PAT for the Flow Registry Client
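
A small Python sketch of a fail-fast check for the three secrets listed above, assuming they are exposed to the job as environment variables under the same names. The helper `missing_secrets` is hypothetical and not part of the workflow.

```python
import os

REQUIRED_SECRETS = ("SNOWFLAKE_PAT", "NIFI_RUNTIME_PAT", "NIFIHUB_REGISTRY_PAT")


def missing_secrets(env=None) -> list[str]:
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit(f"Missing secrets: {', '.join(missing)}")
    print("All required secrets are present.")
```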
2853

2954
---
3055

wiki/_Sidebar.md

Lines changed: 1 addition & 0 deletions
@@ -24,3 +24,4 @@
2424

2525
**[Flows](Flows)**
2626
- [Hello World](Flows--Hello-World)
27+
- [Postgres CDC Demo](Flows--Postgres-CDC-Demo)
