diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 00000000000..72ff9e241fd --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,67 @@ + + +## What is the purpose of this pull request? + +Briefly describe the problem this PR fixes or the feature it introduces. Reference Flink JIRA ticket when possible. + +## Brief change log + +List the key changes in this PR. + +--- + +## Verifying this change + + + +This change is a trivial rework / code cleanup without any test coverage. + +*(or)* + +This change added tests and can be verified as follows: + +- *Added/Updated unit tests in `...*`* +- *Added/Updated integration tests in `...*`* +- *Manually tested by ...* + +## Documentation + +- Does this pull request introduce a new feature? (yes / no) +- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) + +--- + +##### Was generative AI tooling used to co-author this PR? + + + +- [ ] Yes (please specify the tool below) + + diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000000..b50c262030f --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,158 @@ + + +# Flink CDC AI Agent Instructions + +This file provides guidance for AI coding agents working with the Apache Flink CDC. + +## Prerequisites + +- Java 11 (baseline; all code must compile and run on Java 11) and Java 17 +- Maven 3.8.6 or higher +- Git +- Unix-like environment (Linux, macOS, WSL) +- Docker (required for running integration tests and e2e tests) + +## Commands + +### Build + +- Fast dev build (skip tests and format checks): `mvn clean install -DskipTests` +- Full build against Flink 1.x (default): `mvn clean package -DskipTests` +- Full build against Flink 2.x: `mvn clean package -DskipTests -Pflink2` +- Single module (e.g. `flink-cdc-common`): `mvn clean package -DskipTests -pl flink-cdc-common -am` + +### Testing + +- Run all tests for a module: `mvn verify -pl flink-cdc-common` +- Run all tests against Flink 2.x: `mvn verify -pl flink-cdc-common -Pflink2` +- Run a single test class: `mvn -pl flink-cdc-common -Dtest=MyTest test` +- Run a single test method: `mvn -pl flink-cdc-common -Dtest=MyTest#myMethod test` + +### Code Quality + +- Before committing changes, run `mvn spotless:apply` and `mvn spotless:check` to enforce code style rules. +- Make sure newly added files have proper ASF license headers. + +## Repository Structure + +Flink CDC is organized around the Pipeline abstraction: a user-defined data pipeline that reads from one or more sources, optionally transforms records, and writes to one or more sinks. The core modules implement this abstraction; the connector modules provide the concrete source and sink implementations. + +### Core Modules + +- `flink-cdc-common` — Shared API and data model used across all modules: CDC event types (`DataChangeEvent`, `SchemaChangeEvent`, etc.), the schema model, data types, source/sink interfaces, `Factory` SPI, route definitions, UDF interfaces, and utility classes. Most new abstractions start here. +- `flink-cdc-runtime` — Runtime implementation of the Pipeline: operators for reading, routing, transforming (expression evaluation via Calcite + Janino), and writing CDC events. +- `flink-cdc-composer` — Pipeline assembly and deployment layer. Translates a `PipelineDefinition` into a runnable Flink job, wiring sources, operators, and sinks together. Supports Flink-native, Kubernetes, and YARN deployment. +- `flink-cdc-cli` — Command-line entry point (`flink-cdc.sh`). Parses YAML pipeline definitions and delegates to `flink-cdc-composer`. +- `flink-cdc-dist` — Distribution packaging. Produces the `flink-cdc--bin` release archive. + +### Flink Version Compatibility + +Currently, Flink CDC supports two Flink generations simultaneously: + +- `flink-cdc-flink1-compat` — Flink 1.x compatibility layer (currently 1.20.3). Default profile. +- `flink-cdc-flink2-compat` — Flink 2.x compatibility layer (currently 2.2.0). Activated via `-Pflink2`. + +All modules that depend on Flink APIs must declare their Flink dependencies as `provided` and reference `${flink.version}`, which is resolved by the active profile. +Please verify your changes in both Flink 1.20 (LTS) and Flink 2.x (latest). + +### Connectors (`flink-cdc-connect/`) + +Connectors are split into two categories: + +- Source Connectors (`flink-cdc-connect/flink-cdc-source-connectors/`) are CDC sources for DataStream and Flink SQL jobs. +- Pipeline Connectors (`flink-cdc-connect/flink-cdc-pipeline-connectors/`) are Pipeline connectors for the YAML API. + +### Tests + +Write unit tests and integration tests in the corresponding submodules. End-to-end tests are located in these modules: + +- `flink-cdc-e2e-tests/` — End-to-end tests parent module. + - `flink-cdc-e2e-utils` — Shared test utilities (container management, assertions) + - `flink-cdc-source-e2e-tests` — E2E tests for source connectors + - `flink-cdc-pipeline-e2e-tests` — E2E tests for pipeline connectors + +### Documentation + +- `docs/` — Hugo-based documentation site. `docs/content/` for English, `docs/content.zh/` for Chinese. Update both when adding new features. + +## Coding Standards + +- **Format Java files with Spotless before every commit:** `mvn spotless:apply`. +- **Import order** (enforced by Checkstyle): `org.apache.flink.cdc` → `org.apache.flink` → other third-party → `javax` → `java`. Static imports go last. No star imports. +- **Forbidden imports** (enforced by Checkstyle): + - JUnit 4 (`org.junit.*` except `org.junit.jupiter.*`) — use JUnit 5 Jupiter instead + - `org.junit.jupiter.api.Assertions` and `org.hamcrest` — use AssertJ instead + - `com.google.common.*` — use `flink-shaded-guava` instead + - `com.google.common.base.Preconditions` — use Flink CDC's `Preconditions` + - `com.google.common.annotations.VisibleForTesting` — use `@VisibleForTesting` from `org.apache.flink.cdc.common.annotation` +- **API stability annotations:** Every user-facing class and method must carry one of the annotations from `org.apache.flink.cdc.common.annotation`: + - `@Public` — stable across major versions + - `@PublicEvolving` — may change in minor versions + - `@Experimental` — may change at any time + - `@Internal` — no stability guarantee; users should not depend on it +- **Logging:** Use SLF4J with parameterized placeholders (`LOG.info("foo {}", bar)`), never string concatenation. +- **Braces:** Always use braces for `if`/`else`/`for`/`while`/`do` blocks. +- **Comments:** All comments should be written in English, and keep it concise. Only write comments when necessary. Avoid trivial `@param`, `@return` comments in JavaDocs. +- Apache License 2.0 header required on all new files. + +## Testing Standards + +- Use **JUnit 5** (`org.junit.jupiter`) and **AssertJ** (`org.assertj.core.api.Assertions`) for all new tests. Do not use JUnit 4 or Hamcrest. +- Test classes are **package-private** (no `public` modifier on the class). +- **Unit tests** are named `*Test.java` (e.g., `SchemaUtilsTest`). +- **Integration tests** (requiring Docker / real databases) are named `*ITCase.java` (e.g., `MySqlSourceITCase`). +- Add tests for new behavior, covering success, failure, and edge cases. Tests should be self-explanatory; avoid verbose comments in test classes. +- For bug fixes, verify the new test fails without the fix before confirming it passes with it. + +## Commits and PRs + +### Commit message format + +- `[FLINK-XXXX][component] Description` where `FLINK-XXXX` is the JIRA issue number and `component` is the affected area (e.g. `connect/mysql`, `pipeline-connector/kafka`, `docs`, `runtime`) +- `[hotfix][component] ...` or `[docs][component] ...` for trivial fixes without a JIRA issue +- `[ci] Description` for CI-only changes + +### Pull request conventions + +- Title format mirrors the commit message: `[FLINK-XXXX][component] Title` +- A corresponding JIRA issue is required (except trivial changes) +- Fill out the PR template completely: describe purpose, change log, testing approach, and documentation impact +- Ensure CI passes before requesting review +- Enable GitHub Actions on your fork before opening a PR +- When AI tools were used, check the AI disclosure checkbox and uncomment the `Generated-by` line in the PR template, per [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) + +## Boundaries + +### Ask first + +- Adding or changing `@Public` or `@PublicEvolving` annotations (user-facing API commitments) +- New dependencies +- Large cross-module refactors +- Changes to serialization formats or checkpoint behavior +- Changes to hot paths that could impact performance (per-record processing, state access) + +### Never + +- Commit secrets, credentials, or tokens +- Push directly to `apache/flink-cdc`; always work from your fork +- Mix unrelated changes into one PR +- Use JUnit 4 or Hamcrest in new test code +- Use `org.junit.jupiter.api.Assertions`; use AssertJ instead +- Add `Co-Authored-By` with an AI agent in commit messages; use `Generated-by: ` instead \ No newline at end of file