This file provides guidance for AI coding agents working on the Apache Flink CDC codebase.
- Java 11 (baseline; all code must compile and run on Java 11) and Java 17
- Maven 3.8.6 or higher
- Git
- Unix-like environment (Linux, macOS, WSL)
- Docker (required for running integration tests and e2e tests)
- Fast dev build (skip tests and format checks): `mvn clean install -DskipTests`
- Full build against Flink 1.x (default): `mvn clean package -DskipTests`
- Full build against Flink 2.x: `mvn clean package -DskipTests -Pflink2`
- Single module (e.g. `flink-cdc-common`): `mvn clean package -DskipTests -pl flink-cdc-common -am`
- Run all tests for a module: `mvn verify -pl flink-cdc-common`
- Run all tests against Flink 2.x: `mvn verify -pl flink-cdc-common -Pflink2`
- Run a single test class: `mvn -pl flink-cdc-common -Dtest=MyTest test`
- Run a single test method: `mvn -pl flink-cdc-common -Dtest=MyTest#myMethod test`
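The test commands above differ only in the module, the optional `-Pflink2` profile, and the optional `-Dtest` selector. The following is a minimal sketch of how those pieces compose. `MavenTestCommandSketch` is a hypothetical helper, not part of the repository, and combining `-Pflink2` with `-Dtest` is an assumption extrapolated from the listed commands.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Flink CDC): assembles the test commands listed above. */
final class MavenTestCommandSketch {

    /**
     * @param module Maven module passed to -pl
     * @param flink2 whether to activate the Flink 2.x profile
     * @param testClass test class name, or null to run the whole module with "verify"
     * @param testMethod test method name, or null to run the whole class
     */
    static List<String> testCommand(
            String module, boolean flink2, String testClass, String testMethod) {
        List<String> cmd = new ArrayList<>();
        cmd.add("mvn");
        cmd.add("-pl");
        cmd.add(module);
        if (flink2) {
            cmd.add("-Pflink2");
        }
        if (testClass == null) {
            // No selector: run every test in the module.
            cmd.add("verify");
            return cmd;
        }
        String selector = testMethod == null ? testClass : testClass + "#" + testMethod;
        cmd.add("-Dtest=" + selector);
        cmd.add("test");
        return cmd;
    }

    public static void main(String[] args) {
        // prints: mvn -pl flink-cdc-common -Dtest=MyTest#myMethod test
        System.out.println(
                String.join(" ", testCommand("flink-cdc-common", false, "MyTest", "myMethod")));
    }
}
```

The argument order differs slightly from the commands above (`verify` comes after `-pl`), which Maven treats the same way.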
- Before committing changes, run `mvn spotless:apply` and `mvn spotless:check` to enforce code style rules.
- Make sure newly added files have proper ASF license headers.
Flink CDC is organized around the Pipeline abstraction: a user-defined data pipeline that reads from one or more sources, optionally transforms records, and writes to one or more sinks. The core modules implement this abstraction; the connector modules provide the concrete source and sink implementations.
- `flink-cdc-common`: Shared API and data model used across all modules: CDC event types (`DataChangeEvent`, `SchemaChangeEvent`, etc.), the schema model, data types, source/sink interfaces, `Factory` SPI, route definitions, UDF interfaces, and utility classes. Most new abstractions start here.
- `flink-cdc-runtime`: Runtime implementation of the Pipeline: operators for reading, routing, transforming (expression evaluation via Calcite + Janino), and writing CDC events.
- `flink-cdc-composer`: Pipeline assembly and deployment layer. Translates a `PipelineDefinition` into a runnable Flink job, wiring sources, operators, and sinks together. Supports Flink-native, Kubernetes, and YARN deployment.
- `flink-cdc-cli`: Command-line entry point (`flink-cdc.sh`). Parses YAML pipeline definitions and delegates to `flink-cdc-composer`.
- `flink-cdc-dist`: Distribution packaging. Produces the `flink-cdc-<version>-bin` release archive.
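To make the Pipeline idea concrete, here is a toy model of the source, optional transform, and sink flow in plain Java. It is only a sketch: `PipelineSketch` and everything in it are hypothetical, use no Flink classes, and do not reflect the actual interfaces in `flink-cdc-common` or the operators in `flink-cdc-runtime`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Toy model only; these types are hypothetical and are not the Flink CDC API. */
final class PipelineSketch<T> {

    private final List<Supplier<List<T>>> sources;
    private final Function<T, T> transform;
    private final List<Consumer<T>> sinks;

    PipelineSketch(
            List<Supplier<List<T>>> sources, Function<T, T> transform, List<Consumer<T>> sinks) {
        this.sources = sources;
        // The transform step is optional: fall back to identity when none is given.
        if (transform == null) {
            this.transform = Function.identity();
        } else {
            this.transform = transform;
        }
        this.sinks = sinks;
    }

    /** Reads every record from every source, transforms it, and writes it to every sink. */
    void run() {
        for (Supplier<List<T>> source : sources) {
            for (T record : source.get()) {
                T out = transform.apply(record);
                for (Consumer<T> sink : sinks) {
                    sink.accept(out);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        Supplier<List<String>> orders = () -> List.of("order-1", "order-2");
        Supplier<List<String>> users = () -> List.of("user-1");
        Function<String, String> upper = s -> s.toUpperCase();
        Consumer<String> sink = written::add;

        new PipelineSketch<String>(List.of(orders, users), upper, List.of(sink)).run();
        // prints: [ORDER-1, ORDER-2, USER-1]
        System.out.println(written);
    }
}
```

In the real project the equivalent wiring is done by `flink-cdc-composer` from a YAML definition; this sketch only mirrors the shape of the data flow.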
Currently, Flink CDC supports two Flink generations simultaneously:
- `flink-cdc-flink1-compat`: Flink 1.x compatibility layer (currently 1.20.3). Default profile.
- `flink-cdc-flink2-compat`: Flink 2.x compatibility layer (currently 2.2.0). Activated via `-Pflink2`.
All modules that depend on Flink APIs must declare their Flink dependencies as provided and reference ${flink.version}, which is resolved by the active profile.
Please verify your changes in both Flink 1.20 (LTS) and Flink 2.x (latest).
Connectors are split into two categories:
- Source Connectors (`flink-cdc-connect/flink-cdc-source-connectors/`) are CDC sources for DataStream and Flink SQL jobs.
- Pipeline Connectors (`flink-cdc-connect/flink-cdc-pipeline-connectors/`) are Pipeline connectors for the YAML API.
Write unit tests and integration tests in the corresponding submodules. End-to-end tests are located in these modules:
- `flink-cdc-e2e-tests/`: End-to-end tests parent module.
  - `flink-cdc-e2e-utils`: Shared test utilities (container management, assertions)
  - `flink-cdc-source-e2e-tests`: E2E tests for source connectors
  - `flink-cdc-pipeline-e2e-tests`: E2E tests for pipeline connectors
- `docs/`: Hugo-based documentation site. `docs/content/` for English, `docs/content.zh/` for Chinese. Update both when adding new features.
- Format Java files with Spotless before every commit: `mvn spotless:apply`.
- Import order (enforced by Checkstyle): `org.apache.flink.cdc` → `org.apache.flink` → other third-party → `javax` → `java`. Static imports go last. No star imports.
- Forbidden imports (enforced by Checkstyle):
  - JUnit 4 (`org.junit.*` except `org.junit.jupiter.*`): use JUnit 5 Jupiter instead
  - `org.junit.jupiter.api.Assertions` and `org.hamcrest`: use AssertJ instead
  - `com.google.common.*`: use `flink-shaded-guava` instead
  - `com.google.common.base.Preconditions`: use Flink CDC's `Preconditions`
  - `com.google.common.annotations.VisibleForTesting`: use `@VisibleForTesting` from `org.apache.flink.cdc.common.annotation`
- API stability annotations: Every user-facing class and method must carry one of the annotations from `org.apache.flink.cdc.common.annotation`:
  - `@Public`: stable across major versions
  - `@PublicEvolving`: may change in minor versions
  - `@Experimental`: may change at any time
  - `@Internal`: no stability guarantee; users should not depend on it
- Logging: Use SLF4J with parameterized placeholders (`LOG.info("foo {}", bar)`), never string concatenation.
- Braces: Always use braces for `if`/`else`/`for`/`while`/`do` blocks.
- Comments: Write all comments in English and keep them concise. Only write comments when necessary. Avoid trivial `@param`, `@return` comments in JavaDocs.
- Apache License 2.0 header required on all new files.
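The import order rule above can be read as a ranking of import groups. The sketch below shows that ranking as a comparator. `ImportOrderSketch` is a hypothetical illustration, not the Checkstyle or Spotless implementation; sorting alphabetically within a group is an assumption, and blank lines between groups are not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of the documented import grouping; not the real Checkstyle rule. */
final class ImportOrderSketch {

    /** Returns the rank of an import line's group; lower ranks come first. */
    static int group(String importLine) {
        String line = importLine.trim();
        if (line.startsWith("import static ")) {
            return 5; // static imports go last
        }
        String name = line.startsWith("import ") ? line.substring("import ".length()) : line;
        if (name.startsWith("org.apache.flink.cdc.")) {
            return 0;
        }
        if (name.startsWith("org.apache.flink.")) {
            return 1;
        }
        if (name.startsWith("javax.")) {
            return 3;
        }
        if (name.startsWith("java.")) {
            return 4;
        }
        return 2; // any other third-party import
    }

    /** Sorts by group rank, then alphabetically inside each group (an assumption). */
    static List<String> sort(List<String> imports) {
        List<String> sorted = new ArrayList<>(imports);
        Comparator<String> byGroup = Comparator.comparingInt(ImportOrderSketch::group);
        sorted.sort(byGroup.thenComparing(Comparator.<String>naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> imports =
                List.of(
                        "import java.util.List;",
                        "import static org.assertj.core.api.Assertions.assertThat;",
                        "import org.slf4j.Logger;",
                        "import org.apache.flink.api.common.functions.MapFunction;",
                        "import javax.annotation.Nullable;",
                        "import org.apache.flink.cdc.common.annotation.Internal;");
        for (String line : sort(imports)) {
            System.out.println(line);
        }
    }
}
```

In practice `mvn spotless:apply` does the reordering; the sketch is only meant to make the group precedence easy to check by eye.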
- Use JUnit 5 (`org.junit.jupiter`) and AssertJ (`org.assertj.core.api.Assertions`) for all new tests. Do not use JUnit 4 or Hamcrest.
- Test classes are package-private (no `public` modifier on the class).
- Unit tests are named `*Test.java` (e.g., `SchemaUtilsTest`).
- Integration tests (requiring Docker / real databases) are named `*ITCase.java` (e.g., `MySqlSourceITCase`).
- Add tests for new behavior, covering success, failure, and edge cases. Tests should be self-explanatory; avoid verbose comments in test classes.
- For bug fixes, verify the new test fails without the fix before confirming it passes with it.
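The naming rules above can be expressed as a small classifier, which may be handy when deciding whether a new file needs Docker to run. `TestNamingSketch` is a hypothetical helper, not something that exists in the repository.

```java
/** Hypothetical helper (not part of Flink CDC): classifies a test file by its name suffix. */
final class TestNamingSketch {

    enum Kind {
        UNIT,
        INTEGRATION,
        OTHER
    }

    static Kind classify(String fileName) {
        if (fileName.endsWith("ITCase.java")) {
            return Kind.INTEGRATION; // requires Docker / real databases
        }
        if (fileName.endsWith("Test.java")) {
            return Kind.UNIT;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("SchemaUtilsTest.java")); // prints UNIT
        System.out.println(classify("MySqlSourceITCase.java")); // prints INTEGRATION
    }
}
```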
- `[FLINK-XXXX][component] Description` where `FLINK-XXXX` is the JIRA issue number and `component` is the affected area (e.g. `connect/mysql`, `pipeline-connector/kafka`, `docs`, `runtime`)
- `[hotfix][component] ...` or `[docs][component] ...` for trivial fixes without a JIRA issue
- `[ci] Description` for CI-only changes
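The three commit subject formats above can be checked mechanically. Below is a minimal sketch using regular expressions. `CommitSubjectCheck` is a hypothetical helper, not a project tool, and it assumes that the JIRA number is purely numeric and that the subject has a single space before the description. The issue number in the example is made up.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Flink CDC): checks a commit subject against the formats above. */
final class CommitSubjectCheck {

    // [FLINK-1234][component] Description
    private static final Pattern JIRA =
            Pattern.compile("\\[FLINK-\\d+]\\[[^\\]\\[]+] \\S.*");
    // [hotfix][component] ... or [docs][component] ...
    private static final Pattern TRIVIAL =
            Pattern.compile("\\[(hotfix|docs)]\\[[^\\]\\[]+] \\S.*");
    // [ci] Description
    private static final Pattern CI = Pattern.compile("\\[ci] \\S.*");

    static boolean isValid(String subject) {
        return JIRA.matcher(subject).matches()
                || TRIVIAL.matcher(subject).matches()
                || CI.matcher(subject).matches();
    }

    public static void main(String[] args) {
        // FLINK-12345 is an invented issue number used only for illustration.
        System.out.println(isValid("[FLINK-12345][runtime] Fix operator ordering")); // prints true
        System.out.println(isValid("Fix operator ordering")); // prints false
    }
}
```

Since PR titles mirror the commit message format, the same check could be applied to them.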
- Title format mirrors the commit message: `[FLINK-XXXX][component] Title`
- A corresponding JIRA issue is required (except for trivial changes)
- Fill out the PR template completely: describe purpose, change log, testing approach, and documentation impact
- Ensure CI passes before requesting review
- Enable GitHub Actions on your fork before opening a PR
- When AI tools were used, check the AI disclosure checkbox and uncomment the `Generated-by` line in the PR template, per ASF Generative Tooling Guidance
Ask before making any of the following changes:
- Adding or changing `@Public` or `@PublicEvolving` annotations (user-facing API commitments)
- New dependencies
- Large cross-module refactors
- Changes to serialization formats or checkpoint behavior
- Changes to hot paths that could impact performance (per-record processing, state access)
Never:
- Commit secrets, credentials, or tokens
- Push directly to `apache/flink-cdc`; always work from your fork
- Mix unrelated changes into one PR
- Use JUnit 4 or Hamcrest in new test code
- Use `org.junit.jupiter.api.Assertions`; use AssertJ instead
- Add `Co-Authored-By` with an AI agent in commit messages; use `Generated-by: <Tool Name and Version>` instead