
Commit 6a2363f

dataroaring and claude committed
Add built-in Streaming Job option for MySQL and PostgreSQL migration
Replace the file-based Streaming Job description with the actual built-in CDC sync feature (FROM MYSQL/POSTGRES syntax). This uses Flink CDC under the hood to read binlog/WAL directly, with auto table creation and full + incremental sync in a single SQL command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 41cb96e · commit 6a2363f

4 files changed (+252, -36 lines)

docs/migration/mysql-to-doris.md

Lines changed: 64 additions & 9 deletions
@@ -70,19 +70,74 @@ For detailed setup, see the [Flink Doris Connector](../ecosystem/flink-doris-connector.md) documentation.
 
 The [JDBC Catalog](../lakehouse/catalogs/jdbc-catalog.md) allows direct querying and batch migration from MySQL. This is the simplest approach for one-time or periodic batch migrations.
 
-### Option 3: Streaming Job (Continuous File Loading)
+### Option 3: Streaming Job (Built-in CDC Sync)
 
-Doris's built-in [Streaming Job](../data-operate/import/streaming-job.md) (`CREATE JOB ON STREAMING`) provides continuous file-based loading without external tools. Export MySQL data to S3/object storage, and the Streaming Job automatically picks up new files and loads them into Doris.
+Doris's built-in [Streaming Job](../data-operate/import/streaming-job/streaming-job-multi-table.md) can directly synchronize full and incremental data from MySQL to Doris without external tools such as Flink. It uses CDC under the hood to read the MySQL binlog and automatically creates target tables (UNIQUE KEY model) with primary keys matching the source.
 
 This option is suited for:
 
-- Continuous incremental migration via file export pipelines
-- Environments where you prefer Doris-native features over external tools like Flink
-- Scenarios where MySQL data is periodically exported to object storage
-
-**Prerequisites**: Data exported to S3-compatible object storage; Doris 2.1+ with Job Scheduler enabled.
-
-For detailed setup, see the [Streaming Job](../data-operate/import/streaming-job.md) and [CREATE STREAMING JOB](../sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md) documentation.
+- Real-time multi-table sync without deploying a Flink cluster
+- Environments where you prefer Doris-native features over external tools
+- Full + incremental migration with a single SQL command
+
+**Prerequisites**: MySQL with binlog enabled (`binlog_format = ROW`); MySQL JDBC driver deployed to Doris.
+
+#### Step 1: Enable MySQL Binlog
+
+Ensure `my.cnf` contains:
+
+```ini
+[mysqld]
+log-bin = mysql-bin
+binlog_format = ROW
+server-id = 1
+```
+
+#### Step 2: Create Streaming Job
+
+```sql
+CREATE JOB mysql_sync
+ON STREAMING
+FROM MYSQL (
+    "jdbc_url" = "jdbc:mysql://mysql-host:3306",
+    "driver_url" = "mysql-connector-j-8.0.31.jar",
+    "driver_class" = "com.mysql.cj.jdbc.Driver",
+    "user" = "root",
+    "password" = "password",
+    "database" = "source_db",
+    "include_tables" = "orders,customers,products",
+    "offset" = "initial"
+)
+TO DATABASE target_db (
+    "table.create.properties.replication_num" = "3"
+)
+```
+
+Key parameters:
+
+| Parameter | Description |
+|-----------|-------------|
+| `include_tables` | Comma-separated list of tables to sync |
+| `offset` | `initial` for full + incremental; `latest` for incremental only |
+| `snapshot_split_size` | Row count per split during full sync (default: 8096) |
+| `snapshot_parallelism` | Parallelism during full sync phase (default: 1) |
+
+#### Step 3: Monitor Sync Status
+
+```sql
+-- Check job status
+SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
+
+-- Check task history
+SELECT * FROM tasks("type"="insert") WHERE jobName = 'mysql_sync';
+
+-- Pause / Resume / Drop
+PAUSE JOB WHERE jobname = 'mysql_sync';
+RESUME JOB WHERE jobname = 'mysql_sync';
+DROP JOB WHERE jobname = 'mysql_sync';
+```
+
+For detailed reference, see the [Streaming Job Multi-Table Sync](../data-operate/import/streaming-job/streaming-job-multi-table.md) documentation.
 
 ### Option 4: DataX
 
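The example job above leaves `snapshot_split_size` and `snapshot_parallelism` at their defaults. A minimal sketch of a tuned variant follows: the `SHOW VARIABLES` checks are standard MySQL, while the inline placement of the two `snapshot_*` properties in the `FROM MYSQL` list is an assumption based on the parameter table, and the job name and tuning values are hypothetical.

```sql
-- On the MySQL source: verify the Step 1 prerequisites took effect
SHOW VARIABLES LIKE 'binlog_format';   -- expect: ROW
SHOW VARIABLES LIKE 'log_bin';         -- expect: ON

-- On Doris: same job as above, with the full-sync phase tuned
CREATE JOB mysql_sync_tuned
ON STREAMING
FROM MYSQL (
    "jdbc_url" = "jdbc:mysql://mysql-host:3306",
    "driver_url" = "mysql-connector-j-8.0.31.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "user" = "root",
    "password" = "password",
    "database" = "source_db",
    "include_tables" = "orders,customers,products",
    "offset" = "initial",
    -- hypothetical tuning; placement assumed from the parameter table
    "snapshot_split_size" = "16384",
    "snapshot_parallelism" = "4"
)
TO DATABASE target_db (
    "table.create.properties.replication_num" = "3"
)
```

Larger splits mean fewer, bigger chunks during the snapshot phase; raising parallelism speeds up the full sync at the cost of extra read load on the source.
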
docs/migration/postgresql-to-doris.md

Lines changed: 62 additions & 9 deletions
@@ -68,19 +68,72 @@ Flink CDC captures changes from PostgreSQL WAL (Write-Ahead Log) and streams them to Doris in real time.
 
 For detailed setup, see the [Flink Doris Connector](../ecosystem/flink-doris-connector.md) documentation.
 
-### Option 3: Streaming Job (Continuous File Loading)
+### Option 3: Streaming Job (Built-in CDC Sync)
 
-Doris's built-in [Streaming Job](../data-operate/import/streaming-job.md) (`CREATE JOB ON STREAMING`) provides continuous file-based loading without external tools. Export PostgreSQL data to S3/object storage, and the Streaming Job automatically picks up new files and loads them into Doris.
+Doris's built-in [Streaming Job](../data-operate/import/streaming-job/streaming-job-multi-table.md) can directly synchronize full and incremental data from PostgreSQL to Doris without external tools such as Flink. It uses CDC under the hood to read the PostgreSQL WAL and automatically creates target tables (UNIQUE KEY model) with primary keys matching the source.
 
 This option is suited for:
 
-- Continuous incremental migration via file export pipelines
-- Environments where you prefer Doris-native features over external tools like Flink
-- Scenarios where PostgreSQL data is periodically exported to object storage
-
-**Prerequisites**: Data exported to S3-compatible object storage; Doris 2.1+ with Job Scheduler enabled.
-
-For detailed setup, see the [Streaming Job](../data-operate/import/streaming-job.md) and [CREATE STREAMING JOB](../sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md) documentation.
+- Real-time multi-table sync without deploying a Flink cluster
+- Environments where you prefer Doris-native features over external tools
+- Full + incremental migration with a single SQL command
+
+**Prerequisites**: PostgreSQL with logical replication enabled (`wal_level = logical`); PostgreSQL JDBC driver deployed to Doris.
+
+#### Step 1: Enable Logical Replication
+
+Ensure `postgresql.conf` contains:
+
+```ini
+wal_level = logical
+```
+
+#### Step 2: Create Streaming Job
+
+```sql
+CREATE JOB pg_sync
+ON STREAMING
+FROM POSTGRES (
+    "jdbc_url" = "jdbc:postgresql://pg-host:5432/source_db",
+    "driver_url" = "postgresql-42.5.6.jar",
+    "driver_class" = "org.postgresql.Driver",
+    "user" = "postgres",
+    "password" = "password",
+    "database" = "source_db",
+    "schema" = "public",
+    "include_tables" = "orders,customers,products",
+    "offset" = "initial"
+)
+TO DATABASE target_db (
+    "table.create.properties.replication_num" = "3"
+)
+```
+
+Key parameters:
+
+| Parameter | Description |
+|-----------|-------------|
+| `include_tables` | Comma-separated list of tables to sync |
+| `offset` | `initial` for full + incremental; `latest` for incremental only |
+| `snapshot_split_size` | Row count per split during full sync (default: 8096) |
+| `snapshot_parallelism` | Parallelism during full sync phase (default: 1) |
+
+#### Step 3: Monitor Sync Status
+
+```sql
+-- Check job status
+SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
+
+-- Check task history
+SELECT * FROM tasks("type"="insert") WHERE jobName = 'pg_sync';
+
+-- Pause / Resume / Drop
+PAUSE JOB WHERE jobname = 'pg_sync';
+RESUME JOB WHERE jobname = 'pg_sync';
+DROP JOB WHERE jobname = 'pg_sync';
+```
+
+For detailed reference, see the [Streaming Job Multi-Table Sync](../data-operate/import/streaming-job/streaming-job-multi-table.md) documentation.
 
 ### Option 4: Export and Load
 
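For PostgreSQL, the analogous check plus the incremental-only variant are sketched below: `SHOW wal_level` is standard PostgreSQL and `"offset" = "latest"` comes straight from the parameter table, but the job name and the final sanity query are hypothetical.

```sql
-- On the PostgreSQL source: confirm logical replication is enabled (Step 1);
-- changing wal_level requires a server restart
SHOW wal_level;   -- expect: logical

-- On Doris: incremental-only sync, starting from the current WAL position
CREATE JOB pg_sync_incremental
ON STREAMING
FROM POSTGRES (
    "jdbc_url" = "jdbc:postgresql://pg-host:5432/source_db",
    "driver_url" = "postgresql-42.5.6.jar",
    "driver_class" = "org.postgresql.Driver",
    "user" = "postgres",
    "password" = "password",
    "database" = "source_db",
    "schema" = "public",
    "include_tables" = "orders,customers,products",
    "offset" = "latest"   -- skip the full snapshot; capture new changes only
)
TO DATABASE target_db (
    "table.create.properties.replication_num" = "3"
)

-- After the job is running: spot-check a synced table on the target
SELECT COUNT(*) FROM target_db.orders;
```

With `latest`, auto-created target tables start empty and fill as new changes arrive, so the count check is only meaningful once writes have occurred on the source.
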
i18n/zh-CN/docusaurus-plugin-content-docs/current/migration/mysql-to-doris.md

Lines changed: 64 additions & 9 deletions
@@ -70,19 +70,74 @@ Flink CDC 捕获 MySQL binlog 变更并流式传输到 Doris。此方法适用
 
 [JDBC Catalog](../lakehouse/catalogs/jdbc-catalog.md) 允许从 MySQL 直接查询和批量迁移。这是一次性或定期批量迁移最简单的方法。
 
-### 选项 3:Streaming Job(持续文件加载)
+### 选项 3:Streaming Job(内置 CDC 同步)
 
-Doris 内置的 [Streaming Job](../data-operate/import/streaming-job.md)(`CREATE JOB ON STREAMING`)提供无需外部工具的持续文件加载能力。将 MySQL 数据导出到 S3/对象存储,Streaming Job 会自动发现新文件并加载到 Doris。
+Doris 内置的 [Streaming Job](../data-operate/import/streaming-job/streaming-job-multi-table.md) 可以直接从 MySQL 同步全量和增量数据到 Doris,无需部署 Flink 等外部工具。底层使用 CDC 读取 MySQL binlog,并自动创建目标表(UNIQUE KEY 模型),主键与源表保持一致。
 
 此选项适用于:
 
-- 通过文件导出管道进行持续增量迁移
-- 偏好使用 Doris 原生功能而非 Flink 等外部工具的环境
-- MySQL 数据定期导出到对象存储的场景
-
-**前提条件**:数据已导出到 S3 兼容的对象存储;Doris 2.1+ 并启用 Job Scheduler。
-
-详细设置请参考 [Streaming Job](../data-operate/import/streaming-job.md) 和 [CREATE STREAMING JOB](../sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md) 文档。
+- 无需部署 Flink 集群的实时多表同步
+- 偏好使用 Doris 原生功能而非外部工具的环境
+- 通过单条 SQL 命令实现全量 + 增量迁移
+
+**前提条件**:MySQL 启用 binlog(`binlog_format = ROW`);MySQL JDBC 驱动已部署到 Doris。
+
+#### 步骤 1:启用 MySQL Binlog
+
+确保 `my.cnf` 包含:
+
+```ini
+[mysqld]
+log-bin = mysql-bin
+binlog_format = ROW
+server-id = 1
+```
+
+#### 步骤 2:创建 Streaming Job
+
+```sql
+CREATE JOB mysql_sync
+ON STREAMING
+FROM MYSQL (
+    "jdbc_url" = "jdbc:mysql://mysql-host:3306",
+    "driver_url" = "mysql-connector-j-8.0.31.jar",
+    "driver_class" = "com.mysql.cj.jdbc.Driver",
+    "user" = "root",
+    "password" = "password",
+    "database" = "source_db",
+    "include_tables" = "orders,customers,products",
+    "offset" = "initial"
+)
+TO DATABASE target_db (
+    "table.create.properties.replication_num" = "3"
+)
+```
+
+关键参数:
+
+| 参数 | 说明 |
+|------|------|
+| `include_tables` | 逗号分隔的待同步表列表 |
+| `offset` | `initial` 全量 + 增量;`latest` 仅增量 |
+| `snapshot_split_size` | 全量同步时每个分片的行数(默认:8096) |
+| `snapshot_parallelism` | 全量同步阶段的并行度(默认:1) |
+
+#### 步骤 3:监控同步状态
+
+```sql
+-- 查看 Job 状态
+SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
+
+-- 查看 Task 历史
+SELECT * FROM tasks("type"="insert") WHERE jobName = 'mysql_sync';
+
+-- 暂停 / 恢复 / 删除
+PAUSE JOB WHERE jobname = 'mysql_sync';
+RESUME JOB WHERE jobname = 'mysql_sync';
+DROP JOB WHERE jobname = 'mysql_sync';
+```
+
+详细参考请见 [Streaming Job 多表同步](../data-operate/import/streaming-job/streaming-job-multi-table.md) 文档。
 
 ### 选项 4:DataX
 
i18n/zh-CN/docusaurus-plugin-content-docs/current/migration/postgresql-to-doris.md

Lines changed: 62 additions & 9 deletions
@@ -68,19 +68,72 @@ Flink CDC 从 PostgreSQL WAL(预写日志)捕获变更并实时流式传输
 
 详细设置请参考 [Flink Doris Connector](../ecosystem/flink-doris-connector.md) 文档。
 
-### 选项 3:Streaming Job(持续文件加载)
+### 选项 3:Streaming Job(内置 CDC 同步)
 
-Doris 内置的 [Streaming Job](../data-operate/import/streaming-job.md)(`CREATE JOB ON STREAMING`)提供无需外部工具的持续文件加载能力。将 PostgreSQL 数据导出到 S3/对象存储,Streaming Job 会自动发现新文件并加载到 Doris。
+Doris 内置的 [Streaming Job](../data-operate/import/streaming-job/streaming-job-multi-table.md) 可以直接从 PostgreSQL 同步全量和增量数据到 Doris,无需部署 Flink 等外部工具。底层使用 CDC 读取 PostgreSQL WAL,并自动创建目标表(UNIQUE KEY 模型),主键与源表保持一致。
 
 此选项适用于:
 
-- 通过文件导出管道进行持续增量迁移
-- 偏好使用 Doris 原生功能而非 Flink 等外部工具的环境
-- PostgreSQL 数据定期导出到对象存储的场景
-
-**前提条件**:数据已导出到 S3 兼容的对象存储;Doris 2.1+ 并启用 Job Scheduler。
-
-详细设置请参考 [Streaming Job](../data-operate/import/streaming-job.md) 和 [CREATE STREAMING JOB](../sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md) 文档。
+- 无需部署 Flink 集群的实时多表同步
+- 偏好使用 Doris 原生功能而非外部工具的环境
+- 通过单条 SQL 命令实现全量 + 增量迁移
+
+**前提条件**:PostgreSQL 启用逻辑复制(`wal_level = logical`);PostgreSQL JDBC 驱动已部署到 Doris。
+
+#### 步骤 1:启用逻辑复制
+
+确保 `postgresql.conf` 包含:
+
+```ini
+wal_level = logical
+```
+
+#### 步骤 2:创建 Streaming Job
+
+```sql
+CREATE JOB pg_sync
+ON STREAMING
+FROM POSTGRES (
+    "jdbc_url" = "jdbc:postgresql://pg-host:5432/source_db",
+    "driver_url" = "postgresql-42.5.6.jar",
+    "driver_class" = "org.postgresql.Driver",
+    "user" = "postgres",
+    "password" = "password",
+    "database" = "source_db",
+    "schema" = "public",
+    "include_tables" = "orders,customers,products",
+    "offset" = "initial"
+)
+TO DATABASE target_db (
+    "table.create.properties.replication_num" = "3"
+)
+```
+
+关键参数:
+
+| 参数 | 说明 |
+|------|------|
+| `include_tables` | 逗号分隔的待同步表列表 |
+| `offset` | `initial` 全量 + 增量;`latest` 仅增量 |
+| `snapshot_split_size` | 全量同步时每个分片的行数(默认:8096) |
+| `snapshot_parallelism` | 全量同步阶段的并行度(默认:1) |
+
+#### 步骤 3:监控同步状态
+
+```sql
+-- 查看 Job 状态
+SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
+
+-- 查看 Task 历史
+SELECT * FROM tasks("type"="insert") WHERE jobName = 'pg_sync';
+
+-- 暂停 / 恢复 / 删除
+PAUSE JOB WHERE jobname = 'pg_sync';
+RESUME JOB WHERE jobname = 'pg_sync';
+DROP JOB WHERE jobname = 'pg_sync';
+```
+
+详细参考请见 [Streaming Job 多表同步](../data-operate/import/streaming-job/streaming-job-multi-table.md) 文档。
 
 ### 选项 4:导出和加载
 