Add comprehensive migration documentation section by dataroaring · Pull Request #3341 · apache/doris-website

dataroaring · 2026-02-05T23:40:25Z

Summary

- Create consolidated migration documentation section with guides for migrating from PostgreSQL, MySQL, Elasticsearch, and other OLAP systems to Apache Doris
- Add migration overview page with comparison table of migration paths and methods
- Include Chinese translations for all migration guides

Test plan

- Verify all internal documentation links resolve correctly
- Check sidebar navigation displays Migration section after Getting Started
- Review English documentation renders properly
- Review Chinese documentation renders properly
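
The first test-plan item could be partly automated. The following is a minimal sketch, not part of the PR: it assumes internal links are written as relative file paths in standard Markdown link syntax, and the helper name `broken_links` is hypothetical. Links that Docusaurus resolves as URL routes rather than files would need a different check.

```python
import re
from pathlib import Path

# Matches [text](target) and captures the target; image links match as well.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def broken_links(doc: Path) -> list[str]:
    """Return relative link targets in `doc` that do not exist on disk."""
    broken = []
    for target in LINK_RE.findall(doc.read_text(encoding="utf-8")):
        # Skip external URLs, same-page anchors, and site-absolute routes.
        if SCHEME_RE.match(target) or target.startswith(("#", "/")):
            continue
        # Drop any "#fragment" before checking the file itself.
        path = target.split("#", 1)[0]
        if not (doc.parent / path).exists():
            broken.append(target)
    return broken
```

Running it over every file under `docs/migration/` and the matching `i18n/zh-CN` directory would list candidate broken links for manual review.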

🤖 Generated with Claude Code

Copilot

Pull request overview

This pull request adds comprehensive migration documentation to help users migrate data from various databases and data systems to Apache Doris. The documentation covers four major migration sources (PostgreSQL, MySQL, Elasticsearch, and other OLAP systems) with an overview page that provides a comparison table of migration paths and methods.

Changes:

- Adds a new "Migration" section to the documentation sidebar, positioned between "Getting Started" and "Guides"
- Creates 5 new English documentation files with detailed migration guides
- Provides complete Chinese translations for all migration guides
- Includes practical examples, data type mappings, and best practices for each migration source

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| sidebars.ts | Adds Migration category to sidebar with 5 migration guide entries |
| docs/migration/overview.md | Overview page with migration path comparison table and method selection guide |
| docs/migration/postgresql-to-doris.md | Complete PostgreSQL migration guide with JDBC Catalog, Flink CDC, and export-import options |
| docs/migration/mysql-to-doris.md | MySQL migration guide emphasizing Flink CDC with full database sync capabilities |
| docs/migration/elasticsearch-to-doris.md | Elasticsearch migration guide covering ES Catalog, Logstash, and custom scripts |
| docs/migration/other-olap-to-doris.md | Migration guides for ClickHouse, Greenplum, Hive, Iceberg, Hudi, and Spark/Flink connectors |
| i18n/zh-CN/.../migration/*.md | Chinese translations of all 5 migration guides |

Copilot · 2026-02-05T23:46:12Z

docs/migration/elasticsearch-to-doris.md

+`★ Insight ─────────────────────────────────────`
+1. **DUPLICATE KEY model** is best for log data where append-only writes are common
+2. **Inverted indexes** enable full-text search similar to Elasticsearch
+3. **Dynamic partitioning** automatically manages time-based data lifecycle
+`─────────────────────────────────────────────────`
+


The "Insight" section present in the English version (lines 147-151) is missing from the Chinese translation. This creates an inconsistency between the English and Chinese documentation. Either add a corresponding section to the Chinese version or remove it from the English version for consistency. If keeping it, consider using standard Docusaurus admonition syntax for better rendering.

Suggested change

-`★ Insight ─────────────────────────────────────`
-1. **DUPLICATE KEY model** is best for log data where append-only writes are common
-2. **Inverted indexes** enable full-text search similar to Elasticsearch
-3. **Dynamic partitioning** automatically manages time-based data lifecycle
-`─────────────────────────────────────────────────`

Copilot · 2026-02-05T23:46:13Z

docs/migration/other-olap-to-doris.md

+
+| Source System | Recommended Method | Notes |
+|---------------|-------------------|-------|
+| ClickHouse | JDBC Catalog + SQL Convertor | Schema and SQL syntax conversion needed |


The term "Convertor" should be spelled "Converter" for correct English spelling. "Converter" is the standard spelling for a tool that converts something from one form to another.

Suggested change

-| ClickHouse | JDBC Catalog + SQL Convertor | Schema and SQL syntax conversion needed |
+| ClickHouse | JDBC Catalog + SQL Converter | Schema and SQL syntax conversion needed |

Copilot · 2026-02-05T23:46:13Z

docs/migration/elasticsearch-to-doris.md

+`★ Insight ─────────────────────────────────────`
+1. **DUPLICATE KEY model** is best for log data where append-only writes are common
+2. **Inverted indexes** enable full-text search similar to Elasticsearch
+3. **Dynamic partitioning** automatically manages time-based data lifecycle
+`─────────────────────────────────────────────────`


This "Insight" section uses an unusual formatting style with inline code blocks (backticks) rather than a proper markdown callout or info box. This formatting is inconsistent with standard Docusaurus documentation practices and may not render as intended. Consider using a standard markdown blockquote (>) or a Docusaurus admonition (:::info, :::tip, etc.) instead for better visual consistency and rendering.

Suggested change

-`★ Insight ─────────────────────────────────────`
-1. **DUPLICATE KEY model** is best for log data where append-only writes are common
-2. **Inverted indexes** enable full-text search similar to Elasticsearch
-3. **Dynamic partitioning** automatically manages time-based data lifecycle
-`─────────────────────────────────────────────────`
+:::info Insight
+1. **DUPLICATE KEY model** is best for log data where append-only writes are common
+2. **Inverted indexes** enable full-text search similar to Elasticsearch
+3. **Dynamic partitioning** automatically manages time-based data lifecycle
+:::
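
If the same backtick-bordered callout appears in several files, the rewrite suggested above could be scripted. The sketch below is illustrative only: it assumes every callout opens with a backtick-wrapped line of the form `★ <Title> ───` and closes with a backtick-wrapped line of `─` characters, and the function name `to_admonition` is hypothetical. The output uses the `:::info <Title>` form from the suggestion.

```python
import re

# Assumed opening line: `★ Insight ─────` (title is a single word).
OPEN_RE = re.compile(r"^`★ (\w+) ─+`$")
# Assumed closing line: a backtick-wrapped run of box-drawing dashes.
CLOSE_RE = re.compile(r"^`─+`$")


def to_admonition(markdown: str, kind: str = "info") -> str:
    """Rewrite backtick-bordered callouts as Docusaurus-style admonitions."""
    out = []
    inside = False
    for line in markdown.splitlines():
        stripped = line.strip()
        opened = OPEN_RE.match(stripped)
        if not inside and opened:
            # Replace the decorative opening border with the admonition opener.
            out.extend([f":::{kind} {opened.group(1)}", ""])
            inside = True
        elif inside and CLOSE_RE.match(stripped):
            # Replace the decorative closing border with the admonition closer.
            out.extend(["", ":::"])
            inside = False
        else:
            out.append(line)
    return "\n".join(out)


sample = "\n".join([
    "`★ Insight ─────`",
    "1. **DUPLICATE KEY model** is best for log data",
    "`─────`",
])
print(to_admonition(sample))
```

Text outside a callout passes through unchanged, so the function can be applied to whole files in both the English and Chinese trees.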

Copilot · 2026-02-05T23:46:14Z

docs/migration/postgresql-to-doris.md

+```sql
+-- Use Flink Doris Connector's database sync feature
+CREATE DATABASE IF NOT EXISTS sync_db;
+
+-- FlinkCDC whole database sync configuration
+-- See Flink Doris Connector documentation for complete setup


The PostgreSQL full database sync section (Step 3) provides only a placeholder comment directing users to see the Flink Doris Connector documentation, while the MySQL migration guide provides a complete working example with the CdcTools command. For consistency and completeness, consider adding a similar concrete example here showing the postgres-sync-database command, similar to the mysql-sync-database example in mysql-to-doris.md (lines 132-149).

Suggested change

-```sql
--- Use Flink Doris Connector's database sync feature
-CREATE DATABASE IF NOT EXISTS sync_db;
-
--- FlinkCDC whole database sync configuration
--- See Flink Doris Connector documentation for complete setup
+```bash
+# Use CdcTools to perform a full PostgreSQL database sync into Doris
+# This is similar in spirit to the mysql-sync-database command
+cdc-tools postgres-sync-database \
+  --pg-host pg-host \
+  --pg-port 5432 \
+  --pg-username pg_user \
+  --pg-password pg_password \
+  --pg-database source_db \
+  --include-tables "public.orders,public.customers" \
+  --doris-fe-nodes doris-fe:8030 \
+  --doris-username doris_user \
+  --doris-password doris_password \
+  --doris-database target_db \
+  --sink-label-prefix pg_full_sync

Create consolidated migration guides covering:
- Overview page with migration path comparison table
- PostgreSQL to Doris (JDBC Catalog, Flink CDC, Export/Import)
- MySQL to Doris (Flink CDC, JDBC Catalog, DataX)
- Elasticsearch to Doris (ES Catalog, inverted index migration)
- Other OLAP systems (ClickHouse, Greenplum, Hive/Iceberg/Hudi)

Each guide includes data type mappings, step-by-step instructions, and troubleshooting for common issues. Chinese translations included. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Elasticsearch nested type should map to Doris VARIANT type for better flexible schema handling. Added links to VARIANT documentation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Change JSON type mappings to VARIANT across all migration docs:
- PostgreSQL: json/jsonb → VARIANT
- MySQL: JSON → VARIANT
- Elasticsearch: object, flattened → VARIANT

VARIANT type provides better flexible schema support for semi-structured data migration. Added links to VARIANT documentation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Strip step-by-step code examples from all migration docs (EN + ZH-CN) since their accuracy is unverified. Retain considerations/principles, data type mapping tables, reference tables (SQL conversion, DSL-to-SQL, table engine mapping), brief migration option descriptions with links to official Doris docs, and validation checklists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace "Recommended" tags with descriptive labels that explain what each option is suited for, letting users decide based on their needs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add VARIANT vs Dynamic Mapping and search() vs query_string comparison tables to ES migration docs (EN + ZH-CN), sourced from feature comparison document. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add Doris-native Streaming Job (CREATE JOB ON STREAMING) as a migration option for both MySQL and PostgreSQL docs (EN + ZH-CN). This provides continuous file-based loading from S3/object storage without external tools like Flink. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the file-based Streaming Job description with the actual built-in CDC sync feature (FROM MYSQL/POSTGRES syntax). This uses Flink CDC under the hood to read binlog/WAL directly, with auto table creation and full + incremental sync in a single SQL command. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add built-in Streaming Job as a migration method in the overview page:
- Update migration paths table to list Streaming Job for MySQL/PostgreSQL
- Add dedicated "Streaming Job" section explaining built-in CDC sync
- Clarify when to choose Streaming Job vs Flink CDC

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace confusing Real-time Sync / Full Migration / Incremental columns with a single "Sync Modes" column using clear values: Full, CDC, and Batch Incremental. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use consistent order: Streaming Job / JDBC Catalog / Flink CDC for both source systems. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The "Migration Window" section was not actionable — downtime planning is already implied by the choice of CDC vs batch migration method. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
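
The type-mapping changes described in the commit messages above can be restated as a small lookup table. This is only a summary sketch of what those messages say (json/jsonb, JSON, object, flattened, and nested all map to VARIANT); the names `VARIANT_MAPPINGS` and `doris_type` are hypothetical, and nothing here is read from Doris itself.

```python
from typing import Optional

# Semi-structured source types that the commits map to Doris VARIANT.
# Restated from the commit messages; other source types are not covered here.
VARIANT_MAPPINGS = {
    "postgresql": {"json": "VARIANT", "jsonb": "VARIANT"},
    "mysql": {"json": "VARIANT"},
    "elasticsearch": {
        "object": "VARIANT",
        "flattened": "VARIANT",
        "nested": "VARIANT",
    },
}


def doris_type(source: str, source_type: str) -> Optional[str]:
    """Return the Doris type for a semi-structured source type, or None if unlisted."""
    return VARIANT_MAPPINGS.get(source.lower(), {}).get(source_type.lower())


print(doris_type("MySQL", "JSON"))
```

A `None` result means only that the type is not one of the semi-structured types touched by these commits, not that it lacks a Doris equivalent.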

Copilot AI review requested due to automatic review settings February 5, 2026 23:40

dataroaring had a problem deploying to Production February 5, 2026 23:40 — with GitHub Actions Failure

Copilot started reviewing on behalf of dataroaring February 5, 2026 23:40

dataroaring temporarily deployed to Production February 5, 2026 23:45 — with GitHub Actions Inactive

Copilot AI reviewed Feb 5, 2026

dataroaring temporarily deployed to Production February 6, 2026 00:32 — with GitHub Actions Inactive

dataroaring had a problem deploying to Production February 10, 2026 00:27 — with GitHub Actions Failure

dataroaring had a problem deploying to Production February 10, 2026 00:30 — with GitHub Actions Failure

dataroaring temporarily deployed to Production February 10, 2026 01:24 — with GitHub Actions Inactive

dataroaring had a problem deploying to Production February 10, 2026 02:14 — with GitHub Actions Failure

dataroaring requested a review from zclllyybb as a code owner February 11, 2026 20:22

dataroaring had a problem deploying to Production February 11, 2026 20:22 — with GitHub Actions Failure

dataroaring and others added 8 commits February 11, 2026 12:30

Update ES nested type mapping from JSON to VARIANT

8d6ba88

Elasticsearch nested type should map to Doris VARIANT type for better flexible schema handling. Added links to VARIANT documentation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove recommendation labels from migration options

779f059

Replace "Recommended" tags with descriptive labels that explain what each option is suited for, letting users decide based on their needs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add ES feature compatibility tables (VARIANT, search function)

8cedfc3

Add VARIANT vs Dynamic Mapping and search() vs query_string comparison tables to ES migration docs (EN + ZH-CN), sourced from feature comparison document. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dataroaring force-pushed the feature/migration-documentation-consolidation branch from 6a2363f to 6329d18 Compare February 11, 2026 20:30

dataroaring had a problem deploying to Production February 11, 2026 20:30 — with GitHub Actions Failure

dataroaring temporarily deployed to Production February 11, 2026 20:37 — with GitHub Actions Inactive

dataroaring and others added 3 commits February 11, 2026 12:39

Simplify migration paths table to single Sync Modes column

f84c917

Replace confusing Real-time Sync / Full Migration / Incremental columns with a single "Sync Modes" column using clear values: Full, CDC, and Batch Incremental. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Align migration method order for PostgreSQL and MySQL

3f4a558

Use consistent order: Streaming Job / JDBC Catalog / Flink CDC for both source systems. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove vague Migration Window planning principle

211a7fe

The "Migration Window" section was not actionable — downtime planning is already implied by the choice of CDC vs batch migration method. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dataroaring temporarily deployed to Production February 11, 2026 21:20 — with GitHub Actions Inactive

dataroaring wants to merge 12 commits into master from feature/migration-documentation-consolidation
