Add comprehensive migration documentation section #3341

Open
dataroaring wants to merge 12 commits into master from feature/migration-documentation-consolidation

@dataroaring
Contributor

Summary

  • Create consolidated migration documentation section with guides for migrating from PostgreSQL, MySQL, Elasticsearch, and other OLAP systems to Apache Doris
  • Add migration overview page with comparison table of migration paths and methods
  • Include Chinese translations for all migration guides

Test plan

  • Verify all internal documentation links resolve correctly
  • Check sidebar navigation displays Migration section after Getting Started
  • Review English documentation renders properly
  • Review Chinese documentation renders properly

🤖 Generated with Claude Code

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds comprehensive migration documentation to help users migrate data from various databases and data systems to Apache Doris. The documentation covers four major migration sources (PostgreSQL, MySQL, Elasticsearch, and other OLAP systems) with an overview page that provides a comparison table of migration paths and methods.

Changes:

  • Adds a new "Migration" section to the documentation sidebar, positioned between "Getting Started" and "Guides"
  • Creates 5 new English documentation files with detailed migration guides
  • Provides complete Chinese translations for all migration guides
  • Includes practical examples, data type mappings, and best practices for each migration source

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| sidebars.ts | Adds Migration category to sidebar with 5 migration guide entries |
| docs/migration/overview.md | Overview page with migration path comparison table and method selection guide |
| docs/migration/postgresql-to-doris.md | Complete PostgreSQL migration guide with JDBC Catalog, Flink CDC, and export-import options |
| docs/migration/mysql-to-doris.md | MySQL migration guide emphasizing Flink CDC with full database sync capabilities |
| docs/migration/elasticsearch-to-doris.md | Elasticsearch migration guide covering ES Catalog, Logstash, and custom scripts |
| docs/migration/other-olap-to-doris.md | Migration guides for ClickHouse, Greenplum, Hive, Iceberg, Hudi, and Spark/Flink connectors |
| i18n/zh-CN/.../migration/*.md | Chinese translations of all 5 migration guides |

Comment on lines 147 to 152
`★ Insight ─────────────────────────────────────`
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
`─────────────────────────────────────────────────`

Copilot AI Feb 5, 2026

The "Insight" section present in the English version (lines 147-151) is missing from the Chinese translation. This creates an inconsistency between the English and Chinese documentation. Either add a corresponding section to the Chinese version or remove it from the English version for consistency. If keeping it, consider using standard Docusaurus admonition syntax for better rendering.

Suggested change (remove these lines)
`★ Insight ─────────────────────────────────────`
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
`─────────────────────────────────────────────────`

| Source System | Recommended Method | Notes |
|---------------|-------------------|-------|
| ClickHouse | JDBC Catalog + SQL Convertor | Schema and SQL syntax conversion needed |
Copilot AI Feb 5, 2026

The term "Convertor" should be spelled "Converter" for correct English spelling. "Converter" is the standard spelling for a tool that converts something from one form to another.

Suggested change
| ClickHouse | JDBC Catalog + SQL Convertor | Schema and SQL syntax conversion needed |
| ClickHouse | JDBC Catalog + SQL Converter | Schema and SQL syntax conversion needed |

Comment on lines 147 to 151
`★ Insight ─────────────────────────────────────`
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
`─────────────────────────────────────────────────`
Copilot AI Feb 5, 2026

This "Insight" section uses an unusual formatting style with inline code blocks (backticks) rather than a proper markdown callout or info box. This formatting is inconsistent with standard Docusaurus documentation practices and may not render as intended. Consider using a standard markdown blockquote (>) or a Docusaurus admonition (:::info, :::tip, etc.) instead for better visual consistency and rendering.

Suggested change
`★ Insight ─────────────────────────────────────`
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
`─────────────────────────────────────────────────`
:::info Insight
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
:::

Comment on lines 274 to 279
```sql
-- Use Flink Doris Connector's database sync feature
CREATE DATABASE IF NOT EXISTS sync_db;

-- FlinkCDC whole database sync configuration
-- See Flink Doris Connector documentation for complete setup
```
Copilot AI Feb 5, 2026

The PostgreSQL full database sync section (Step 3) provides only a placeholder comment directing users to see the Flink Doris Connector documentation, while the MySQL migration guide provides a complete working example with the CdcTools command. For consistency and completeness, consider adding a similar concrete example here showing the postgres-sync-database command, similar to the mysql-sync-database example in mysql-to-doris.md (lines 132-149).

Suggested change
```sql
-- Use Flink Doris Connector's database sync feature
CREATE DATABASE IF NOT EXISTS sync_db;
-- FlinkCDC whole database sync configuration
-- See Flink Doris Connector documentation for complete setup
```
```bash
# Use CdcTools to perform a full PostgreSQL database sync into Doris
# This is similar in spirit to the mysql-sync-database command
cdc-tools postgres-sync-database \
--pg-host pg-host \
--pg-port 5432 \
--pg-username pg_user \
--pg-password pg_password \
--pg-database source_db \
--include-tables "public.orders,public.customers" \
--doris-fe-nodes doris-fe:8030 \
--doris-username doris_user \
--doris-password doris_password \
--doris-database target_db \
--sink-label-prefix pg_full_sync
```
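
For comparison, a hedged sketch of how such a command is usually shaped: to my knowledge, the Flink Doris Connector runs `CdcTools` through `flink run` and passes source and sink settings as repeated `--postgres-conf` and `--sink-conf` key=value pairs, rather than through a standalone `cdc-tools` binary with `--pg-*` flags. Every host, credential, path, slot name, and table pattern below is a placeholder, and the option names should be checked against the connector version in use. The script only assembles and prints the command (a dry run); it does not contact Flink or Doris.

```bash
#!/bin/sh
# Dry-run sketch: assemble a CdcTools postgres-sync-database command and print it.
# All hosts, credentials, paths, and table patterns are placeholders (assumptions).
set -eu

FLINK_HOME="${FLINK_HOME:-/opt/flink}"                           # assumed install path
CONNECTOR_JAR="${CONNECTOR_JAR:-lib/flink-doris-connector.jar}"  # placeholder; real jar names carry version suffixes

# Hold the argument vector in the positional parameters so quoting survives intact.
set -- "$FLINK_HOME/bin/flink" run \
  -Dexecution.checkpointing.interval=10s \
  -c org.apache.doris.flink.tools.cdc.CdcTools \
  "$CONNECTOR_JAR" \
  postgres-sync-database \
  --database target_db \
  --postgres-conf hostname=pg-host \
  --postgres-conf port=5432 \
  --postgres-conf username=pg_user \
  --postgres-conf password=pg_password \
  --postgres-conf database-name=source_db \
  --postgres-conf schema-name=public \
  --postgres-conf slot.name=doris_sync_slot \
  --postgres-conf decoding.plugin.name=pgoutput \
  --including-tables "orders|customers" \
  --sink-conf fenodes=doris-fe:8030 \
  --sink-conf username=doris_user \
  --sink-conf password=doris_password \
  --sink-conf jdbc-url=jdbc:mysql://doris-fe:9030 \
  --sink-conf sink.label-prefix=pg_full_sync \
  --table-conf replication_num=1

MAIN_CLASS="$5"   # entry class passed to `flink run -c`
SUBCOMMAND="$7"   # CdcTools sub-command selecting the PostgreSQL source
CMD_LINE="$*"

# Print the assembled command for review instead of executing it.
printf '%s\n' "$CMD_LINE"

# To execute for real once every value has been verified: "$@"
```

Printing first keeps the example safe to run anywhere; the final commented line shows how the same argument vector would be executed unchanged.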

dataroaring and others added 8 commits February 11, 2026 12:30
Create consolidated migration guides covering:
- Overview page with migration path comparison table
- PostgreSQL to Doris (JDBC Catalog, Flink CDC, Export/Import)
- MySQL to Doris (Flink CDC, JDBC Catalog, DataX)
- Elasticsearch to Doris (ES Catalog, inverted index migration)
- Other OLAP systems (ClickHouse, Greenplum, Hive/Iceberg/Hudi)

Each guide includes data type mappings, step-by-step instructions,
and troubleshooting for common issues. Chinese translations included.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Elasticsearch nested type should map to Doris VARIANT type for better
flexible schema handling. Added links to VARIANT documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Change JSON type mappings to VARIANT across all migration docs:
- PostgreSQL: json/jsonb → VARIANT
- MySQL: JSON → VARIANT
- Elasticsearch: object, flattened → VARIANT

VARIANT type provides better flexible schema support for semi-structured
data migration. Added links to VARIANT documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Strip step-by-step code examples from all migration docs (EN + ZH-CN)
since their accuracy is unverified. Retain considerations/principles,
data type mapping tables, reference tables (SQL conversion, DSL-to-SQL,
table engine mapping), brief migration option descriptions with links
to official Doris docs, and validation checklists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace "Recommended" tags with descriptive labels that explain what
each option is suited for, letting users decide based on their needs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add VARIANT vs Dynamic Mapping and search() vs query_string comparison
tables to ES migration docs (EN + ZH-CN), sourced from feature
comparison document.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add Doris-native Streaming Job (CREATE JOB ON STREAMING) as a
migration option for both MySQL and PostgreSQL docs (EN + ZH-CN).
This provides continuous file-based loading from S3/object storage
without external tools like Flink.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the file-based Streaming Job description with the actual
built-in CDC sync feature (FROM MYSQL/POSTGRES syntax). This uses
Flink CDC under the hood to read binlog/WAL directly, with auto
table creation and full + incremental sync in a single SQL command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@dataroaring dataroaring force-pushed the feature/migration-documentation-consolidation branch from 6a2363f to 6329d18 Compare February 11, 2026 20:30
Add built-in Streaming Job as a migration method in the overview page:
- Update migration paths table to list Streaming Job for MySQL/PostgreSQL
- Add dedicated "Streaming Job" section explaining built-in CDC sync
- Clarify when to choose Streaming Job vs Flink CDC

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dataroaring and others added 3 commits February 11, 2026 12:39
Replace confusing Real-time Sync / Full Migration / Incremental columns
with a single "Sync Modes" column using clear values: Full, CDC, and
Batch Incremental.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use consistent order: Streaming Job / JDBC Catalog / Flink CDC
for both source systems.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The "Migration Window" section was not actionable; downtime planning
is already implied by the choice of CDC vs batch migration method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>