Commit 3b6244f

docs: improve common issues section with specific error examples
1 parent ac3f9ef commit 3b6244f

1 file changed

+32
-24
lines changed

README.md

Lines changed: 32 additions & 24 deletions
@@ -10,7 +10,7 @@ cbcopy is an efficient database migration tool designed to transfer data and met
1010
The metadata migration feature of cbcopy is based on [gpbackup](https://github.com/greenplum-db/gpbackup-archive). Its main advantage over GPDB's built-in `pg_dump` is that it retrieves metadata in batches, whereas `pg_dump` fetches metadata one row or a few rows at a time. When handling large volumes of metadata, batch retrieval makes cbcopy much faster than `pg_dump`.
1111

1212
### Data migration
13-
Both GPDB and CBDB support starting programs via SQL commands, and cbcopy utilizes this feature. During data migration, it uses SQL commands to start a program on the target database to receive and load data, while simultaneously using SQL commands to start a program on the source database to unload data and send it to the program on the target database.
13+
Both GPDB and CBDB support starting programs via SQL commands, and cbcopy utilizes this feature. During data migration, it uses SQL commands to start a program on the destination database to receive and load data, while simultaneously using SQL commands to start a program on the source database to unload data and send it to the program on the destination database.
1414

1515
## Pre-Requisites
1616

@@ -91,7 +91,7 @@ This will:
9191

9292
## Migrating Data with cbcopy
9393

94-
Before migrating data, you need to copy cbcopy_helper to the `$GPHOME/bin` directory on all nodes of both the source and target databases. Then you need to find a host that can connect to both the source database and the target database, and use the cbcopy command on that host to initiate the migration. Note that database superuser privileges are required for both source and target databases to perform the migration.
94+
Before migrating data, you need to copy cbcopy_helper to the `$GPHOME/bin` directory on all nodes of both the source and destination databases. Then you need to find a host that can connect to both the source database and the destination database, and use the cbcopy command on that host to initiate the migration. Note that database superuser privileges are required for both source and destination databases to perform the migration.
9595

9696
By default, both metadata and data are migrated. You can use `--metadata-only` to migrate only metadata, or `--data-only` to migrate only data. Based on our best practices, we recommend migrating metadata first using `--metadata-only`, and then migrating data using `--data-only`. This two-step approach helps ensure a more controlled and reliable migration process.
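
The recommended two-step flow can be sketched as a small wrapper. This is a minimal sketch, not something shipped with cbcopy: the function name `migrate_two_step` and the `CBCOPY` override variable are hypothetical, and the caller is assumed to pass every connection and migration-mode option.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of cbcopy): run the metadata pass first,
# then the data pass, reusing the same caller-supplied options for both.
# CBCOPY is an assumed override so the command can be swapped for a dry run.
migrate_two_step() {
  local runner="${CBCOPY:-cbcopy}"

  # Step 1: metadata only. Stop here on failure so that no data is loaded
  # into missing or partially created objects.
  "$runner" "$@" --metadata-only || return 1

  # Step 2: data only, into the objects created in step 1.
  "$runner" "$@" --data-only
}
```

Setting `CBCOPY=echo` prints the two commands instead of running them, which is a cheap way to review the options before a real run.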
9797

@@ -110,14 +110,22 @@ cbcopy relies on the "COPY ON SEGMENT" command of the database, so it has specif
110110

111111
**Common Issue**: Many users encounter connection failures when using hostname for `--dest-host` because the hostname cannot be resolved from the source cluster nodes.
112112

113-
**Problem**: When you specify a hostname (e.g., `--dest-host=dest-warehouse-cluster`) instead of an IP address, all nodes in the source cluster must be able to resolve this hostname to the correct IP address. If the hostname resolution fails on any source cluster node, the migration will fail with connection errors.
113+
**Problem**: When you specify a hostname (e.g., `--dest-host=dest-warehouse-cluster`) instead of an IP address, all nodes in the source cluster must be able to resolve this hostname to the correct IP address. If hostname resolution fails on any source cluster node, the migration will fail with errors such as `could not write to copy program: Broken pipe` or `extra data after last expected column`. These errors are not specific to DNS; network issues in general can trigger them.
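
One way to check resolution before migrating is to ask every source node to resolve the destination hostname. The following is a rough sketch under stated assumptions: the function names, the hosts file (one source node hostname per line), and passwordless `ssh` to each node are all hypothetical and not part of cbcopy.

```bash
#!/usr/bin/env bash
# Returns 0 if this host can resolve the given name through the system
# resolver (DNS, /etc/hosts, and so on), non-zero otherwise.
can_resolve() {
  getent hosts "$1" >/dev/null 2>&1
}

# Hypothetical loop over the source cluster nodes listed in a file.
# Prints each node that cannot resolve the destination host and returns 1
# if any node failed.
check_source_nodes() {
  local hosts_file="$1" dest_host="$2" failed=0 node
  while IFS= read -r node; do
    [ -n "$node" ] || continue
    # -n keeps ssh from consuming the rest of the hosts file on stdin.
    if ! ssh -n "$node" "getent hosts '$dest_host'" >/dev/null 2>&1; then
      echo "cannot resolve $dest_host on $node"
      failed=1
    fi
  done < "$hosts_file"
  return "$failed"
}
```

For example, `check_source_nodes source_hosts.txt dest-warehouse-cluster` would report the nodes that need a DNS or `/etc/hosts` fix; using an IP address for `--dest-host` avoids the dependency altogether.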
114114

115115
#### `cbcopy_helper` Not Deployed
116116

117117
**Common Issue**: A common oversight is forgetting to copy the `cbcopy_helper` binary to all nodes in both the source and destination clusters. This can lead to connection errors that may appear to be DNS or network-related issues.
118118

119119
**Problem**: The `cbcopy` utility relies on the `cbcopy_helper` executable being present on every node of both the source and destination clusters to facilitate data transfer. If the helper is missing on any node, `cbcopy` may fail with error messages, such as being unable to resolve hostnames or establish connections, because the necessary communication channel cannot be opened.
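
A quick presence check can rule this cause out before investigating DNS or the network. This is a minimal sketch, assuming the helper lives at `<GPHOME>/bin/cbcopy_helper` as described above; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Returns 0 if an executable cbcopy_helper exists under <gphome>/bin
# on the host where this runs.
helper_present() {
  local gphome="$1"
  [ -x "$gphome/bin/cbcopy_helper" ]
}

# Prints a one-line status for the given GPHOME and returns non-zero
# when the helper is missing.
report_helper() {
  local gphome="$1"
  if helper_present "$gphome"; then
    echo "cbcopy_helper found in $gphome/bin"
  else
    echo "cbcopy_helper MISSING in $gphome/bin"
    return 1
  fi
}
```

Running `report_helper "$GPHOME"` on every node of both clusters (for example over `ssh`) shows which nodes still need the binary copied.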
120120

121+
#### Segment-to-Segment Network Connectivity
122+
123+
**Common Issue**: The masters of the source and destination clusters can communicate via TCP, but the segments cannot connect to each other due to firewall restrictions.
124+
125+
**Problem**: If you don't configure your firewall to allow TCP connections between segments of both clusters, you will likely encounter a situation where some tables (with small data volumes) migrate successfully while others (with large data volumes) fail.
126+
127+
This happens because small tables are typically processed by the masters (copy on master), while large tables are distributed across segments for parallel processing (copy on segment). When segments cannot reach each other, the migration fails with the same error messages as other network issues: `could not write to copy program: Broken pipe` or `extra data after last expected column`. This mixed success/failure pattern is a strong indicator of segment-to-segment connectivity problems.
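
To confirm a firewall problem, you can probe a TCP port on a destination segment host from a source segment host. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the host and port to test depend on your deployment and are not specified here.

```bash
#!/usr/bin/env bash
# Returns 0 if a TCP connection to host:port succeeds within the timeout
# (default 3 seconds), non-zero otherwise.
# Uses bash's /dev/tcp pseudo-device, so it needs bash and the coreutils
# `timeout` command but no other tools.
tcp_reachable() {
  local host="$1" port="$2" seconds="${3:-3}"
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout "$seconds" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

If the same probe succeeds between the masters but fails between segment hosts, that matches the mixed success/failure pattern described above.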
128+
121129
### Connection Modes
122130

123131
cbcopy supports two connection modes to handle different network environments:
@@ -151,13 +159,13 @@ cbcopy --source-host=external-db --dest-host=k8s-warehouse-cluster \
151159

152160
cbcopy supports seven migration modes.
153161

154-
- `--full` - Migrate all metadata and data from the source database to the target database.
155-
- `--dbname` - Migrate a specific database or multiple databases from the source to the target database.
156-
- `--schema` - Migrate a specific schema or multiple schemas from the source database to the target database.
157-
- `--schema-mapping-file` - Migrate specific schemas specified in a file from the source database to the target database.
158-
- `--include-table` - Migrate specific tables or multiple tables from the source database to the target database.
159-
- `--include-table-file` - Migrate specific tables specified in a file from the source database to the target database.
160-
- `--global-metadata-only` - Migrate global objects from the source database to the target database.
162+
- `--full` - Migrate all metadata and data from the source database to the destination database.
163+
- `--dbname` - Migrate a specific database or multiple databases from the source to the destination database.
164+
- `--schema` - Migrate a specific schema or multiple schemas from the source database to the destination database.
165+
- `--schema-mapping-file` - Migrate specific schemas specified in a file from the source database to the destination database.
166+
- `--include-table` - Migrate specific tables or multiple tables from the source database to the destination database.
167+
- `--include-table-file` - Migrate specific tables specified in a file from the source database to the destination database.
168+
- `--global-metadata-only` - Migrate global objects from the source database to the destination database.
161169

162170
### Data Loading Modes
163171
cbcopy supports two data loading modes.
@@ -169,20 +177,20 @@ cbcopy supports two data loading modes.
169177

170178
If the tables you are migrating depend on certain global objects (such as tablespaces), there are two ways to handle this:
171179

172-
1. Include the `--with-global-metadata` option (default: false) during migration, which will automatically create these global objects in the target database.
180+
1. Include the `--with-global-metadata` option (default: false) during migration, which will automatically create these global objects in the destination database.
173181

174-
2. If you choose not to use `--with-global-metadata`, you must manually create these global objects in the target database before running the migration. For example:
182+
2. If you choose not to use `--with-global-metadata`, you must manually create these global objects in the destination database before running the migration. For example:
175183
```sql
176184
-- If your tables use custom tablespaces, create them first:
177185
CREATE TABLESPACE custom_tablespace LOCATION '/path/to/tablespace';
178186
```
179187

180-
If neither option is taken, the creation of dependent tables in the target database will fail with errors like "tablespace 'custom_tablespace' does not exist".
188+
If neither option is taken, the creation of dependent tables in the destination database will fail with errors like "tablespace 'custom_tablespace' does not exist".
181189

182190
### Role
183-
If you want to change the ownership of the tables during migration without creating identical roles in the target database (by disabling the `--with-global-metadata` option), you need to:
191+
If you want to change the ownership of the tables during migration without creating identical roles in the destination database (by disabling the `--with-global-metadata` option), you need to:
184192

185-
1. First create the target roles in the target database
193+
1. First create the target roles in the destination database
186194
2. Use the `--owner-mapping-file` to specify the mapping between source and target roles
187195

188196
For example, if you have a mapping file with:
@@ -196,19 +204,19 @@ The migration process will execute statements like:
196204
ALTER TABLE table_name OWNER TO target_role1;
197205
```
198206

199-
If the target role doesn't exist in the target database, these ownership change statements will fail with an error like "role 'target_role1' does not exist".
207+
If the target role doesn't exist in the destination database, these ownership change statements will fail with an error like "role 'target_role1' does not exist".
200208

201209
### Tablespace
202210
cbcopy provides three ways to handle tablespace migration:
203211

204-
1. **Default Mode** - When no tablespace options are specified, objects will be created in the same tablespace names as they were in the source database. You have two options to ensure the tablespaces exist in the target database:
212+
1. **Default Mode** - When no tablespace options are specified, objects are created in tablespaces with the same names as in the source database. You have two options to ensure the tablespaces exist in the destination database:
205213
- Use `--with-global-metadata` to automatically create matching tablespaces
206-
- Manually create the tablespaces in the target database before migration:
214+
- Manually create the tablespaces in the destination database before migration:
207215
```sql
208216
CREATE TABLESPACE custom_space LOCATION '/path/to/tablespace';
209217
```
210218

211-
2. **Single Target Tablespace** (`--dest-tablespace`) - Migrate all source database objects into a single specified tablespace on the target database, regardless of their original tablespace locations. For example:
219+
2. **Single Destination Tablespace** (`--dest-tablespace`) - Migrate all source database objects into a single specified tablespace on the destination database, regardless of their original tablespace locations. For example:
212220
```bash
213221
cbcopy --dest-tablespace=new_space ...
214222
```
@@ -220,7 +228,7 @@ cbcopy provides three ways to handle tablespace migration:
220228
```
221229
222230
Note:
223-
- For the default mode, either use `--with-global-metadata` or ensure all required tablespaces exist in the target database before migration
231+
- For the default mode, either use `--with-global-metadata` or ensure all required tablespaces exist in the destination database before migration
224232
- If you need to migrate objects from different schemas into different tablespaces, you can either:
225233
1. Use `--tablespace-mapping-file` to specify all mappings at once
226234
2. Migrate one schema at a time using `--dest-tablespace` with different target tablespaces
@@ -230,15 +238,15 @@ Note:
230238
- `--copy-jobs` - The maximum number of tables to copy concurrently.
231239
232240
### Validate Migration
233-
During migration, we will compare the number of rows returned by `COPY TO` from the source database (i.e., the number of records coming out of the source database) with the number of rows returned by `COPY FROM` in the target database (i.e., the number of records loaded in the target database). If the two counts do not match, the migration of that table will fail.
241+
During migration, we will compare the number of rows returned by `COPY TO` from the source database (i.e., the number of records coming out of the source database) with the number of rows returned by `COPY FROM` in the destination database (i.e., the number of records loaded in the destination database). If the two counts do not match, the migration of that table will fail.
234242
235243
### Copy Strategies
236244
237245
cbcopy internally supports three copy strategies for tables.
238246
239-
- `Copy On Coordinator` - If the table's statistics `pg_class->reltuples` is less than `--on-segment-threshold`, cbcopy will enable the `Copy On Coordinator` strategy for this table, meaning that data migration between the source and target databases can only occur through the coordinator node.
240-
- `Copy On Segment` - If the table's statistics `pg_class->reltuples` is greater than `--on-segment-threshold`, and both the source and target databases have the same version and the same number of nodes, cbcopy will enable the `Copy On Segment` strategy for this table. This means that data migration between the source and target databases will occur in parallel across all segment nodes without data redistribution.
241-
- `Copy on External Table` - For tables that do not meet the conditions for the above two strategies, cbcopy will enable the `Copy On External Table` strategy. This means that data migration between the source and target databases will occur in parallel across all segment nodes with data redistribution.
247+
- `Copy On Coordinator` - If the table's statistics `pg_class->reltuples` is less than `--on-segment-threshold`, cbcopy will enable the `Copy On Coordinator` strategy for this table, meaning that data migration between the source and destination databases can only occur through the coordinator node.
248+
- `Copy On Segment` - If the table's statistics `pg_class->reltuples` is greater than `--on-segment-threshold`, and both the source and destination databases have the same version and the same number of nodes, cbcopy will enable the `Copy On Segment` strategy for this table. This means that data migration between the source and destination databases will occur in parallel across all segment nodes without data redistribution.
249+
- `Copy on External Table` - For tables that do not meet the conditions for the above two strategies, cbcopy will enable the `Copy On External Table` strategy. This means that data migration between the source and destination databases will occur in parallel across all segment nodes with data redistribution.
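
The selection rules above can be summarized as a small decision function. This is only an illustrative sketch of the description, not cbcopy's actual implementation: the function name and the `yes`/`no` arguments are hypothetical, row counts are treated as integers, and a row count exactly equal to the threshold falls through to the last branch because the text only covers "less than" and "greater than".

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the three rules described above.
# Arguments: reltuples, threshold, same_version (yes/no), same_node_count (yes/no).
choose_strategy() {
  local reltuples="$1" threshold="$2" same_version="$3" same_nodes="$4"

  if [ "$reltuples" -lt "$threshold" ]; then
    # Small table: copy through the coordinator only.
    echo "Copy On Coordinator"
  elif [ "$reltuples" -gt "$threshold" ] \
       && [ "$same_version" = "yes" ] && [ "$same_nodes" = "yes" ]; then
    # Large table, matching clusters: parallel copy without redistribution.
    echo "Copy On Segment"
  else
    # Everything else: parallel copy with redistribution.
    echo "Copy On External Table"
  fi
}
```

Because the decision relies on `pg_class->reltuples`, which is a statistic rather than an exact count, stale statistics can lead to a different strategy than the real table size would suggest.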
242250
243251
### Log Files and Migration Results
244252
