README.md: 32 additions & 24 deletions
@@ -10,7 +10,7 @@ cbcopy is an efficient database migration tool designed to transfer data and met
10
10
The metadata migration feature of cbcopy is based on [gpbackup](https://github.com/greenplum-db/gpbackup-archive). Compared to GPDB's built-in `pg_dump`, cbcopy's main advantage is that it retrieves metadata in batches, whereas `pg_dump` fetches it one row or a few rows at a time. Batching makes cbcopy much faster than `pg_dump`, especially when handling large volumes of metadata.
11
11
12
12
### Data migration
13
-
Both GPDB and CBDB support starting programs via SQL commands, and cbcopy utilizes this feature. During data migration, it uses SQL commands to start a program on the target database to receive and load data, while simultaneously using SQL commands to start a program on the source database to unload data and send it to the program on the target database.
13
+
Both GPDB and CBDB support starting programs via SQL commands, and cbcopy utilizes this feature. During data migration, it uses SQL commands to start a program on the destination database to receive and load data, while simultaneously using SQL commands to start a program on the source database to unload data and send it to the program on the destination database.
14
14
15
15
## Pre-Requisites
16
16
@@ -91,7 +91,7 @@ This will:
91
91
92
92
## Migrating Data with cbcopy
93
93
94
-
Before migrating data, you need to copy cbcopy_helper to the `$GPHOME/bin` directory on all nodes of both the source and target databases. Then you need to find a host that can connect to both the source database and the target database, and use the cbcopy command on that host to initiate the migration. Note that database superuser privileges are required for both source and target databases to perform the migration.
94
+
Before migrating data, you need to copy cbcopy_helper to the `$GPHOME/bin` directory on all nodes of both the source and destination databases. Then you need to find a host that can connect to both the source database and the destination database, and use the cbcopy command on that host to initiate the migration. Note that database superuser privileges are required for both source and destination databases to perform the migration.
95
95
96
96
By default, both metadata and data are migrated. You can use `--metadata-only` to migrate only metadata, or `--data-only` to migrate only data. Based on our best practices, we recommend migrating metadata first using `--metadata-only`, and then migrating data using `--data-only`. This two-step approach helps ensure a more controlled and reliable migration process.
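As a rough illustration of the two-step approach, the sketch below only prints the two invocations rather than running them. The helper name `two_step_plan` and the database name `sales` are hypothetical, and connection options and any other options your run needs are deliberately left out; pass them as extra arguments.

```bash
# Print the two cbcopy invocations for a metadata-first, data-second migration.
# $1 is the database name; any further arguments (connection options, etc.)
# are appended unchanged to both commands.
two_step_plan() {
  local db=$1
  shift
  echo "cbcopy --dbname=$db --metadata-only${*:+ $*}"
  echo "cbcopy --dbname=$db --data-only${*:+ $*}"
}

# Example with placeholder values: review the printed commands before running them.
two_step_plan sales --dest-host=dest-warehouse-cluster
```

Printing the commands first makes it easy to confirm that both steps use identical scope and connection options before any data moves.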
97
97
@@ -110,14 +110,22 @@ cbcopy relies on the "COPY ON SEGMENT" command of the database, so it has specif
110
110
111
111
**Common Issue**: Many users encounter connection failures when using hostname for `--dest-host` because the hostname cannot be resolved from the source cluster nodes.
112
112
113
-
**Problem**: When you specify a hostname (e.g., `--dest-host=dest-warehouse-cluster`) instead of an IP address, all nodes in the source cluster must be able to resolve this hostname to the correct IP address. If the hostname resolution fails on any source cluster node, the migration will fail with connection errors.
113
+
**Problem**: When you specify a hostname (e.g., `--dest-host=dest-warehouse-cluster`) instead of an IP address, all nodes in the source cluster must be able to resolve this hostname to the correct IP address. If hostname resolution fails on any source cluster node, the migration will fail with errors such as `could not write to copy program: Broken pipe` or `extra data after last expected column`. These messages can be triggered by network issues in general, so they do not point to name resolution on their own.
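One way to check for this before migrating is to test name resolution on each source node. The sketch below is a minimal example that assumes a Linux host with `getent` available; the function name `resolves` is hypothetical, and `dest-warehouse-cluster` is the example hostname from above.

```bash
# Return 0 if this host can resolve the given name through its normal
# lookup path (/etc/hosts, DNS, ...), and non-zero otherwise.
resolves() {
  getent hosts "$1" >/dev/null
}

# Run this on every node of the source cluster.
if resolves dest-warehouse-cluster; then
  echo "dest-warehouse-cluster resolves on $(hostname)"
else
  echo "dest-warehouse-cluster does NOT resolve on $(hostname)"
fi
```

If any node reports a failure, either fix name resolution on that node or pass an IP address to `--dest-host` instead.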
114
114
115
115
#### `cbcopy_helper` Not Deployed
116
116
117
117
**Common Issue**: A common oversight is forgetting to copy the `cbcopy_helper` binary to all nodes in both the source and destination clusters. This can lead to connection errors that may appear to be DNS or network-related issues.
118
118
119
119
**Problem**: The `cbcopy` utility relies on the `cbcopy_helper` executable being present on every node of both the source and destination clusters to facilitate data transfer. If the helper is missing on any node, `cbcopy` may fail with error messages, such as being unable to resolve hostnames or establish connections, because the necessary communication channel cannot be opened.
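A quick presence check across nodes can rule this cause out. The following is a sketch under several assumptions: a hypothetical file listing one hostname per line, passwordless `ssh` to each node, and `$GPHOME` being set in the remote non-interactive shell. The function name `check_helper` and the `RUN_ON` override are illustrative and not part of cbcopy.

```bash
# Report every node on which $GPHOME/bin/cbcopy_helper is missing or not
# executable. $1 is a file with one hostname per line.
# RUN_ON names the command used to run the check on a node (default: ssh).
check_helper() {
  local hostfile=$1 missing=0 host
  while read -r host; do
    [ -z "$host" ] && continue
    # </dev/null keeps ssh from consuming the rest of the host list.
    if ! ${RUN_ON:-ssh} "$host" 'test -x "$GPHOME/bin/cbcopy_helper"' </dev/null; then
      echo "missing on $host"
      missing=1
    fi
  done < "$hostfile"
  return "$missing"
}

# Example (all_hosts.txt is a placeholder covering both clusters):
# check_helper all_hosts.txt
```

The function returns non-zero if the helper is missing anywhere, so it can gate a migration script.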
120
120
121
+
#### Segment-to-Segment Network Connectivity
122
+
123
+
**Common Issue**: The masters of the source and destination clusters can communicate via TCP, but the segments cannot connect to each other due to firewall restrictions.
124
+
125
+
**Problem**: If you don't configure your firewall to allow TCP connections between segments of both clusters, you will likely encounter a situation where some tables (with small data volumes) migrate successfully while others (with large data volumes) fail.
126
+
127
+
This happens because small tables are typically processed by the masters (copy on master), while large tables are distributed across segments for parallel processing (copy on segment). When segments cannot reach each other, the migration fails with the same error messages as network issues: `could not write to copy program: Broken pipe` or `extra data after last expected column`. This mixed success/failure pattern is a strong indicator of segment-to-segment connectivity problems.
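To test segment-to-segment reachability directly, a TCP probe can be run from a source segment host against a destination segment host. This is a minimal sketch that assumes `bash` (for its `/dev/tcp` pseudo-device) and coreutils `timeout` are installed; the function name `check_port`, the hostname, and the port number are placeholders, since the port to probe depends on your environment.

```bash
# Return 0 if a TCP connection to host:port succeeds within the timeout
# (default 3 seconds), and non-zero if it is refused, filtered, or times out.
check_port() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example, run on a source segment host (dest-seg1 and 5432 are placeholders):
# check_port dest-seg1 5432 && echo "reachable" || echo "blocked"
```

A probe that succeeds from the master hosts but fails from segment hosts matches the mixed success/failure pattern described above.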
128
+
121
129
### Connection Modes
122
130
123
131
cbcopy supports two connection modes to handle different network environments:
- `--full` - Migrate all metadata and data from the source database to the target database.
155
-
- `--dbname` - Migrate a specific database or multiple databases from the source to the target database.
156
-
- `--schema` - Migrate a specific schema or multiple schemas from the source database to the target database.
157
-
- `--schema-mapping-file` - Migrate specific schemas specified in a file from the source database to the target database.
158
-
- `--include-table` - Migrate specific tables or multiple tables from the source database to the target database.
159
-
- `--include-table-file` - Migrate specific tables specified in a file from the source database to the target database.
160
-
- `--global-metadata-only` - Migrate global objects from the source database to the target database.
162
+
- `--full` - Migrate all metadata and data from the source database to the destination database.
163
+
- `--dbname` - Migrate a specific database or multiple databases from the source to the destination database.
164
+
- `--schema` - Migrate a specific schema or multiple schemas from the source database to the destination database.
165
+
- `--schema-mapping-file` - Migrate specific schemas specified in a file from the source database to the destination database.
166
+
- `--include-table` - Migrate specific tables or multiple tables from the source database to the destination database.
167
+
- `--include-table-file` - Migrate specific tables specified in a file from the source database to the destination database.
168
+
- `--global-metadata-only` - Migrate global objects from the source database to the destination database.
161
169
162
170
### Data Loading Modes
163
171
cbcopy supports two data loading modes.
@@ -169,20 +177,20 @@ cbcopy supports two data loading modes.
169
177
170
178
If the tables you are migrating depend on certain global objects (such as tablespaces), there are two ways to handle this:
171
179
172
-
1. Include the `--with-global-metadata` option (default: false) during migration, which will automatically create these global objects in the target database.
180
+
1. Include the `--with-global-metadata` option (default: false) during migration, which will automatically create these global objects in the destination database.
173
181
174
-
2. If you choose not to use `--with-global-metadata`, you must manually create these global objects in the target database before running the migration. For example:
182
+
2. If you choose not to use `--with-global-metadata`, you must manually create these global objects in the destination database before running the migration. For example:
175
183
```sql
176
184
-- If your tables use custom tablespaces, create them first:
If neither option is taken, the creation of dependent tables in the target database will fail with errors like "tablespace 'custom_tablespace' does not exist".
188
+
If neither option is taken, the creation of dependent tables in the destination database will fail with errors like "tablespace 'custom_tablespace' does not exist".
181
189
182
190
### Role
183
-
If you want to change the ownership of the tables during migration without creating identical roles in the target database (by disabling the `--with-global-metadata` option), you need to:
191
+
If you want to change the ownership of the tables during migration without creating identical roles in the destination database (by disabling the `--with-global-metadata` option), you need to:
184
192
185
-
1. First create the target roles in the target database
193
+
1. First create the target roles in the destination database
186
194
2. Use the `--owner-mapping-file` to specify the mapping between source and target roles
187
195
188
196
For example, if you have a mapping file with:
@@ -196,19 +204,19 @@ The migration process will execute statements like:
196
204
ALTER TABLE table_name OWNER TO target_role1;
197
205
```
198
206
199
-
If the target role doesn't exist in the target database, these ownership change statements will fail with an error like "role 'target_role1' does not exist".
207
+
If the target role doesn't exist in the destination database, these ownership change statements will fail with an error like "role 'target_role1' does not exist".
200
208
201
209
### Tablespace
202
210
cbcopy provides three ways to handle tablespace migration:
203
211
204
-
1. **Default Mode** - When no tablespace options are specified, objects will be created in the same tablespace names as they were in the source database. You have two options to ensure the tablespaces exist in the target database:
212
+
1. **Default Mode** - When no tablespace options are specified, objects will be created in the same tablespace names as they were in the source database. You have two options to ensure the tablespaces exist in the destination database:
205
213
- Use `--with-global-metadata` to automatically create matching tablespaces
206
-
- Manually create the tablespaces in the target database before migration:
214
+
- Manually create the tablespaces in the destination database before migration:
2. **Single Target Tablespace** (`--dest-tablespace`) - Migrate all source database objects into a single specified tablespace on the target database, regardless of their original tablespace locations. For example:
219
+
2. **Single Destination Tablespace** (`--dest-tablespace`) - Migrate all source database objects into a single specified tablespace on the destination database, regardless of their original tablespace locations. For example:
212
220
```bash
213
221
cbcopy --dest-tablespace=new_space ...
214
222
```
@@ -220,7 +228,7 @@ cbcopy provides three ways to handle tablespace migration:
220
228
```
221
229
222
230
Note:
223
-
- For the default mode, either use `--with-global-metadata` or ensure all required tablespaces exist in the target database before migration
231
+
- For the default mode, either use `--with-global-metadata` or ensure all required tablespaces exist in the destination database before migration
224
232
- If you need to migrate objects from different schemas into different tablespaces, you can either:
225
233
1. Use `--tablespace-mapping-file` to specify all mappings at once
226
234
2. Migrate one schema at a time using `--dest-tablespace` with different target tablespaces
@@ -230,15 +238,15 @@ Note:
230
238
- `--copy-jobs` - The maximum number of tables copied concurrently.
231
239
232
240
### Validate Migration
233
-
During migration, we will compare the number of rows returned by `COPY TO` from the source database (i.e., the number of records coming out of the source database) with the number of rows returned by `COPY FROM` in the target database (i.e., the number of records loaded in the target database). If the two counts do not match, the migration of that table will fail.
241
+
During migration, we will compare the number of rows returned by `COPY TO` from the source database (i.e., the number of records coming out of the source database) with the number of rows returned by `COPY FROM` in the destination database (i.e., the number of records loaded in the destination database). If the two counts do not match, the migration of that table will fail.
234
242
235
243
### Copy Strategies
236
244
237
245
cbcopy internally supports three copy strategies for tables.
238
246
239
-
- `Copy On Coordinator` - If the table's statistics `pg_class->reltuples` is less than `--on-segment-threshold`, cbcopy will enable the `Copy On Coordinator` strategy for this table, meaning that data migration between the source and target databases can only occur through the coordinator node.
240
-
- `Copy On Segment` - If the table's statistics `pg_class->reltuples` is greater than `--on-segment-threshold`, and both the source and target databases have the same version and the same number of nodes, cbcopy will enable the `Copy On Segment` strategy for this table. This means that data migration between the source and target databases will occur in parallel across all segment nodes without data redistribution.
241
-
- `Copy on External Table` - For tables that do not meet the conditions for the above two strategies, cbcopy will enable the `Copy On External Table` strategy. This means that data migration between the source and target databases will occur in parallel across all segment nodes with data redistribution.
247
+
- `Copy On Coordinator` - If the table's statistics `pg_class->reltuples` is less than `--on-segment-threshold`, cbcopy will enable the `Copy On Coordinator` strategy for this table, meaning that data migration between the source and destination databases can only occur through the coordinator node.
248
+
- `Copy On Segment` - If the table's statistics `pg_class->reltuples` is greater than `--on-segment-threshold`, and both the source and destination databases have the same version and the same number of nodes, cbcopy will enable the `Copy On Segment` strategy for this table. This means that data migration between the source and destination databases will occur in parallel across all segment nodes without data redistribution.
249
+
- `Copy On External Table` - For tables that do not meet the conditions for the above two strategies, cbcopy will enable the `Copy On External Table` strategy. This means that data migration between the source and destination databases will occur in parallel across all segment nodes with data redistribution.
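The selection rules above can be summarized as a small decision function. This is only a sketch of the rules as worded in the list, not cbcopy's actual implementation: the function name `choose_strategy` and its `yes`/`no` arguments are hypothetical, and `reltuples` is treated as an integer for simplicity.

```bash
# Pick a copy strategy from the documented rules.
# $1 = reltuples (integer), $2 = --on-segment-threshold value,
# $3 = "yes" if source and destination versions match,
# $4 = "yes" if source and destination have the same number of nodes.
choose_strategy() {
  local reltuples=$1 threshold=$2 same_version=$3 same_nodes=$4
  if [ "$reltuples" -lt "$threshold" ]; then
    echo "Copy On Coordinator"
  elif [ "$reltuples" -gt "$threshold" ] \
    && [ "$same_version" = yes ] && [ "$same_nodes" = yes ]; then
    echo "Copy On Segment"
  else
    # Everything else, including a count exactly equal to the threshold
    # under a literal reading of the rules above.
    echo "Copy On External Table"
  fi
}

choose_strategy 100 1000 yes yes     # small table
choose_strategy 5000000 1000 yes no  # large table, node counts differ
```

Because the decision depends on `pg_class->reltuples`, stale statistics on the source can route a large table through the coordinator, so up-to-date statistics matter here.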