readme.md
Lines changed: 60 additions & 70 deletions
Original file line number
Diff line number
Diff line change
@@ -14,7 +14,7 @@ Percona docStreamer automates the complete, end-to-end migration from an Amazon
14
14
* Full Sync: A parallelized, high-speed copy of all existing data from source collections.
15
15
* Continuous Sync (CDC): Opens a change stream on the source DocumentDB to capture all inserts, updates, deletes and DDLs (with a few exceptions), applying them in batches to the target MongoDB for real-time synchronization.
16
16
17
-
## Prerequisites
17
+
## 1. Prerequisites
18
18
19
19
### DocumentDB Pre-Setup
20
20
@@ -61,69 +61,7 @@ db.adminCommand({
61
61
});
62
62
```
63
63
64
-
The command below would only enable them for the 3 collections shown:
To ensure data integrity and prevent accidental data loss during migration, we recommend following these guidelines before initiating a docStreamer process.
78
-
79
-
- Backup Your Destination ***(if the destination environment contains data)*** Because Percona docStreamer will overwrite documents with matching _ids, always create a backup of your destination database before running a Full Sync. This ensures you can roll back if valid data is accidentally overwritten.
80
-
81
-
- Audit Existing Collections: ***(if the destination environment contains data)*** Check your destination database to see if collections with the same names as your source already exist. If they do, verify if the data is intended to be merged. If not, consider renaming the source or destination collection.
82
-
83
-
- Verify Connection Strings: Double-check your SOURCE and DESTINATION URIs. A common mistake is pointing the destination to a production environment instead of a staging environment, which can lead to unintended data commingling.
84
-
85
-
- Test with a Staging Run: If possible, perform a dry run or a migration into a temporary empty cluster/database first. This allows you to verify that the data maps correctly and the volume is as expected without risking your main data store.
86
-
87
-
**CRITICAL WARNING** regarding Data Overwrites: Please be aware that if a document in your Source has the same _id as a document in your Destination, the Destination document will be overwritten immediately. This action is irreversible once the sync is performed.
88
-
89
-
### Sharding support not tested
90
-
91
-
Migration from DocumentDB sharded clusters has not been tested and therefore the behavior is unknown. Support for sharded DocumentDB clusters will be added in the future.
92
-
93
-
### DocumentDB Cursor Rate Limiting
94
-
95
-
AWS DocumentDB enforces service quotas, including limits on the number of cursors and the rate of getMore operations, which are fundamental to how change streams work.
96
-
97
-
Symptom: If the migration falls too far behind (e.g., after being stopped for a long time) or if there is a massive burst of write activity, docStreamer may hit these rate limits. This can cause the change stream to fail or be terminated by AWS.
98
-
99
-
Behavior: Percona docStreamer is designed to be resilient and will attempt to retry and resume the stream. However, persistent rate-limiting from the DocumentDB side may require intervention (e.g., scaling your DocumentDB instance or running the migration during off-peak hours).
100
-
101
-
### DDL Operation Support During CDC Stage
102
-
103
-
Percona docStreamer has support for replicating most DDL operations during the CDC stage (after the full sync has completed).
104
-
105
-
Supported: drop (collection), dropDatabase, rename (collection), create (collection).
106
-
107
-
**NOT Supported**: createIndexes, dropIndexes. These operations will be skipped. You must manually recreate or drop indexes on the target cluster.
108
-
109
-
### Supported Index Types
110
-
111
-
Percona docStreamer automatically handles the creation of indexes during the Full Sync stage to ensure your destination performance matches the source. However, there are specific limitations regarding index types.
112
-
113
-
Currently Supported:
114
-
115
-
Most standard MongoDB index types (e.g., Single Field, Compound, Multikey, Geospatial).
116
-
117
-
Not Currently Supported:
118
-
119
-
The following index types are not migrated during the Full Sync and must be created manually on the destination if required:
120
-
121
-
- Text Indexes
122
-
- Partial Indexes
123
-
124
-
***Note:*** We recommend reviewing your source indexes prior to migration. If your application relies heavily on text search or partial indexing, plan to run a post-migration script to reconstruct these specific indexes on the destination cluster.
125
-
126
-
## Installing Percona docStreamer
64
+
## 2. Installing Percona docStreamer
127
65
128
66
We recommend you have a dedicated server to run Percona docStreamer.
Because docStreamer does not migrate user accounts or roles, you must manually create any users and roles required by your application. [Follow the appropriate procedure](https://www.mongodb.com/docs/manual/tutorial/manage-users-and-roles/) based on whether you are migrating to a sharded cluster or a replica set. Failure to create these users and roles in the destination cluster will prevent your application from connecting after the cutover process.
228
166
229
167
230
-
## Configuring Percona docStreamer
168
+
## 4. Configuring Percona docStreamer
231
169
232
-
The application is configured via the [config.yaml](./config.yaml) file in the application's root directory. You will need to at the very least edit the source and destination parameters.
170
+
The application is configured via the [config.yaml](./config.yaml) file in the application's root directory. Make sure you download the config file; at a minimum, you will need to edit the source and destination parameters.
233
171
234
172
```yaml
235
173
# Source DocumentDB
@@ -357,7 +295,7 @@ dry_run: False
357
295
358
296
You can modify any configuration through the [config.yaml](./config.yaml) file, including log locations and performance-related parameters. All options are clearly documented, and you are free to adjust them as needed.
359
297
360
-
## How to Use Percona docStreamer
298
+
## 5. How to Use Percona docStreamer
361
299
362
300
Percona docStreamer runs as a background process that is controlled through a small set of simple commands, making its operation straightforward. After updating the configuration file to match your environment, you can execute the appropriate commands for each specific use case as shown below.
@@ -891,10 +829,62 @@ The data validation engine is highly configurable to balance performance impact
891
829
| retry_interval_ms | 500 | Hot Key Handling. If a record fails validation because it is actively being modified (detected via dirty tracking), the validator waits this long before re-checking it. |
892
830
| max_retries | 3 | Persistence. How many times to retry a "Hot Key" before giving up. After this many attempts, the record is marked as a mismatch/skipped to move on. |
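
The interaction between `retry_interval_ms` and `max_retries` can be sketched as follows. This is a minimal illustration, not docStreamer's actual implementation: `validateWithRetry`, `sleepSync`, and the `check` callback are hypothetical names, and the sketch assumes that `max_retries` counts re-checks performed after the initial failed comparison.

```javascript
// Hypothetical helper: block the current thread for `ms` milliseconds.
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// check() is a hypothetical callback that returns true when the source and
// target copies of a record match.
// Assumption: maxRetries counts re-checks after the first failed comparison.
function validateWithRetry(check, options = {}) {
  const { maxRetries = 3, retryIntervalMs = 500, wait = sleepSync } = options;

  if (check()) return { status: "match", retries: 0 };

  for (let retry = 1; retry <= maxRetries; retry++) {
    wait(retryIntervalMs); // give in-flight writes time to settle
    if (check()) return { status: "match", retries: retry };
  }

  // Retries exhausted: report the record as a mismatch and move on.
  return { status: "mismatch", retries: maxRetries };
}
```

With the defaults from the table, a record that never settles would be reported as a mismatch after roughly 1.5 seconds of waiting (3 retries at 500 ms each).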
893
831
894
-
## Additional Documentation
832
+
833
+
## 7. Additional Documentation
895
834
896
835
We have created a page dedicated to a more [in-depth explanation of how Percona docStreamer works](./details.md), as well as a [frequently asked questions](./faq.md) page.
897
836
898
837
838
+
## 8. Important Notes & Limitations
839
+
840
+
### Best Practices & Safety Precautions
841
+
842
+
To ensure data integrity and prevent accidental data loss during migration, we recommend following these guidelines before initiating a docStreamer process.
843
+
844
+
- Back Up Your Destination: ***(if the destination environment contains data)*** Because Percona docStreamer will overwrite documents with matching `_id` values, always create a backup of your destination database before running a Full Sync. This ensures you can roll back if valid data is accidentally overwritten.
845
+
846
+
- Audit Existing Collections: ***(if the destination environment contains data)*** Check your destination database to see whether collections with the same names as your source already exist. If they do, verify whether the data is intended to be merged. If not, consider renaming the source or destination collection.
847
+
848
+
- Verify Connection Strings: Double-check your SOURCE and DESTINATION URIs. A common mistake is pointing the destination to a production environment instead of a staging environment, which can lead to unintended data commingling.
849
+
850
+
- Test with a Staging Run: If possible, perform a dry run or a migration into a temporary empty cluster/database first. This allows you to verify that the data maps correctly and the volume is as expected without risking your main data store.
851
+
852
+
**CRITICAL WARNING** regarding Data Overwrites: Please be aware that if a document in your Source has the same _id as a document in your Destination, the Destination document will be overwritten immediately. This action is irreversible once the sync is performed.
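
One way to audit for colliding `_id` values before a Full Sync is sketched below. This is an illustration only: `findOverlappingIds` is a hypothetical helper, not part of docStreamer, and it assumes the `_id` values from both sides have already been fetched into arrays. Values are compared by their JSON serialization, so types that serialize identically (for example, an ObjectId and a string holding the same hex value) would be reported as a collision.

```javascript
// Return the source _id values that also exist in the destination.
// Hypothetical helper; values are compared by their JSON serialization.
function findOverlappingIds(sourceIds, destinationIds) {
  const key = (id) => JSON.stringify(id);
  const seen = new Set(destinationIds.map(key));
  return sourceIds.filter((id) => seen.has(key(id)));
}

// Made-up sample data for illustration.
const sourceIds = [1, 2, "order-7"];
const destinationIds = ["order-7", 2, 99];

console.log(findOverlappingIds(sourceIds, destinationIds)); // [ 2, 'order-7' ]
```

Any `_id` returned by such a check identifies a destination document that would be overwritten by the sync.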
853
+
854
+
### Sharding Support Not Tested
855
+
856
+
Migration from DocumentDB sharded clusters has not been tested, so its behavior is unknown. Support for sharded DocumentDB clusters will be added in the future.
857
+
858
+
### DocumentDB Cursor Rate Limiting
859
+
860
+
Amazon DocumentDB enforces service quotas, including limits on the number of cursors and the rate of getMore operations, which are fundamental to how change streams work.
861
+
862
+
Symptom: If the migration falls too far behind (e.g., after being stopped for a long time) or if there is a massive burst of write activity, docStreamer may hit these rate limits. This can cause the change stream to fail or be terminated by AWS.
863
+
864
+
Behavior: Percona docStreamer is designed to be resilient and will attempt to retry and resume the stream. However, persistent rate-limiting from the DocumentDB side may require intervention (e.g., scaling your DocumentDB instance or running the migration during off-peak hours).
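
A common way to retry against a rate-limited service is capped exponential backoff. The sketch below only illustrates that general technique; it does not describe how docStreamer schedules its own retries, and `backoffDelays`, `baseMs`, and `capMs` are hypothetical names and values.

```javascript
// Compute the wait time (in ms) before each retry attempt: the delay doubles
// after every failure and is capped at capMs.
// Hypothetical helper, not docStreamer's actual retry policy.
function backoffDelays(attempts, baseMs = 500, capMs = 30000) {
  const delays = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(capMs, baseMs * 2 ** i));
  }
  return delays;
}

console.log(backoffDelays(5, 500, 4000)); // [ 500, 1000, 2000, 4000, 4000 ]
```

Spacing retries out this way reduces the number of getMore calls issued while the source is already throttling.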
865
+
866
+
### DDL Operation Support During CDC Stage
867
+
868
+
Percona docStreamer has support for replicating most DDL operations during the CDC stage (after the full sync has completed).
869
+
870
+
Supported: drop (collection), dropDatabase, rename (collection), create (collection).
871
+
872
+
**NOT Supported**: createIndexes, dropIndexes. These operations will be skipped. You must manually recreate or drop indexes on the target cluster.
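
The split between replicated and skipped DDL operations can be expressed as a small lookup. This is a sketch for reasoning about the lists above, not docStreamer's code: `classifyDdl` is a hypothetical helper, and the operation names are taken directly from the supported and unsupported lists.

```javascript
// Operation names taken from the lists above.
const REPLICATED_DDL = new Set(["drop", "dropDatabase", "rename", "create"]);
const SKIPPED_DDL = new Set(["createIndexes", "dropIndexes"]);

// Hypothetical helper: report how an operation is treated during CDC.
function classifyDdl(operationType) {
  if (REPLICATED_DDL.has(operationType)) return "replicated";
  if (SKIPPED_DDL.has(operationType)) return "skipped"; // apply manually on the target
  return "other"; // not a DDL operation covered by the lists above
}

console.log(classifyDdl("rename"));        // replicated
console.log(classifyDdl("createIndexes")); // skipped
```

Any operation classified as "skipped" needs a matching manual index change on the target cluster.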
899
873
874
+
### Supported Index Types
875
+
876
+
Percona docStreamer automatically handles the creation of indexes during the Full Sync stage to ensure your destination performance matches the source. However, there are specific limitations regarding index types.
877
+
878
+
Currently Supported:
879
+
880
+
Most standard MongoDB index types (e.g., Single Field, Compound, Multikey, Geospatial).
881
+
882
+
Not Currently Supported:
883
+
884
+
The following index types are not migrated during the Full Sync and must be created manually on the destination if required:
885
+
886
+
- Text Indexes
887
+
- Partial Indexes
888
+
889
+
***Note:*** We recommend reviewing your source indexes prior to migration. If your application relies heavily on text search or partial indexing, plan to run a post-migration script to reconstruct these specific indexes on the destination cluster.
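
A pre-migration index review could look like the sketch below. It assumes index definitions shaped like the output of `getIndexes()` in MongoDB, where a text index has a key value of `"text"` and a partial index carries a `partialFilterExpression`. The helper name and the sample data are hypothetical, and the exact shape reported by your DocumentDB version should be verified.

```javascript
// Return the names of indexes that would need to be recreated manually.
// Assumes getIndexes()-style definitions; hypothetical helper.
function indexesNeedingManualCreation(indexes) {
  return indexes
    .filter((idx) => {
      const isText = Object.values(idx.key).includes("text");
      const isPartial = idx.partialFilterExpression !== undefined;
      return isText || isPartial;
    })
    .map((idx) => idx.name);
}

// Made-up sample index definitions for illustration.
const sampleIndexes = [
  { name: "_id_", key: { _id: 1 } },
  { name: "title_text", key: { _fts: "text", _ftsx: 1 } },
  { name: "active_email", key: { email: 1 }, partialFilterExpression: { active: true } },
  { name: "loc_2dsphere", key: { loc: "2dsphere" } },
];

console.log(indexesNeedingManualCreation(sampleIndexes)); // [ 'title_text', 'active_email' ]
```

The returned names can serve as a checklist for the post-migration script that rebuilds these indexes on the destination.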