Commit 0a86628

Re-organized readme
1 parent 708df1a commit 0a86628

1 file changed

Lines changed: 60 additions & 70 deletions

readme.md

@@ -14,7 +14,7 @@ Percona docStreamer automates the complete, end-to-end migration from an Amazon
1414
* Full Sync: A parallelized, high-speed copy of all existing data from source collections.
1515
* Continuous Sync (CDC): Opens a change stream on the source DocumentDB to capture all inserts, updates, deletes and DDLs (with a few exceptions), applying them in batches to the target MongoDB for real-time synchronization.
1616

17-
## Prerequisites
17+
## 1. Prerequisites
1818

1919
### DocumentDB Pre-Setup
2020

@@ -61,69 +61,7 @@ db.adminCommand({
6161
});
6262
```
6363

64-
The command below would only enable them for the 3 collections shown:
65-
66-
```bash
67-
use admin;
68-
db.adminCommand({modifyChangeStreams: 1, database: "percona_db_1", collection: "test_1", enable: true});
69-
db.adminCommand({modifyChangeStreams: 1, database: "percona_db_1", collection: "test_2", enable: true});
70-
db.adminCommand({modifyChangeStreams: 1, database: "percona_db_1", collection: "test_3", enable: true});
71-
```
72-
73-
## Important Notes & Limitations
74-
75-
### Best Practices & Safety Precautions
76-
77-
To ensure data integrity and prevent accidental data loss during migration, we recommend following these guidelines before initiating a docStreamer process.
78-
79-
- Back Up Your Destination: ***(if the destination environment contains data)*** Because Percona docStreamer overwrites documents with matching `_id` values, always create a backup of your destination database before running a Full Sync. This ensures you can roll back if valid data is accidentally overwritten.
80-
81-
- Audit Existing Collections: ***(if the destination environment contains data)*** Check your destination database to see if collections with the same names as your source already exist. If they do, verify if the data is intended to be merged. If not, consider renaming the source or destination collection.
82-
83-
- Verify Connection Strings: Double-check your SOURCE and DESTINATION URIs. A common mistake is pointing the destination to a production environment instead of a staging environment, which can lead to unintended data commingling.
84-
85-
- Test with a Staging Run: If possible, perform a dry run or a migration into a temporary empty cluster/database first. This allows you to verify that the data maps correctly and the volume is as expected without risking your main data store.
86-
87-
**CRITICAL WARNING** regarding Data Overwrites: Please be aware that if a document in your Source has the same _id as a document in your Destination, the Destination document will be overwritten immediately. This action is irreversible once the sync is performed.
88-
89-
### Sharding support not tested
90-
91-
Migration from DocumentDB sharded clusters has not been tested and therefore the behavior is unknown. Support for sharded DocumentDB clusters will be added in the future.
92-
93-
### DocumentDB Cursor Rate Limiting
94-
95-
AWS DocumentDB enforces service quotas, including limits on the number of cursors and the rate of getMore operations, which are fundamental to how change streams work.
96-
97-
Symptom: If the migration falls too far behind (e.g., after being stopped for a long time) or if there is a massive burst of write activity, docStreamer may hit these rate limits. This can cause the change stream to fail or be terminated by AWS.
98-
99-
Behavior: Percona docStreamer is designed to be resilient and will attempt to retry and resume the stream. However, persistent rate-limiting from the DocumentDB side may require intervention (e.g., scaling your DocumentDB instance or running the migration during off-peak hours).
100-
101-
### DDL Operation Support During CDC Stage
102-
103-
Percona docStreamer has support for replicating most DDL operations during the CDC stage (after the full sync has completed).
104-
105-
Supported: drop (collection), dropDatabase, rename (collection), create (collection).
106-
107-
**NOT Supported**: createIndexes, dropIndexes. These operations will be skipped. You must manually recreate or drop indexes on the target cluster.
108-
109-
### Supported Index Types
110-
111-
Percona docStreamer automatically handles the creation of indexes during the Full Sync stage to ensure your destination performance matches the source. However, there are specific limitations regarding index types.
112-
113-
Currently Supported:
114-
115-
Most standard MongoDB index types (e.g., Single Field, Compound, Multikey, Geospatial).
116-
117-
Not Currently Supported:
118-
119-
The following index types are not migrated during the Full Sync and must be created manually on the destination if required:
120-
121-
- Text Indexes
122-
- Partial Indexes
123-
124-
***Note:*** We recommend reviewing your source indexes prior to migration. If your application relies heavily on text search or partial indexing, plan to run a post-migration script to reconstruct these specific indexes on the destination cluster.
125-
126-
## Installing Percona docStreamer
64+
## 2. Installing Percona docStreamer
12765

12866
We recommend you have a dedicated server to run Percona docStreamer.
12967

@@ -196,7 +134,7 @@ go mod tidy
196134
make build-local
197135
```
198136

199-
## Configure Users
137+
## 3. Configure Users
200138

201139
### Migration Users
202140

@@ -227,9 +165,9 @@ db.getSiblingDB('admin').createUser({
227165
Because docStreamer does not migrate user accounts or roles, you must manually create any users and roles required by your application. [Follow the appropriate procedure](https://www.mongodb.com/docs/manual/tutorial/manage-users-and-roles/) based on whether you are migrating to a sharded cluster or a replica set. Failure to create these users and roles in the destination cluster will prevent your application from connecting after the cutover process.
228166

229167

230-
## Configuring Percona docStreamer
168+
## 4. Configuring Percona docStreamer
231169

232-
The application is configured via the [config.yaml](./config.yaml) file in the application's root directory. You will need to at the very least edit the source and destination parameters.
170+
The application is configured via the [config.yaml](./config.yaml) file in the application's root directory. Make sure you download the config file, as you will need to edit at least the source and destination parameters.
233171

234172
```yaml
235173
# Source DocumentDB
@@ -357,7 +295,7 @@ dry_run: False
357295

358296
You can modify any configuration through the [config.yaml](./config.yaml) file, including log locations and performance-related parameters. All options are clearly documented, and you are free to adjust them as needed.
359297

360-
## How to Use Percona docStreamer
298+
## 5. How to Use Percona docStreamer
361299

362300
Percona docStreamer runs as a background process that is controlled through a small set of simple commands, making its operation straightforward. After updating the configuration file to match your environment, you can execute the appropriate commands for each specific use case as shown below.
363301

@@ -820,7 +758,7 @@ _ __ /_ __ \ ___/____ \_ __/_ ___/ _ \ __ `/_ __ `__ \ _ \_ ___/
820758
```
821759
</details>
822760

823-
## Performance & Optimization
761+
## 6. Performance & Optimization
824762

825763
### Full Load Optimization
826764

@@ -891,10 +829,62 @@ The data validation engine is highly configurable to balance performance impact
891829
| retry_interval_ms | 500 | Hot Key Handling. If a record fails validation because it is actively being modified (detected via dirty tracking), the validator waits this long before re-checking it. |
892830
| max_retries | 3 | Persistence. How many times to retry a "Hot Key" before giving up. After this many attempts, the record is marked as a mismatch/skipped to move on. |
893831

894-
## Additional Documentation
832+
833+
## 7. Additional Documentation
895834

896835
We have created a page with a more [in-depth explanation of how Percona docStreamer works](./details.md), as well as a [frequently asked questions](./faq.md) page.
897836

898837

838+
## 8. Important Notes & Limitations
839+
840+
### Best Practices & Safety Precautions
841+
842+
To ensure data integrity and prevent accidental data loss during migration, we recommend following these guidelines before initiating a docStreamer process.
843+
844+
- Back Up Your Destination: ***(if the destination environment contains data)*** Because Percona docStreamer overwrites documents with matching `_id` values, always create a backup of your destination database before running a Full Sync. This ensures you can roll back if valid data is accidentally overwritten.
845+
846+
- Audit Existing Collections: ***(if the destination environment contains data)*** Check your destination database to see if collections with the same names as your source already exist. If they do, verify if the data is intended to be merged. If not, consider renaming the source or destination collection.
847+
848+
- Verify Connection Strings: Double-check your SOURCE and DESTINATION URIs. A common mistake is pointing the destination to a production environment instead of a staging environment, which can lead to unintended data commingling.
849+
850+
- Test with a Staging Run: If possible, perform a dry run or a migration into a temporary empty cluster/database first. This allows you to verify that the data maps correctly and the volume is as expected without risking your main data store.
851+
852+
**CRITICAL WARNING** regarding Data Overwrites: Please be aware that if a document in your Source has the same _id as a document in your Destination, the Destination document will be overwritten immediately. This action is irreversible once the sync is performed.
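
One way to see the overwrite risk before a Full Sync is to compare `_id` values on both sides. The sketch below is a hypothetical pre-flight helper, not part of docStreamer. It assumes the `_id` lists are small enough to hold in memory (for example, a sample of a collection) and that two ids are equal when their string forms match.

```javascript
// Hypothetical helper (not part of docStreamer): report which _id values
// exist in both the source and the destination collection.
function findOverlappingIds(sourceIds, destinationIds) {
  // Compare by string form so ObjectId-like values and plain strings both work.
  // Assumption: ids of different types never share the same string form.
  const destinationKeys = new Set(destinationIds.map((id) => String(id)));
  return sourceIds.filter((id) => destinationKeys.has(String(id)));
}

// Made-up ids for illustration. In mongosh the arrays could be built with
// something like: db.getCollection("name").find({}, { _id: 1 }).toArray()
const overlappingIds = findOverlappingIds(["a1", "b2", "c3"], ["c3", "d4", "a1"]);
console.log(overlappingIds); // ids whose destination documents would be overwritten
```

An empty result means no destination document in the compared set shares an `_id` with the source. Any id returned is a document worth backing up or reviewing first.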
853+
854+
### Sharding support not tested
855+
856+
Migration from DocumentDB sharded clusters has not been tested and therefore the behavior is unknown. Support for sharded DocumentDB clusters will be added in the future.
857+
858+
### DocumentDB Cursor Rate Limiting
859+
860+
AWS DocumentDB enforces service quotas, including limits on the number of cursors and the rate of getMore operations, which are fundamental to how change streams work.
861+
862+
Symptom: If the migration falls too far behind (e.g., after being stopped for a long time) or if there is a massive burst of write activity, docStreamer may hit these rate limits. This can cause the change stream to fail or be terminated by AWS.
863+
864+
Behavior: Percona docStreamer is designed to be resilient and will attempt to retry and resume the stream. However, persistent rate-limiting from the DocumentDB side may require intervention (e.g., scaling your DocumentDB instance or running the migration during off-peak hours).
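
How docStreamer spaces its own retries is not described here. For operators who script restarts around the tool, a capped exponential backoff is a common way to avoid hitting a rate-limited service repeatedly. The function name and the values below are illustrative assumptions, not docStreamer settings.

```javascript
// Hypothetical retry schedule: the numbers are illustrative only and are
// not docStreamer defaults.
function backoffDelaysMs(baseMs, maxMs, attempts) {
  const delays = [];
  for (let attempt = 0; attempt < attempts; attempt++) {
    // Double the wait on each attempt, but never exceed maxMs.
    delays.push(Math.min(baseMs * 2 ** attempt, maxMs));
  }
  return delays;
}

// Six attempts starting at 500 ms and capped at 8 seconds.
console.log(backoffDelaysMs(500, 8000, 6));
```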
865+
866+
### DDL Operation Support During CDC Stage
867+
868+
Percona docStreamer has support for replicating most DDL operations during the CDC stage (after the full sync has completed).
869+
870+
Supported: drop (collection), dropDatabase, rename (collection), create (collection).
871+
872+
**NOT Supported**: createIndexes, dropIndexes. These operations will be skipped. You must manually recreate or drop indexes on the target cluster.
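
Because index changes are skipped during CDC, it can help to compare index lists on both sides before cutover. The sketch below is a hypothetical helper. It assumes the inputs have the shape returned by `db.collection.getIndexes()` (objects with a `name` field) and that an index keeps the same name on source and target.

```javascript
// Hypothetical helper: compare index definitions from source and target by name.
function diffIndexes(sourceIndexes, targetIndexes) {
  const sourceNames = new Set(sourceIndexes.map((ix) => ix.name));
  const targetNames = new Set(targetIndexes.map((ix) => ix.name));
  return {
    // Present on the source but missing on the target: create manually.
    toCreate: sourceIndexes.filter((ix) => !targetNames.has(ix.name)),
    // Present on the target but gone from the source: drop manually.
    toDrop: targetIndexes.filter((ix) => !sourceNames.has(ix.name)),
  };
}

// Made-up index lists for illustration.
const indexDiff = diffIndexes(
  [{ name: "_id_", key: { _id: 1 } }, { name: "email_1", key: { email: 1 } }],
  [{ name: "_id_", key: { _id: 1 } }, { name: "legacy_1", key: { legacy: 1 } }]
);
console.log(indexDiff.toCreate.map((ix) => ix.name), indexDiff.toDrop.map((ix) => ix.name));
```

Matching by name is a simplification. If indexes were created with custom names on one side, comparing the `key` documents would be more reliable.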
899873

874+
### Supported Index Types
875+
876+
Percona docStreamer automatically handles the creation of indexes during the Full Sync stage to ensure your destination performance matches the source. However, there are specific limitations regarding index types.
877+
878+
Currently Supported:
879+
880+
Most standard MongoDB index types (e.g., Single Field, Compound, Multikey, Geospatial).
881+
882+
Not Currently Supported:
883+
884+
The following index types are not migrated during the Full Sync and must be created manually on the destination if required:
885+
886+
- Text Indexes
887+
- Partial Indexes
888+
889+
***Note:*** We recommend reviewing your source indexes prior to migration. If your application relies heavily on text search or partial indexing, plan to run a post-migration script to reconstruct these specific indexes on the destination cluster.
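
The index review suggested above can be sketched as a small filter over index definitions. This is a hypothetical helper, not docStreamer code. It assumes the source reports indexes in the shape MongoDB's `getIndexes()` uses, where a text index has a key value of `"text"` and a partial index carries a `partialFilterExpression` field.

```javascript
// Hypothetical helper: flag index definitions that the Full Sync does not
// migrate (text and partial indexes) so they can be recreated manually.
function needsManualRecreation(index) {
  const isText = Object.values(index.key).includes("text");
  const isPartial = index.partialFilterExpression !== undefined;
  return isText || isPartial;
}

// Made-up definitions for illustration.
const sampleIndexes = [
  { name: "_id_", key: { _id: 1 } },
  { name: "title_text", key: { _fts: "text", _ftsx: 1 }, weights: { title: 1 } },
  { name: "status_1", key: { status: 1 }, partialFilterExpression: { status: "active" } },
  { name: "loc_2dsphere", key: { loc: "2dsphere" } },
];
const manualIndexNames = sampleIndexes.filter(needsManualRecreation).map((ix) => ix.name);
console.log(manualIndexNames);
```

The names returned are the indexes to include in a post-migration script on the destination cluster.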
900890
