| title | Data Replication Features |
|---|---|
| summary | Learn about the data replication features provided by the Data Migration tool. |
| category | reference |
This document describes the data replication features provided by the Data Migration tool and explains the configuration of corresponding parameters.
For different DM versions, pay attention to the different match rules of schema or table names in the table routing, black & white lists, and binlog event filter features:
- For DM v1.0.5 or later versions, all the above features support the wildcard match. For all versions of DM, note that there can be only one `*` in the wildcard expression, and `*` must be placed at the end.
- For DM versions earlier than v1.0.5, table routing and binlog event filter support the wildcard but do not support the `[...]` and `[!...]` expressions. The black & white lists only support regular expressions.
It is recommended that you use the wildcard for matching in simple scenarios.
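As an illustration of these match rules (a sketch, not DM's implementation; the helper names are hypothetical), a valid wildcard pattern contains at most one `*`, placed at the end, and Python's `fnmatch` can stand in for the wildcard matcher:

```python
import fnmatch

def is_valid_dm_wildcard(pattern: str) -> bool:
    """There can be only one '*', and it must be placed at the end."""
    return pattern.count("*") <= 1 and ("*" not in pattern or pattern.endswith("*"))

def wildcard_match(pattern: str, name: str) -> bool:
    """Match a schema/table name against a wildcard pattern ('*', '?', '[...]', '[!...]')."""
    return fnmatch.fnmatchcase(name, pattern)

print(is_valid_dm_wildcard("test_*"))        # True: one '*', at the end
print(is_valid_dm_wildcard("*_test"))        # False: '*' is not at the end
print(wildcard_match("test_*", "test_db_1")) # True
print(wildcard_match("t[1-5]", "t3"))        # True
```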
The table routing feature enables DM to replicate a certain table of the upstream MySQL or MariaDB instance to the specified table in the downstream.
Note:
- Configuring multiple different routing rules for a single table is not supported.
- The match rule of the schema needs to be configured separately, which is used to replicate `create/drop schema xx`, as shown in `rule-2` of the parameter configuration.
```yaml
routes:
  rule-1:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    target-schema: "test"
    target-table: "t"
  rule-2:
    schema-pattern: "test_*"
    target-schema: "test"
```

DM replicates the upstream MySQL or MariaDB instance table that matches the `schema-pattern`/`table-pattern` rule provided by Table selector to the downstream `target-schema`/`target-table`.
This section shows usage examples in different scenarios.
Assuming in the scenario of sharded schemas and tables, you want to replicate the `test_{1,2,3...}`.`t_{1,2,3...}` tables in two upstream MySQL instances to the `test`.`t` table in the downstream TiDB instance.

To replicate the upstream instances to the downstream `test`.`t`, you must create two routing rules:
- `rule-1` is used to replicate DML or DDL statements of the table that matches `schema-pattern: "test_*"` and `table-pattern: "t_*"` to the downstream `test`.`t`.
- `rule-2` is used to replicate DDL statements of the schema that matches `schema-pattern: "test_*"`, such as `create/drop schema xx`.
Note:
- If the downstream `schema: test` already exists and will not be deleted, you can omit `rule-2`.
- If the downstream `schema: test` does not exist and only `rule-1` is configured, the `schema test doesn't exist` error is reported during replication.
```yaml
rule-1:
  schema-pattern: "test_*"
  table-pattern: "t_*"
  target-schema: "test"
  target-table: "t"
rule-2:
  schema-pattern: "test_*"
  target-schema: "test"
```

Assuming in the scenario of sharded schemas, you want to replicate the `test_{1,2,3...}`.`t_{1,2,3...}` tables in the two upstream MySQL instances to the `test`.`t_{1,2,3...}` tables in the downstream TiDB instance.
To replicate the upstream schemas to the downstream `test`.`t_[1,2,3]`, you only need to create one routing rule.
```yaml
rule-1:
  schema-pattern: "test_*"
  target-schema: "test"
```

Assuming that the following two routing rules are configured and `test_1_bak`.`t_1_bak` matches both `rule-1` and `rule-2`, an error is reported because the table routing configuration violates the number limitation.
```yaml
rule-1:
  schema-pattern: "test_*"
  table-pattern: "t_*"
  target-schema: "test"
  target-table: "t"
rule-2:
  schema-pattern: "test_1_bak"
  table-pattern: "t_1_bak"
  target-schema: "test"
  target-table: "t_bak"
```

The black and white lists filtering rule of the upstream database instance tables is similar to MySQL `replication-rules-db`/`tables`, which can be used to filter or only replicate all operations of some databases or some tables.
```yaml
black-white-list:
  rule-1:
    do-dbs: ["test*"]         # Starting with characters other than "~" indicates that it is a wildcard;
                              # v1.0.5 or later versions support the regular expression rules.
    do-tables:
    - db-name: "test[123]"    # Matches test1, test2, and test3.
      tbl-name: "t[1-5]"      # Matches t1, t2, t3, t4, and t5.
    - db-name: "test"
      tbl-name: "t"
  rule-2:
    do-dbs: ["~^test.*"]      # Starting with "~" indicates that it is a regular expression.
    ignore-dbs: ["mysql"]
    do-tables:
    - db-name: "~^test.*"
      tbl-name: "~^t.*"
    - db-name: "test"
      tbl-name: "t"
    ignore-tables:
    - db-name: "test"
      tbl-name: "log"
```

- `do-dbs`: white lists of the schemas to be replicated, similar to `replicate-do-db` in MySQL
- `ignore-dbs`: black lists of the schemas to be replicated, similar to `replicate-ignore-db` in MySQL
- `do-tables`: white lists of the tables to be replicated, similar to `replicate-do-table` in MySQL
- `ignore-tables`: black lists of the tables to be replicated, similar to `replicate-ignore-table` in MySQL
If a value of the above parameters starts with the ~ character, the subsequent characters of this value are treated as a regular expression. You can use this parameter to match schema or table names.
The filtering rules corresponding to do-dbs and ignore-dbs are similar to the Evaluation of Database-Level Replication and Binary Logging Options in MySQL. The filtering rules corresponding to do-tables and ignore-tables are similar to the Evaluation of Table-Level Replication Options in MySQL.
Note:
In DM and in MySQL, the white and black lists filtering rules are different in the following ways:
- In MySQL, `replicate-wild-do-table` and `replicate-wild-ignore-table` support wildcard characters. In DM, some parameter values directly support regular expressions that start with the `~` character.
- DM currently only supports binlogs in the `ROW` format, and does not support those in the `STATEMENT` or `MIXED` format. Therefore, the filtering rules in DM correspond to those in the `ROW` format in MySQL.
- MySQL determines a DDL statement only by the database name explicitly specified in the `USE` section of the statement. DM determines a statement first based on the database name section in the DDL statement. If the DDL statement does not contain such a section, DM determines the statement by the `USE` section. Suppose that the SQL statement to be determined is `USE test_db_2; CREATE TABLE test_db_1.test_table (c1 INT PRIMARY KEY);`, that `replicate-do-db=test_db_1` is configured in MySQL, and that `do-dbs: ["test_db_1"]` is configured in DM. Then this rule only applies to DM and not to MySQL.
The filtering process is as follows:
1. Filter at the schema level:

    - If `do-dbs` is not empty, judge whether a matched schema exists in `do-dbs`.

        - If yes, continue to filter at the table level.
        - If not, filter `test`.`t`.

    - If `do-dbs` is empty and `ignore-dbs` is not empty, judge whether a matched schema exists in `ignore-dbs`.

        - If yes, filter `test`.`t`.
        - If not, continue to filter at the table level.

    - If both `do-dbs` and `ignore-dbs` are empty, continue to filter at the table level.

2. Filter at the table level:

    - If `do-tables` is not empty, judge whether a matched table exists in `do-tables`.

        - If yes, replicate `test`.`t`.
        - If not, filter `test`.`t`.

    - If `ignore-tables` is not empty, judge whether a matched table exists in `ignore-tables`.

        - If yes, filter `test`.`t`.
        - If not, replicate `test`.`t`.

    - If both `do-tables` and `ignore-tables` are empty, replicate `test`.`t`.

Note:

To judge whether the schema `test` is filtered, you only need to filter at the schema level.
Assume that the upstream MySQL instances include the following tables:
`logs`.`messages_2016`
`logs`.`messages_2017`
`logs`.`messages_2018`
`forum`.`users`
`forum`.`messages`
`forum_backup_2016`.`messages`
`forum_backup_2017`.`messages`
`forum_backup_2018`.`messages`
The configuration is as follows:
```yaml
black-white-list:
  bw-rule:
    do-dbs: ["forum_backup_2018", "forum"]
    ignore-dbs: ["~^forum_backup_"]
    do-tables:
    - db-name: "logs"
      tbl-name: "~_2018$"
    - db-name: "~^forum.*"
      tbl-name: "messages"
    ignore-tables:
    - db-name: "~.*"
      tbl-name: "~^messages.*"
```

After applying the `bw-rule` rule:
| Table | Whether to filter | Why filter |
|---|---|---|
| `logs`.`messages_2016` | Yes | The schema `logs` fails to match any `do-dbs`. |
| `logs`.`messages_2017` | Yes | The schema `logs` fails to match any `do-dbs`. |
| `logs`.`messages_2018` | Yes | The schema `logs` fails to match any `do-dbs`. |
| `forum_backup_2016`.`messages` | Yes | The schema `forum_backup_2016` fails to match any `do-dbs`. |
| `forum_backup_2017`.`messages` | Yes | The schema `forum_backup_2017` fails to match any `do-dbs`. |
| `forum`.`users` | Yes | 1. The schema `forum` matches `do-dbs` and continues to filter at the table level.<br/>2. The schema and table fail to match any of `do-tables` and `ignore-tables`, and `do-tables` is not empty. |
| `forum`.`messages` | No | 1. The schema `forum` matches `do-dbs` and continues to filter at the table level.<br/>2. The table `messages` matches `db-name: "~^forum.*", tbl-name: "messages"` of `do-tables`. |
| `forum_backup_2018`.`messages` | No | 1. The schema `forum_backup_2018` matches `do-dbs` and continues to filter at the table level.<br/>2. The schema and table match `db-name: "~^forum.*", tbl-name: "messages"` of `do-tables`. |
Binlog event filter is a more fine-grained filtering rule than the black and white lists filtering rule. You can use statements like `INSERT` or `TRUNCATE TABLE` to specify the binlog events of the schemas/tables that you need to replicate or filter out.
Note:
If the same table matches multiple rules, these rules are applied in order and the black list has priority over the white list. This means if both the `Ignore` and `Do` rules are applied to a single table, the `Ignore` rule takes effect.
```yaml
filters:
  rule-1:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    events: ["truncate table", "drop table"]
    sql-pattern: ["^DROP\\s+PROCEDURE", "^CREATE\\s+PROCEDURE"]
    action: Ignore
```

- `schema-pattern`/`table-pattern`: the binlog events or DDL SQL statements of upstream MySQL or MariaDB instance tables that match `schema-pattern`/`table-pattern` are filtered by the rules below.

- `events`: the binlog event array.

    | Event | Type | Description |
    |---|---|---|
    | `all` | | Includes all the events below |
    | `all dml` | | Includes all DML events below |
    | `all ddl` | | Includes all DDL events below |
    | `none` | | Includes none of the events below |
    | `none ddl` | | Includes none of the DDL events below |
    | `none dml` | | Includes none of the DML events below |
    | `insert` | DML | The `INSERT` DML event |
    | `update` | DML | The `UPDATE` DML event |
    | `delete` | DML | The `DELETE` DML event |
    | `create database` | DDL | The `CREATE DATABASE` DDL event |
    | `drop database` | DDL | The `DROP DATABASE` DDL event |
    | `create table` | DDL | The `CREATE TABLE` DDL event |
    | `create index` | DDL | The `CREATE INDEX` DDL event |
    | `drop table` | DDL | The `DROP TABLE` DDL event |
    | `truncate table` | DDL | The `TRUNCATE TABLE` DDL event |
    | `rename table` | DDL | The `RENAME TABLE` DDL event |
    | `drop index` | DDL | The `DROP INDEX` DDL event |
    | `alter table` | DDL | The `ALTER TABLE` DDL event |

- `sql-pattern`: used to filter specified DDL SQL statements. The matching rule supports using a regular expression, for example, `"^DROP\\s+PROCEDURE"`.

- `action`: the string (`Do`/`Ignore`). Based on the following rules, it judges whether to filter. If either of the two rules is satisfied, the binlog is filtered; otherwise, the binlog is not filtered.

    - `Do`: the white list. The binlog is filtered in either of the following two conditions:

        - The type of the event is not in the `events` list of the rule.
        - The SQL statement of the event cannot be matched by `sql-pattern` of the rule.

    - `Ignore`: the black list. The binlog is filtered in either of the following two conditions:

        - The type of the event is in the `events` list of the rule.
        - The SQL statement of the event can be matched by `sql-pattern` of the rule.
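The `Do`/`Ignore` judgment can be sketched as follows (an illustration under simplified assumptions, not DM's implementation: it matches event names literally without expanding classes like `all dml`, and treats an empty `events` or `sql-pattern` list as imposing no constraint):

```python
import re

def filtered(rule: dict, event_type: str, sql: str = "") -> bool:
    """Return True if the binlog event is filtered out by the rule."""
    events = rule.get("events", [])
    patterns = rule.get("sql-pattern", [])
    in_events = event_type in events
    sql_matched = any(re.search(p, sql) for p in patterns)
    if rule["action"] == "Do":
        # White list: filter if the event type is not listed,
        # or the SQL cannot be matched by any sql-pattern.
        return (bool(events) and not in_events) or (bool(patterns) and not sql_matched)
    # "Ignore" -- black list: filter if the event type is listed,
    # or the SQL is matched by a sql-pattern.
    return in_events or sql_matched

rule_1 = {
    "events": ["truncate table", "drop table"],
    "sql-pattern": [r"^DROP\s+PROCEDURE", r"^CREATE\s+PROCEDURE"],
    "action": "Ignore",
}

print(filtered(rule_1, "truncate table"))         # True: the event type is listed
print(filtered(rule_1, "insert"))                 # False
print(filtered(rule_1, "", "DROP PROCEDURE p1"))  # True: the SQL matches a sql-pattern
```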
This section shows usage examples in the scenario of sharding (sharded schemas and tables).
To filter out all deletion operations, configure the following two filtering rules:
- `filter-table-rule` filters out the `truncate table`, `drop table`, and `delete` statement operations of all tables that match the `test_*`.`t_*` pattern.
- `filter-schema-rule` filters out the `drop database` operation of all schemas that match the `test_*` pattern.
```yaml
filters:
  filter-table-rule:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    events: ["truncate table", "drop table", "delete"]
    action: Ignore
  filter-schema-rule:
    schema-pattern: "test_*"
    events: ["drop database"]
    action: Ignore
```

To only replicate sharding DML statements, configure the following two filtering rules:
- `do-table-rule` only replicates the `create table`, `insert`, `update`, and `delete` statements of all tables that match the `test_*`.`t_*` pattern.
- `do-schema-rule` only replicates the `create database` statement of all schemas that match the `test_*` pattern.
Note:
The reason why the `create database/table` statement is replicated is that you can replicate DML statements only after the schema and table are created.
```yaml
filters:
  do-table-rule:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    events: ["create table", "all dml"]
    action: Do
  do-schema-rule:
    schema-pattern: "test_*"
    events: ["create database"]
    action: Do
```

To filter out the `PROCEDURE` statements that TiDB does not support, configure the following `filter-procedure-rule`:
```yaml
filters:
  filter-procedure-rule:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    sql-pattern: ["^DROP\\s+PROCEDURE", "^CREATE\\s+PROCEDURE"]
    action: Ignore
```

`filter-procedure-rule` filters out the `^CREATE\\s+PROCEDURE` and `^DROP\\s+PROCEDURE` statements of all tables that match the `test_*`.`t_*` pattern.
For the SQL statements that the TiDB parser does not support, DM cannot parse them to get the schema/table information, so you must use the global filtering rule `schema-pattern: "*"`.
Note:
To avoid unexpectedly filtering out data that need to be replicated, you must configure the global filtering rule as strictly as possible.
To filter out the PARTITION statements that the TiDB parser does not support, configure the following filtering rule:
```yaml
filters:
  filter-partition-rule:
    schema-pattern: "*"
    sql-pattern: ["ALTER\\s+TABLE[\\s\\S]*ADD\\s+PARTITION", "ALTER\\s+TABLE[\\s\\S]*DROP\\s+PARTITION"]
    action: Ignore
```

Note:
The column mapping feature is not recommended as the primary solution due to its usage restrictions. The preferable solution is handling the conflicts of auto-increment primary keys.
The column mapping feature supports modifying the value of table columns. You can execute different modification operations on the specified column according to different expressions. Currently, only the built-in expressions provided by DM are supported.
Note:
- It does not support modifying the column type or the table schema.
- It does not support configuring multiple different column mapping rules for the same table.
```yaml
column-mappings:
  rule-1:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    expression: "partition id"
    source-column: "id"
    target-column: "id"
    arguments: ["1", "test", "t", "_"]
  rule-2:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    expression: "partition id"
    source-column: "id"
    target-column: "id"
    arguments: ["2", "test", "t", "_"]
```

- `schema-pattern`/`table-pattern`: executes column value modifying operations on the upstream MySQL or MariaDB instance tables that match the `schema-pattern`/`table-pattern` filtering rule.
- `source-column`, `target-column`: modifies the value of the `source-column` column according to the specified `expression` and assigns the new value to `target-column`.
- `expression`: the expression used to modify data. Currently, only the `partition id` built-in expression is supported.
`partition id` is used to resolve the conflicts of auto-increment primary keys of sharded tables.
partition id restrictions
Note the following restrictions:
- The `partition id` expression only supports the bigint type of auto-increment primary key.
- If the `schema prefix` is not empty, the schema name format must be `schema prefix` or `schema prefix + separator + number` (the number is the schema ID). For example, it supports `s` and `s_1`, but does not support `s_a`.
- If the `table prefix` is not empty, the table name format must be `table prefix` or `table prefix + separator + number` (the number is the table ID).
- If the schema/table name does not contain the `… + separator + number` part, the corresponding ID is considered as 0.
- Restrictions on sharding size:
    - It supports 16 MySQL or MariaDB instances at most (0 <= instance ID <= 15).
    - Each instance supports 128 schemas at most (0 <= schema ID <= 127).
    - Each schema of each instance supports 256 tables at most (0 <= table ID <= 255).
    - The ID range of the mapped column must satisfy 0 <= ID <= 17592186044415.
    - The `{instance ID, schema ID, table ID}` group must be unique.
- Currently, the `partition id` expression is a customized feature. If you want to modify this feature, contact the corresponding developers.
partition id arguments configuration
Configure the following three or four arguments in order:
- `instance_id`: the ID of the upstream sharded MySQL or MariaDB instance (0 <= instance ID <= 15)
- `schema prefix`: used to parse the schema name and get the `schema ID`
- `table prefix`: used to parse the table name and get the `table ID`
- the separator: used to separate the prefix and the ID, and can be omitted to use an empty string as the separator

Any of `instance_id`, `schema prefix`, and `table prefix` can be set to an empty string (`""`) to indicate that the corresponding part is not encoded into the partition ID.
partition id expression rules
`partition id` fills the beginning bits of the auto-increment primary key ID with the argument numbers, and computes an int64 (MySQL `bigint`) type of value. The specific rules are as follows:
| instance_id | schema prefix | table prefix | Encoding |
|---|---|---|---|
| ☑ defined | ☑ defined | ☑ defined | [S: 1 bit] [I: 4 bits] [D: 7 bits] [T: 8 bits] [P: 44 bits] |
| ☐ empty | ☑ defined | ☑ defined | [S: 1 bit] [D: 7 bits] [T: 8 bits] [P: 48 bits] |
| ☑ defined | ☐ empty | ☑ defined | [S: 1 bit] [I: 4 bits] [T: 8 bits] [P: 51 bits] |
| ☑ defined | ☑ defined | ☐ empty | [S: 1 bit] [I: 4 bits] [D: 7 bits] [P: 52 bits] |
| ☐ empty | ☐ empty | ☑ defined | [S: 1 bit] [T: 8 bits] [P: 55 bits] |
| ☐ empty | ☑ defined | ☐ empty | [S: 1 bit] [D: 7 bits] [P: 56 bits] |
| ☑ defined | ☐ empty | ☐ empty | [S: 1 bit] [I: 4 bits] [P: 59 bits] |
- `S`: the sign bit, reserved
- `I`: the instance ID, 4 bits by default if set
- `D`: the schema ID, 7 bits by default if set
- `T`: the table ID, 8 bits by default if set
- `P`: the auto-increment primary key ID, occupying the rest of the bits (≥44 bits)
Assuming in the sharding scenario where all tables have an auto-increment primary key, you want to replicate the `test_{1,2,3...}`.`t_{1,2,3...}` tables of two upstream MySQL instances to the `test`.`t` table in the downstream TiDB instance.
Configure the following two rules:
```yaml
column-mappings:
  rule-1:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    expression: "partition id"
    source-column: "id"
    target-column: "id"
    arguments: ["1", "test", "t", "_"]
  rule-2:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    expression: "partition id"
    source-column: "id"
    target-column: "id"
    arguments: ["2", "test", "t", "_"]
```

- The column ID of the MySQL instance 1 table `test_1`.`t_1` is converted from `1` to `1 << (64-1-4) | 1 << (64-1-4-7) | 1 << 44 | 1 = 580981944116838401`.
- The row ID of the MySQL instance 2 table `test_1`.`t_2` is converted from `2` to `2 << (64-1-4) | 1 << (64-1-4-7) | 2 << 44 | 2 = 1157460288606306306`.
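The two conversions above can be checked numerically. The following sketch encodes the fully-defined layout `[S: 1 bit] [I: 4 bits] [D: 7 bits] [T: 8 bits] [P: 44 bits]` (the function name is for illustration only):

```python
def partition_id(instance_id: int, schema_id: int, table_id: int, origin_id: int) -> int:
    """Encode IDs into [S: 1 bit][I: 4 bits][D: 7 bits][T: 8 bits][P: 44 bits]."""
    assert 0 <= instance_id <= 15            # 4 bits
    assert 0 <= schema_id <= 127             # 7 bits
    assert 0 <= table_id <= 255              # 8 bits
    assert 0 <= origin_id <= 17592186044415  # 44 bits: 2**44 - 1
    return (instance_id << 59) | (schema_id << 52) | (table_id << 44) | origin_id

# Instance 1, schema test_1 (schema ID 1), table t_1 (table ID 1), original ID 1:
print(partition_id(1, 1, 1, 1))  # 580981944116838401
# Instance 2, schema test_1 (schema ID 1), table t_2 (table ID 2), original ID 2:
print(partition_id(2, 1, 2, 2))  # 1157460288606306306
```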
The heartbeat feature supports calculating the real-time replication delay between each replication task and MySQL or MariaDB based on real replication data.
Note:
- The estimation accuracy of the replication delay is at the second level.
- The heartbeat-related binlog is not replicated to the downstream; it is discarded after the replication delay is calculated.
If the heartbeat feature is enabled, the upstream MySQL or MariaDB instances must provide the following privileges:
- SELECT
- INSERT
- CREATE (databases, tables)
In the task configuration file, enable the heartbeat feature:
```yaml
enable-heartbeat: true
```
- DM-worker creates the `dm_heartbeat` schema (currently unconfigurable) in the corresponding upstream MySQL or MariaDB.
- DM-worker creates the `heartbeat` table (currently unconfigurable) in the corresponding upstream MySQL or MariaDB.
- DM-worker uses a `REPLACE` statement to update the current `TS_master` timestamp every second (currently unconfigurable) in the `dm_heartbeat`.`heartbeat` table of the corresponding upstream MySQL or MariaDB.
- DM-worker updates the `TS_slave_task` replication time after each replication task obtains the `dm_heartbeat`.`heartbeat` binlog.
- DM-worker queries the current `TS_master` timestamp in the `dm_heartbeat`.`heartbeat` table of the corresponding upstream MySQL or MariaDB every 10 seconds, and calculates `task_lag = TS_master - TS_slave_task` for each task.
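As a toy illustration of the delay calculation described above (not DM code; the timestamp values are hypothetical Unix seconds):

```python
# TS_master: the timestamp DM-worker last wrote upstream (updated every second).
# TS_slave_task: the heartbeat timestamp the task last replicated downstream.
def task_lag(ts_master: int, ts_slave_task: int) -> int:
    """task_lag = TS_master - TS_slave_task, the estimated replication delay in seconds."""
    return ts_master - ts_slave_task

print(task_lag(1700000010, 1700000007))  # 3: about 3 seconds of replication delay
```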
See the replicate lag in the binlog replication processing unit of DM monitoring metrics.