-
Notifications
You must be signed in to change notification settings - Fork 154
Description
We (ab)use DVT in a customer project to validate our migration of z/OS Db2 tables to Postgres. To access the z/OS Db2 tables we use a connection via Oracle Heterogeneous Services.
When evaluating DVT for our use-case we noticed two major problems:
- the
-validate -rowoption for tables is prone toconcatcollisions, i.e. the concatenation of "12" and "3" would give the same result and therefore the same hash as the concatenation of "1" and "23". - partitioning large tables depends on the sort order of key columns, which is different in our case due to Heterogeneous Services inheriting z/OS Db2 EBCDIC sort order.
To resolve these problems we imagined two approaches:
- generating/manipulating DVT YAML configuration files
- implementing suitable custom SQL queries
We decided to go for the second option due to the sparse documentation of the internal configuration and expecting the auditing of SQL statements to be easier for third parties.
To combat concat collisions in custom queries we format the columns to a fixed sized format. To work around the sort order issue we use hash partitioning, i.e. the MD5 hash of the key modulo the desired partition count.
We suspect this approach to somehow collide with implementation details of DVTs -hash option.
Is there anything we missed and/or do you see any room for improvement?