Commit 06aeb92

feat: mysql chunking optimisation (#797)
1 parent 88285d0 commit 06aeb92

6 files changed

Lines changed: 494 additions & 26 deletions

constants/constants.go

Lines changed: 7 additions & 0 deletions
@@ -33,6 +33,13 @@ const (
 	EffectiveParquetSize = int64(256) * 1024 * 1024 * int64(8)
 	DB2StateTimestampFormat = "2006-01-02 15:04:05.000000"
 	DefaultStateTimestampFormat = "2006-01-02T15:04:05.000000000Z"
+	// DistributionLower and DistributionUpper define the acceptable range
+	// of the distribution factor for validating evenly distributed numeric PKs.
+	DistributionLower = 0.05
+	DistributionUpper = 1000.0
+	// MysqlChunkAcceptanceRatio defines the minimum ratio of expected chunks that must be generated
+	// for the split to be considered valid.
+	MysqlChunkAcceptanceRatio = float64(0.8)
 )
 
 type DriverType string
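The hunk above adds only the thresholds; the code that consumes them lives in other files of this commit and is not shown here. As a rough illustration of how such bounds could be applied, the sketch below assumes the distribution factor is computed as (max PK - min PK + 1) / row count, a convention used by some chunk-splitting implementations. The helper names `distributionFactor`, `isEvenlyDistributed`, and `splitAccepted` are hypothetical and do not come from the repository.

```go
package main

import "fmt"

// Values copied from the constants added in this commit.
const (
	DistributionLower         = 0.05
	DistributionUpper         = 1000.0
	MysqlChunkAcceptanceRatio = float64(0.8)
)

// distributionFactor is a hypothetical helper. It ASSUMES the factor is
// (max - min + 1) / rowCount; the actual formula is not visible in this diff.
func distributionFactor(minPK, maxPK, rowCount int64) float64 {
	if rowCount <= 0 {
		return 0
	}
	return float64(maxPK-minPK+1) / float64(rowCount)
}

// isEvenlyDistributed reports whether the factor lies inside the accepted range.
func isEvenlyDistributed(factor float64) bool {
	return factor >= DistributionLower && factor <= DistributionUpper
}

// splitAccepted reports whether enough of the expected chunks were generated
// for the split to count as valid (hypothetical use of the acceptance ratio).
func splitAccepted(generated, expected int) bool {
	if expected <= 0 {
		return false
	}
	return float64(generated)/float64(expected) >= MysqlChunkAcceptanceRatio
}

func main() {
	factor := distributionFactor(1, 1000, 1000)
	fmt.Println("factor:", factor, "even:", isEvenlyDistributed(factor))
	fmt.Println("9 of 10 chunks accepted:", splitAccepted(9, 10))
	fmt.Println("7 of 10 chunks accepted:", splitAccepted(7, 10))
}
```

Under this assumption, a factor near 1 indicates densely packed keys, while a very small or very large factor suggests many duplicates per key value or wide gaps, in which case a range-based split may be a poor fit.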

constants/state_version.go

Lines changed: 6 additions & 2 deletions
@@ -29,12 +29,16 @@ package constants
 // - Version 4: Unsigned int/integer/bigint map to Int64.
 // * Earlier unsigned int/integer/bigint were mapped to Int32 which caused integer overflows.
 //
-// - Version 5: (Current Version) MongoDB nested DateTime values decoded as UTC time.Time.
+// - Version 5: MongoDB nested DateTime values decoded as UTC time.Time.
 // * BSON DateTime at any depth is now decoded directly to time.Time (UTC) via a custom client registry, preventing json.Marshal crashes for out-of-range years ([0,9999]).
 // * Top-level DateTime fields that previously formatted with the local machine timezone (e.g. "+05:30") now always output UTC ("Z").
+//
+// - Version 6: (Current Version) Added []uint8 (byte slice) support in ReformatInt64
+// * Previously, numeric values returned as byte slices (common in some SQL drivers) caused errors
+// * Now these byte slices are parsed and converted into int64
 
 const (
-	LatestStateVersion = 5
+	LatestStateVersion = 6
 )
 
 // Used as the current version of the state when the program is running
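The Version 6 note describes parsing numeric values that arrive as `[]uint8`. The real `ReformatInt64` is not part of this hunk, so the following is only a minimal sketch of the idea: the function name `toInt64` and the set of cases it handles are assumptions for illustration, not the project's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toInt64 is a hypothetical stand-in for the conversion described in the
// Version 6 comment. It handles only a few input types for illustration.
func toInt64(v any) (int64, error) {
	switch x := v.(type) {
	case int64:
		return x, nil
	case int:
		return int64(x), nil
	case []uint8:
		// Some SQL drivers return numeric columns as raw bytes holding the
		// decimal text of the number, e.g. []byte("12345").
		return strconv.ParseInt(strings.TrimSpace(string(x)), 10, 64)
	default:
		return 0, fmt.Errorf("unsupported type %T for int64 conversion", v)
	}
}

func main() {
	n, err := toInt64([]uint8("12345"))
	fmt.Println(n, err)

	_, err = toInt64([]uint8("not-a-number"))
	fmt.Println(err)
}
```

Because `[]uint8` and `[]byte` are the same type in Go, a single case covers both spellings. Returning the `strconv.ParseInt` error instead of silently yielding zero keeps malformed values visible to the caller.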
