
Block number padding overflows at 9 digits (BSC already affected, Polygon next) #260

@tholcman


Version
Current main (0.3.x as of 2026-05-28)

Platform
Any — reproducible on all platforms

Description

NumberChunk::format_item hardcodes {:0>8} for block numbers in output filenames. In Rust's format syntax the width is a minimum, not a maximum, so nothing is ever truncated: from block 100,000,000 onward the number silently spills over to 9 digits:

bsc_mainnet__transactions__99995000_to_99999999.parquet ← 8 digits ✓
bsc_mainnet__transactions__100000000_to_100004999.parquet ← 9 digits ✗
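A minimal sketch of the formatting behaviour described above. chunk_filename is a hypothetical helper that only imitates the filename pattern from the examples; it is not the project's actual NumberChunk::format_item.

```rust
/// Hypothetical helper mirroring the filename pattern shown in the issue:
/// `{network}__{datatype}__{start}_to_{end}.parquet`, with each block
/// number formatted via `{:0>8}` (fill '0', right-aligned, minimum width 8).
fn chunk_filename(network: &str, datatype: &str, start: u64, end: u64) -> String {
    format!("{}__{}__{:0>8}_to_{:0>8}.parquet", network, datatype, start, end)
}

fn main() {
    // Shorter numbers are zero-filled up to the minimum width of 8...
    assert_eq!(format!("{:0>8}", 5_000u64), "00005000");
    // ...but longer numbers are printed in full and never truncated.
    assert_eq!(format!("{:0>8}", 100_000_000u64), "100000000");

    // Reproduces the two filenames from the report.
    println!("{}", chunk_filename("bsc_mainnet", "transactions", 99_995_000, 99_999_999));
    println!("{}", chunk_filename("bsc_mainnet", "transactions", 100_000_000, 100_004_999));
}
```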

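This breaks lexicographic sort order. The 9-digit names interleave with the 8-digit names for blocks 10,000,000 to 19,999,999 instead of sorting after block 99,999,999, so any pipeline that sorts output files by name (S3 prefix listings, ls, glob) will, for example, process the block 100M file before the block 10M file.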
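The ordering problem can be reproduced with a plain byte-wise sort, which is the order S3 key listings use (ls and shell globs behave the same under the C locale; other locales may collate differently). This sketch assumes chunk names begin with the zero-padded start block, as in the examples above; name_for and sorted_names are hypothetical helpers.

```rust
/// Hypothetical helper: builds the block-range part of a chunk filename
/// with the same `{:0>8}` padding described in the issue.
fn name_for(start: u64) -> String {
    format!("{:0>8}_to_{:0>8}.parquet", start, start + 4_999)
}

/// Formats every chunk start and sorts the names byte-wise, which is how
/// `String`'s `Ord` implementation compares them.
fn sorted_names(starts: &[u64]) -> Vec<String> {
    let mut names: Vec<String> = starts.iter().map(|&s| name_for(s)).collect();
    names.sort();
    names
}

fn main() {
    // Chunk starts listed in true block order.
    let starts = [9_995_000u64, 10_000_000, 10_005_000, 99_995_000, 100_000_000];
    for name in sorted_names(&starts) {
        println!("{}", name);
    }
}
```

The block 100,000,000 chunk, which belongs last, sorts second: "100000000_..." and "10000000_..." share their first eight characters, and '0' (0x30) sorts before '_' (0x5F).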

BSC crossed block 100,000,000 in May 2026 and is currently affected. Polygon is approaching 70M blocks and will be next.

Why changing the constant is wrong

Replacing {:0>8} with {:0>9} or {:0>16} only swaps one hardcoded limit for another: it renames every existing user's files on upgrade, and a small bump such as 9 needs yet another migration when the next chain outgrows it.
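A small illustration of why bumping the constant is a breaking change: the same already-written block number gets a different name under each width, and width 9 overflows again at block 1,000,000,000. widths is a hypothetical helper written only for this comparison.

```rust
/// Hypothetical helper: formats one block number with each of the
/// hardcoded widths discussed above (8 today, 9 or 16 as replacements).
fn widths(block: u64) -> (String, String, String) {
    (
        format!("{:0>8}", block),
        format!("{:0>9}", block),
        format!("{:0>16}", block),
    )
}

fn main() {
    // An existing chunk start: each width produces a different filename part,
    // so files written before an upgrade no longer match the new names.
    let (w8, w9, w16) = widths(12_345_000);
    println!("{} | {} | {}", w8, w9, w16);

    // Width 9 only postpones the problem: block 1,000,000,000 needs 10 digits.
    let (_, w9_big, _) = widths(1_000_000_000);
    println!("{}", w9_big);
}
```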

Proposed fix

Add a --block-number-pad-width <N> CLI argument. The default of 8 preserves current behaviour exactly (no breaking change for existing users). Operators on high-block-number chains opt in by passing --block-number-pad-width 9.
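A minimal sketch of how the proposed option could reach the formatter, assuming the parsed --block-number-pad-width value ends up in some options struct. OutputOptions, format_block and chunk_range are hypothetical names, not the project's existing types; the key piece is Rust's runtime width syntax {:0>width$}, which takes the width from a usize argument instead of the literal 8.

```rust
/// Hypothetical stand-in for wherever the parsed CLI value would live.
struct OutputOptions {
    /// Value of `--block-number-pad-width`; 8 by default.
    block_number_pad_width: usize,
}

impl Default for OutputOptions {
    fn default() -> Self {
        // Default of 8 reproduces the current `{:0>8}` output exactly.
        OutputOptions { block_number_pad_width: 8 }
    }
}

/// Zero-pads `block` to at least `width` digits; `width$` refers to the
/// named `usize` argument, so the width is chosen at runtime.
fn format_block(block: u64, width: usize) -> String {
    format!("{:0>width$}", block, width = width)
}

/// Builds the `{start}_to_{end}` part of a chunk filename.
fn chunk_range(start: u64, end: u64, opts: &OutputOptions) -> String {
    let w = opts.block_number_pad_width;
    format!("{}_to_{}", format_block(start, w), format_block(end, w))
}

fn main() {
    let default_opts = OutputOptions::default();
    let bsc_opts = OutputOptions { block_number_pad_width: 9 };

    println!("{}", chunk_range(99_995_000, 99_999_999, &default_opts));
    println!("{}", chunk_range(99_995_000, 99_999_999, &bsc_opts));
    println!("{}", chunk_range(100_000_000, 100_004_999, &bsc_opts));
}
```

With the default of 8 the output matches today's filenames byte for byte; with 9, every BSC name up to block 999,999,999 has the same length, so lexicographic order matches block order again.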

I have an implementation ready if the approach sounds good.
