import tool does not use real table size for sizing #1134

@Titancodeder

Bug Report
The issue comes up when importing large volumes of compressed Parquet files. The on-disk size can be well below 500GB (around 90GB in our case), but the data expands past 500GB once decompressed. Because the split decision is based on the compressed file size rather than the actual uncompressed data, the import ends up with far fewer subtasks than it should, which hurts parallelism.
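
To make the effect concrete, here is a minimal sketch of the arithmetic, not the import tool's actual code. The function `subtask_count`, the 30GB per-subtask target, and the 540GB uncompressed figure are all hypothetical, chosen only to match the "around 90GB on disk, past 500GB decompressed" case described above.

```python
GB = 10**9  # decimal gigabytes, for illustration only


def subtask_count(data_bytes: int, target_subtask_bytes: int) -> int:
    """Return how many subtasks are needed so that each one handles
    at most target_subtask_bytes (always at least one subtask)."""
    if target_subtask_bytes <= 0:
        raise ValueError("target_subtask_bytes must be positive")
    # Integer ceiling division avoids float rounding on large byte counts.
    return max(1, -(-data_bytes // target_subtask_bytes))


compressed_bytes = 90 * GB     # on-disk Parquet size from this report
uncompressed_bytes = 540 * GB  # hypothetical size after decompression
target_bytes = 30 * GB         # hypothetical per-subtask target

by_compressed = subtask_count(compressed_bytes, target_bytes)
by_uncompressed = subtask_count(uncompressed_bytes, target_bytes)

print(f"sized by compressed files:   {by_compressed} subtasks")
print(f"sized by uncompressed data:  {by_uncompressed} subtasks")
```

With these assumed numbers, sizing by the compressed files yields a sixth of the subtasks that sizing by the uncompressed data would, which is the lost parallelism described above.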
