feat: Re-work behavior of arrow_schema parameter on sink_parquet by nameexhaustion · Pull Request #26621 · pola-rs/polars

nameexhaustion · 2026-02-19T11:33:25Z

Pre-work for supporting arrow schemas generated by PyIceberg.

Existing behavior

If arrow_schema is provided, its dtypes must match the dtypes we generate according to CompatLevel::oldest(). Any additional metadata in the provided schema is then copied over.

This made it possible to correctly write the PARQUET:field_id metadata that Iceberg needs. However, besides custom metadata, Iceberg also requires the ability to specify the exact arrow type to export to (e.g. Binary -> FixedLenBinary), a requirement that wasn't anticipated during the initial design.

New behavior after this PR

If arrow_schema is provided, we will convert to the exact types specified in the arrow schema, raising an error if this isn't possible.

Essentially, this makes it so that the exported arrow type is defined and controllable by the arrow_schema parameter, rather than being defined by a hardcoded CompatLevel::oldest(). We will use this later when writing Iceberg to e.g. write a Binary column as FixedLenBinary (Iceberg fixed(n) type).
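The difference between the two behaviors can be sketched with a toy model. This is only an illustration, not the implementation in this PR: dtypes are plain strings, and `ArrowField`, `DEFAULT_EXPORT`, `CONVERTIBLE`, `resolve_old` and `resolve_new` are hypothetical names that do not exist in Polars. The conversion table is likewise an assumption that covers only the types used in the example below.

```python
from dataclasses import dataclass, field

# Toy model only: none of these names are Polars APIs.

# Hypothetical stand-in for the types chosen by CompatLevel::oldest().
DEFAULT_EXPORT = {"String": "large_string", "Binary": "large_binary"}

# Hypothetical table of arrow types each dtype can be converted to.
CONVERTIBLE = {
    "String": {"large_string", "string_view"},
    "Binary": {"large_binary", "binary_view"},
}


class SchemaError(Exception):
    pass


@dataclass
class ArrowField:
    name: str
    dtype: str
    metadata: dict = field(default_factory=dict)


def resolve_old(polars_dtype, requested):
    """Existing behavior: the dtype must equal the default; only metadata is copied."""
    default = DEFAULT_EXPORT[polars_dtype]
    if requested.dtype != default:
        raise SchemaError(
            f"provided dtype ({requested.dtype}) does not match output dtype ({default})"
        )
    return ArrowField(requested.name, default, dict(requested.metadata))


def resolve_new(polars_dtype, requested):
    """New behavior: export to exactly the requested type, or raise."""
    if requested.dtype not in CONVERTIBLE[polars_dtype]:
        raise SchemaError(f"cannot convert {polars_dtype} to {requested.dtype}")
    return ArrowField(requested.name, requested.dtype, dict(requested.metadata))


# A String column requested as string_view, carrying an Iceberg field id.
req = ArrowField("utf8view", "string_view", {"PARQUET:field_id": "3"})

try:
    resolve_old("String", req)
    old_error = None
except SchemaError as exc:
    old_error = str(exc)  # the old rule rejects any non-default type

new_field = resolve_new("String", req)  # the new rule honors the requested type
```

In this model the old rule treats the provided schema as a source of metadata only, while the new rule treats it as the definition of the output type and fails only when no conversion exists.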

Example

import polars as pl
import pyarrow as pa

# Request a specific arrow type for each column via arrow_schema.
pl.DataFrame(
    {
        "large_utf8": "A",
        "large_binary": [b"B"],
        "utf8view": "C",
        "binaryview": [b"D"],
    }
).write_parquet(
    ...,
    arrow_schema=pa.schema(
        [
            pa.field("large_utf8", pa.large_string()),
            pa.field("large_binary", pa.large_binary()),
            pa.field("utf8view", pa.string_view()),
            pa.field("binaryview", pa.binary_view()),
        ]
    )
)

Example - Existing behavior
Errors with SchemaError: to_arrow(): provided dtype (Utf8View) does not match output dtype (LargeUtf8)

Example - New behavior
Successfully writes a Parquet file with the following schema:

large_utf8: large_string
large_binary: large_binary
utf8view: string_view
binaryview: binary_view

github-actions · 2026-02-19T12:49:50Z

The uncompressed lib size after this PR is 53.7169 MB.

github-actions · 2026-02-19T13:52:14Z

The uncompressed lib size after this PR is 53.7229 MB.

codecov · 2026-02-19T14:09:34Z

Codecov Report

❌ Patch coverage is 93.18885% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.19%. Comparing base (4929540) to head (c6a8754).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/polars-core/src/series/into.rs	94.50%	10 Missing ⚠️
crates/polars-arrow/src/datatypes/mapper.rs	75.00%	6 Missing ⚠️
...tes/polars-core/src/series/categorical_to_arrow.rs	91.89%	3 Missing ⚠️
...lars-core/src/chunked_array/logical/categorical.rs	0.00%	1 Missing ⚠️
...es/polars-core/src/chunked_array/object/builder.rs	90.00%	1 Missing ⚠️
crates/polars-plan/src/plans/schema.rs	94.11%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #26621      +/-   ##
==========================================
- Coverage   81.37%   81.19%   -0.19%     
==========================================
  Files        1794     1795       +1     
  Lines      244998   245086      +88     
  Branches     3079     3080       +1     
==========================================
- Hits       199379   198989     -390     
- Misses      44833    45311     +478     
  Partials      786      786

github-actions · 2026-02-19T15:16:09Z

The uncompressed lib size after this PR is 53.7225 MB.

github-actions · 2026-02-19T18:38:59Z

The uncompressed lib size after this PR is 53.7228 MB.

github-actions · 2026-02-19T19:17:50Z

The uncompressed lib size after this PR is 53.7237 MB.

github-actions bot added the A-io-parquet, enhancement, python, and rust labels Feb 19, 2026

nameexhaustion added 16 commits February 20, 2026 05:15

76b5854 c
af11a67 c
87472ed c
1e66a26 c
7a4aea9 c
615f4c2 fix objects
3f5b36b move the fix
433503b c
d0b50c6 polars dtype mutator
4383361 c
3f30339 c
b331c96 c
fe6fc61 c
0cb06ba c
bead894 c
bb6af05 c

nameexhaustion force-pushed the nxs/to-arrow-mode branch from 42e6ea9 to bb6af05 Compare February 19, 2026 18:15

nameexhaustion mentioned this pull request Feb 19, 2026

refactor(rust): Add dtype visitor #26628

Draft

nameexhaustion added 2 commits February 20, 2026 05:48

077d0a3 c
b7640bc fix

nameexhaustion wants to merge 18 commits into main from nxs/to-arrow-mode
