fix: Fix categorical value count sort panic regression #26412

Open
bayoumi17m wants to merge 4 commits into pola-rs:main from bayoumi17m:fix/category-valuecounts-sort-panic

@bayoumi17m bayoumi17m commented Feb 4, 2026

Overview

Fixes a panic when calling list operations (e.g., list.sort()) on lists whose structs contain Categorical or Enum fields. This is a regression introduced in #25661, which changed these operations to use try_apply_amortized_same_type.

Root Cause / Proposed fix

Root Cause

  1. amortized_iter() converts structs to physical types, stripping _PL_CATEGORICAL2 metadata
  2. try_apply_amortized_same_type tries to collect with the original dtype (which has metadata)
  3. ListArray::new does strict dtype comparison including Arrow metadata → panic

Before #25661, try_apply_amortized inferred the dtype from collected values, and same_type() handled conversion afterward.
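
To make the three steps concrete, here is a toy model of the failure mode. It is a sketch only: `ToyField`, `toy_list_new`, and `toy_infer_then_cast` are hypothetical stand-ins, not the polars-arrow or polars-core types, and the metadata value is a placeholder. The one assumption carried over from the description above is that dtype equality includes metadata.

```rust
use std::collections::BTreeMap;

// Toy stand-in for an Arrow field: a physical type plus key/value metadata.
// Hypothetical; it only models "dtype equality includes metadata".
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    physical: &'static str,
    metadata: BTreeMap<String, String>,
}

impl ToyField {
    // A categorical-like field: physical u32 codes tagged with a metadata key.
    // The value stored under the key is a placeholder, not the real payload.
    fn categorical() -> Self {
        let mut metadata = BTreeMap::new();
        metadata.insert("_PL_CATEGORICAL2".to_string(), "<placeholder>".to_string());
        ToyField { physical: "u32", metadata }
    }

    // Step 1: converting to the physical representation drops the metadata.
    fn to_physical(&self) -> Self {
        ToyField { physical: self.physical, metadata: BTreeMap::new() }
    }
}

// Step 3: a constructor that rejects values whose dtype differs from the
// declared dtype, metadata included. The real code panics; this returns Err.
fn toy_list_new(declared: &ToyField, values: &ToyField) -> Result<(), String> {
    if declared == values {
        Ok(())
    } else {
        Err(format!("dtype mismatch: declared {:?}, got {:?}", declared, values))
    }
}

// Pre-#25661 shape: build the list with the dtype inferred from the values,
// then convert back to the declared dtype (stand-in for `same_type()`).
fn toy_infer_then_cast(declared: &ToyField, values: &ToyField) -> Result<ToyField, String> {
    toy_list_new(values, values)?; // the inferred dtype always matches itself
    Ok(declared.clone())
}
```

In this model, `toy_list_new(&declared, &declared.to_physical())` fails because only the metadata differs (step 2 feeding step 3), while `toy_infer_then_cast` succeeds for the same inputs.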

Proposed Fix

For list operations, check whether the inner type contains categoricals or enums. If so, fall back to the more flexible try_apply_amortized + same_type approach that worked before #25661.

Originally, I applied this check in every list function, but I have since extracted it into a helper that takes the operation as a closure F.
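
A rough sketch of the shape of that helper, using a toy dtype tree rather than the real `DataType`. The names `ToyDType`, `contains_categorical`, `Path`, and `apply_list_op` are hypothetical, and recursing through nested lists and structs is an assumption about how "contains" is interpreted, not a description of the actual patch.

```rust
// Toy dtype tree; a hypothetical stand-in reduced to what the check needs.
#[derive(Debug, Clone, PartialEq)]
enum ToyDType {
    Int64,
    Date,
    Categorical,
    Enum,
    List(Box<ToyDType>),
    Struct(Vec<(String, ToyDType)>),
}

// Returns true if a categorical or enum appears anywhere inside the dtype.
fn contains_categorical(dtype: &ToyDType) -> bool {
    match dtype {
        ToyDType::Categorical | ToyDType::Enum => true,
        ToyDType::List(inner) => contains_categorical(inner),
        ToyDType::Struct(fields) => fields.iter().any(|(_, dt)| contains_categorical(dt)),
        _ => false,
    }
}

// Which code path the helper selects for a given inner dtype.
#[derive(Debug, PartialEq)]
enum Path {
    // Collect directly with the original dtype (the post-#25661 path).
    SameTypeFast,
    // Infer the dtype from the values, then cast back (the pre-#25661 path).
    InferThenCast,
}

// Picks the path once, then hands it to the caller's closure, so each list
// operation does not have to repeat the categorical check itself.
fn apply_list_op<T>(inner: &ToyDType, f: impl Fn(Path) -> T) -> T {
    let path = if contains_categorical(inner) {
        Path::InferThenCast
    } else {
        Path::SameTypeFast
    };
    f(path)
}
```

With this shape, a struct with a Categorical field routes to `Path::InferThenCast`, while a plain Int64 inner type stays on `Path::SameTypeFast`.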

Questions

  1. The code block below suggests that other logical types, such as dates, may also behave differently on this path. Should we test those as well, or is that out of scope for this PR?

let iter_dtype = match inner_dtype {
    #[cfg(feature = "dtype-struct")]
    DataType::Struct(_) => inner_dtype.to_physical(),
    // TODO: figure out how to deal with physical/logical distinction
    // physical primitives like time, date etc. work
    // physical nested need more
    _ => inner_dtype.clone(),
};

AI Usage

  1. I used AI over the past few weeks to understand the repository
  2. I used AI to help review the PRs between 1.35.1 and 1.36.x and identify which changes were likely related to this issue

Fixes #26383

@github-actions bot added the A-dtype-categorical, fix, python, regression, and rust labels Feb 4, 2026
@bayoumi17m bayoumi17m force-pushed the fix/category-valuecounts-sort-panic branch from dda5bc3 to b714be6 Compare February 4, 2026 03:17
codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 97.36842% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 81.00%. Comparing base (e55c3b0) to head (48a3cfe).
⚠️ Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
...tes/polars-core/src/chunked_array/list/iterator.rs | 93.75% | 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #26412   +/-   ##
=======================================
  Coverage   80.99%   81.00%           
=======================================
  Files        1782     1782           
  Lines      243104   243138   +34     
  Branches     3078     3078           
=======================================
+ Hits       196901   196950   +49     
+ Misses      45400    45385   -15     
  Partials      803      803           


@bayoumi17m bayoumi17m marked this pull request as ready for review February 4, 2026 05:18

Development

Successfully merging this pull request may close these issues.

Categorical value_counts + sort PanicException in groupby context regression