fix(sort): skip excluded columns when sorting DataFrame #253

Open
AuburyEssentian wants to merge 1 commit into paradigmxyz:main from AuburyEssentian:fix/sort-excluded-columns

Conversation

@AuburyEssentian

Problem

Fixes #221.

When a column listed in a dataset's default_sort() is excluded via exclude_columns, it is absent from the output DataFrame. Polars then returns an error (not found: <col>), which propagates as a chunk failure and produces zero rows in the output, even though the data was collected correctly.

The Contracts dataset is the canonical reproducer: default_sort() returns ["block_number", "create_index"], so excluding create_index silently drops all results.

Fix

In sort_by_schema, filter sort_columns down to only those columns that exist in the DataFrame schema before calling df.sort(). If none remain, return the DataFrame unsorted rather than erroring.

This is a one-line conceptual fix confined to crates/freeze/src/types/dataframes/sort.rs and applies to any dataset that lists a column in default_sort() that can be excluded.
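
For reviewers, here is a minimal sketch of the filtering step described above. It is not the code in this PR. It uses only the standard library and stands in plain string slices for the Polars schema. The helper name `effective_sort_columns` and its signature are hypothetical.

```rust
/// Hypothetical helper (not cryo's actual code): decide which columns to sort by.
///
/// `default_sort` is the dataset's preferred sort order; `schema_columns` are
/// the columns actually present in the output DataFrame.
/// Returns `None` when no sort column survives the filter, signalling that the
/// DataFrame should be returned unsorted instead of raising an error.
fn effective_sort_columns(
    default_sort: &[&str],
    schema_columns: &[&str],
) -> Option<Vec<String>> {
    // Keep only columns that exist in the schema, preserving sort priority.
    let present: Vec<String> = default_sort
        .iter()
        .filter(|col| schema_columns.contains(*col))
        .map(|col| col.to_string())
        .collect();

    if present.is_empty() {
        None
    } else {
        Some(present)
    }
}
```

In the Contracts case, excluding create_index leaves block_number as the only sort key. If every sort column were excluded, the helper would return `None` and the caller would skip sorting.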

Testing

  • cargo build -p cryo_freeze passes cleanly.
  • Manually verified: before this patch, calling cryo contracts --exclude create_index returned 0 rows with chunk errors; after it, the same command returns results sorted by block_number only.

When a column listed in default_sort() has been excluded via
exclude_columns, it is absent from the output DataFrame. Polars
then returns an error ('not found: <col>') which propagates as a
chunk failure and produces zero rows in the output.

Fix: filter sort_columns to only those present in the DataFrame
schema before calling df.sort(). If no sort columns remain after
filtering, return the DataFrame unsorted rather than erroring.

Fixes paradigmxyz#221

Development

Successfully merging this pull request may close these issues.

Excluding 'create_index' column breaks Contracts dataset (python)