fix(python): Apply thousands_separator to count/null_count in describe() for non-numeric columns #26486

Open
veeceey wants to merge 2 commits into pola-rs:main from veeceey:fix/issue-25946-describe-thousands-separator

veeceey commented Feb 8, 2026

Summary

  • Fixes #25946 ("describe doesn't abide thousands separator for string columns")
  • When using pl.Config(thousands_separator=...), the describe() method now correctly applies the thousands separator to count and null_count statistics for non-numeric columns (string, categorical, temporal, etc.)
  • Previously, these values were converted to plain strings via str(v) without formatting, while numeric columns received proper thousands formatting through their float representation

Changes

  • In LazyFrame.describe(), when casting non-numeric column statistics to strings, count and null_count values are now formatted with the configured thousands separator using f"{int(v):,}".replace(",", thousands_sep)
  • The separator is retrieved via a local import of get_thousands_separator() from polars._plr, avoiding any module-level import changes
  • When no thousands separator is configured (empty string), behavior is unchanged
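The formatting step described above can be sketched in isolation. This is a minimal illustration, not the code from the PR diff: `format_stat` is a hypothetical helper name, and the separator is passed in as a plain argument rather than read from `get_thousands_separator()`, so the snippet runs without Polars installed.

```python
def format_stat(value, thousands_sep: str) -> str:
    """Render a count/null_count value with the given separator (hypothetical helper)."""
    if not thousands_sep:
        # No separator configured: keep the plain string conversion.
        return str(value)
    # Group digits with "," first, then swap in the configured separator.
    return f"{int(value):,}".replace(",", thousands_sep)


print(format_stat(2000, "_"))  # 2_000
print(format_stat(2000, "ç"))  # 2ç000
print(format_stat(2000, ""))   # 2000
```

Grouping with `,` and then replacing it keeps the expression short and works for any separator string, including multi-byte characters.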

Test plan

  • Added test_df_describe_thousands_separator_string_columns parametrized for both DataFrame and LazyFrame
  • All 13 existing describe() tests pass
  • Manually verified with the exact reproducer from the issue using both _ and ç separators

Before:

│ count      ┆ 2ç000.0  ┆ 2000 │

After:

│ count      ┆ 2ç000.0  ┆ 2ç000 │

github-actions bot added the fix and python labels Feb 8, 2026
Condense the multi-line condition to a single line per ruff's formatting
preferences.

veeceey commented Feb 8, 2026

Manual Testing Results

Verified the fix for issue #25946 where describe() was not applying the configured thousands separator to count/null_count for non-numeric columns.

Test 1: Original Issue Scenario

import polars as pl

pl.Config.set_thousands_separator(",")
df = pl.DataFrame({"a": ["x"] * 2000, "b": [1.0, 2.0] * 1000})
print(df.describe())

Before fix:

  • String column 'a' count: 2000 (no separator)
  • Numeric column 'b' count: 2,000.0 (with separator)

After fix:

  • String column 'a' count: 2,000 (with separator)
  • Numeric column 'b' count: 2,000.0 (with separator)

Both columns now consistently use the configured separator.

Test 2: Different Separators

Tested with space (" "), underscore ("_"), and unicode ("ç") separators - all work correctly for both string and numeric columns.

Test 3: Null Count Formatting

pl.Config.set_thousands_separator(",")
df = pl.DataFrame({
    "with_nulls": [None] * 1500 + ["value"] * 500,
    "numeric": [1.0] * 1000 + [None] * 1000
})
print(df.describe())

For string columns, the count and null_count rows are now formatted with the separator; the 1,500 nulls in with_nulls, for example, are reported as "1,500".

Test 4: LazyFrame

Verified that LazyFrame.describe() has the same behavior as DataFrame.describe() - thousands separators are applied consistently to string column statistics.

Test 5: Edge Cases

  • Empty separator (""): Works correctly, no formatting applied (plain integers)
  • Large numbers (>10,000): A separator is inserted between every group of three digits
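The edge cases above can be checked against the formatting expression quoted in the PR description. This is a sketch under the assumption that the expression is applied as written; `group_digits` is a hypothetical name used only for illustration.

```python
def group_digits(value: int, sep: str) -> str:
    """Apply the PR's formatting expression with an arbitrary separator (hypothetical helper)."""
    return f"{int(value):,}".replace(",", sep)


# Empty, space, underscore, and non-ASCII separators across several magnitudes.
for sep in ["", " ", "_", "ç"]:
    rendered = [group_digits(n, sep) for n in (999, 1_000, 12_345, 1_234_567)]
    print(repr(sep), rendered)
```

With an empty separator the expression collapses to plain digits, which is consistent with the "behavior is unchanged" claim for that case.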

All tests pass. The fix successfully applies the configured thousands separator to count and null_count statistics for non-numeric columns, providing a consistent user experience across all column types.

veeceey commented Feb 19, 2026

Friendly ping - any chance someone could take a look at this when they get a chance? Happy to make any changes if needed.
