fix(python): Apply thousands_separator to count/null_count in describe() for non-numeric columns #26486
…e() for non-numeric columns
Condense the multi-line condition to a single line per ruff's formatting preferences.
Manual Testing Results
Verified the fix for issue #25946 (describe doesn't abide thousands separator for string columns).
Test 1: Original Issue Scenario
import polars as pl
pl.Config.set_thousands_separator(",")
df = pl.DataFrame({"a": ["x"] * 2000, "b": [1.0, 2.0] * 1000})
print(df.describe())
Before fix:
After fix:
Both columns now consistently use the configured separator.
Test 2: Different Separators
Tested with space (
Test 3: Null Count Formatting
pl.Config.set_thousands_separator(",")
df = pl.DataFrame({
"with_nulls": [None] * 1500 + ["value"] * 500,
"numeric": [1.0] * 1000 + [None] * 1000
})
print(df.describe())
Both
Test 4: LazyFrame
Verified that LazyFrame.describe() has the same behavior as DataFrame.describe(): thousands separators are applied consistently to string column statistics.
Test 5: Edge Cases
All tests pass. The fix successfully applies the configured thousands separator to count and null_count statistics for non-numeric columns, providing a consistent user experience across all column types.
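As a rough illustration of the formatting being verified here, the expression quoted in the PR description, f"{int(v):,}".replace(",", thousands_sep), can be tried in isolation. This is a minimal sketch in plain Python, not the polars source; the function name format_count is hypothetical.

```python
def format_count(value, thousands_sep):
    """Format an integer-like count with the given thousands separator.

    Hypothetical helper that wraps the expression quoted in the PR
    description; it is not part of the polars API.
    """
    # Group digits with "," first, then swap in the configured separator.
    return f"{int(value):,}".replace(",", thousands_sep)


print(format_count(2000, ","))    # 2,000
print(format_count(2000, " "))    # 2 000
print(format_count(1500.0, "_"))  # 1_500
print(format_count(999, ","))     # 999 (fewer than four digits, nothing to group)
```

Grouping with "," and then replacing it keeps the logic independent of which separator is configured, since Python's format spec only supports "," and "_" as grouping characters directly.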
Friendly ping - any chance someone could take a look at this when they get a chance? Happy to make any changes if needed.
Summary
describe doesn't abide thousands separator for string columns #25946
With pl.Config(thousands_separator=...), the describe() method now correctly applies the thousands separator to count and null_count statistics for non-numeric columns (string, categorical, temporal, etc.)
Previously, these values were converted with str(v) without formatting, while numeric columns received proper thousands formatting through their float representation
Changes
In LazyFrame.describe(), when casting non-numeric column statistics to strings, count and null_count values are now formatted with the configured thousands separator using f"{int(v):,}".replace(",", thousands_sep)
get_thousands_separator() is imported from polars._plr, avoiding any module-level import changes
Test plan
test_df_describe_thousands_separator_string_columns, parametrized for both DataFrame and LazyFrame
describe() tests pass
_ and ç separators
Before:
After:
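Separately from the before/after output, the casting step described under Changes can be sketched in plain Python. This is an assumption-laden illustration, not the code in LazyFrame.describe(): the names COUNT_STATS and stringify_stats are hypothetical, the statistics are modeled as a simple dict, and counts are assumed to arrive as integers.

```python
# Statistics that should receive thousands formatting (per the PR description).
COUNT_STATS = {"count", "null_count"}


def stringify_stats(stats, thousands_sep):
    """Cast the describe() statistics of one non-numeric column to strings.

    Hypothetical sketch: count and null_count get the configured separator,
    every other statistic falls back to plain str(v).
    """
    out = {}
    for name, v in stats.items():
        if v is None:
            # Keep missing statistics as None rather than the string "None".
            out[name] = None
        elif name in COUNT_STATS and thousands_sep:
            out[name] = f"{int(v):,}".replace(",", thousands_sep)
        else:
            # No separator configured, or not a count-like statistic.
            out[name] = str(v)
    return out


# Values matching the "with_nulls" column from Test 3: 1500 nulls, 500 strings.
stats = {"count": 500, "null_count": 1500, "min": "value", "max": "value"}
print(stringify_stats(stats, ","))
# {'count': '500', 'null_count': '1,500', 'min': 'value', 'max': 'value'}
```

With an empty separator the sketch falls back to str(v), which corresponds to the behavior described as "previously" in the summary.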