@majiayu000

Summary

  • Add SanitizeErrorDetail utility to sanitize strings for error messages
  • Update CSV parser to use sanitized error messages when displaying unparsable field values

Problem

When a binary file (such as Parquet) is accidentally read with pl.read_csv(), the error message could include the unparsable field value verbatim, along with any terminal control characters it contains. These interfere with terminal output (e.g., escape sequences like \033c that clear the screen or reset the terminal).

Solution

This PR adds a SanitizeErrorDetail utility that:

  • Replaces control characters (ASCII 0x00-0x1F and 0x7F) with the Unicode replacement character (U+FFFD)
  • Truncates long strings to 256 characters by default (configurable via POLARS_VERBOSE)

The CSV parser now uses this utility when displaying unparsable field values in error messages.
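
As a rough illustration of the two behaviors listed above, a minimal sketch is shown here. It is not the code from this PR: the function signature, the `&str` input type, the trailing `...` marker on truncated output, and the `format_parse_error` helper are all assumptions made for the example, and the POLARS_VERBOSE interaction is not modeled.

```rust
use std::fmt;

/// Default cap on displayed characters (256, per the description above).
const DEFAULT_MAX_CHARS: usize = 256;

/// Hypothetical stand-in for the sanitizing function named in the test plan.
/// Replaces ASCII control characters (0x00-0x1F and 0x7F) with U+FFFD and
/// keeps at most `max_chars` characters of the input.
fn sanitize_for_error_display(input: &str, max_chars: usize) -> String {
    let mut out: String = input
        .chars()
        .take(max_chars)
        .map(|c| match c {
            '\u{00}'..='\u{1F}' | '\u{7F}' => '\u{FFFD}',
            other => other,
        })
        .collect();
    // Assumption: mark truncated output with a trailing "...".
    if input.chars().nth(max_chars).is_some() {
        out.push_str("...");
    }
    out
}

/// Hypothetical `Display` wrapper, so call sites can sanitize inline
/// inside `format!` without building an intermediate variable.
struct SanitizeErrorDetail<'a>(&'a str);

impl fmt::Display for SanitizeErrorDetail<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&sanitize_for_error_display(self.0, DEFAULT_MAX_CHARS))
    }
}

/// Illustrative call site: builds a parse error that embeds the raw field.
/// The message wording is made up for this sketch.
fn format_parse_error(raw_field: &str) -> String {
    format!("could not parse `{}` as an integer", SanitizeErrorDetail(raw_field))
}
```

With a wrapper of this shape, a field such as `"PAR1\x1bc"` would reach the terminal with the ESC byte replaced by U+FFFD, so the `\033c` sequence is never interpreted.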

Test plan

  • Added unit tests for sanitize_for_error_display function
  • Added unit tests for SanitizeErrorDetail struct
  • All existing tests pass

Fixes #16106

…jection

When accidentally reading binary files (like parquet) with pl.read_csv(),
the error messages could contain terminal control characters that interfere
with terminal output (e.g., escape sequences that clear the screen).

This adds a SanitizeErrorDetail utility that:
- Replaces control characters (ASCII 0x00-0x1F and 0x7F) with the Unicode
  replacement character
- Truncates long strings to prevent flooding the terminal

The CSV parser now uses this utility when displaying unparsable field values
in error messages.

Fixes pola-rs#16106
@codecov

codecov bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 90.00000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.66%. Comparing base (37cda80) to head (2404c3e).
⚠️ Report is 6 commits behind head on main.

Files with missing lines                   Patch %   Lines
crates/polars-utils/src/error.rs           92.30%    5 Missing ⚠️
crates/polars-io/src/csv/read/buffer.rs    0.00%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #25781      +/-   ##
==========================================
- Coverage   80.57%   79.66%   -0.92%     
==========================================
  Files        1764     1764              
  Lines      242704   242775      +71     
  Branches     3042     3042              
==========================================
- Hits       195561   193396    -2165     
- Misses      46361    48597    +2236     
  Partials      782      782              


@alexander-beedie changed the title from "fix: sanitize error messages to prevent terminal control character injection" to "fix: Sanitize error messages to prevent terminal control character injection" on Dec 16, 2025
@github-actions bot added the fix (Bug fix), python (Related to Python Polars), and rust (Related to Rust Polars) labels and removed the "title needs formatting" label on Dec 16, 2025

Development

Successfully merging this pull request may close these issues.

Scuffed error message when importing parquet with pl.read_csv (by accident)
