Add NULL and empty string handling documentation #643

Merged
laughingman7743 merged 1 commit into master from feature/null-handling-documentation on Jan 6, 2026
laughingman7743 commented Jan 6, 2026

Summary

  • Add comprehensive documentation (docs/null_handling.rst) explaining NULL and empty string behavior across different cursor types
  • Document the CSV format limitation: the CSV output writes NULL as an unquoted empty field (,,) and an empty string as a quoted empty field (,"",), and some CSV readers discard that difference
  • Provide workarounds: unload=True option, PolarsCursor, or S3FSCursor
  • Unify test method names to test_null_vs_empty_string across all cursor types
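
The CSV limitation in the second bullet can be reproduced without Athena. The following is a minimal sketch using only Python's standard `csv` module; it is not PyAthena code, and the sample row and column names are made up for illustration.

```python
import csv
import io

# One header row and one data row. The second column is NULL (unquoted
# empty field) and the third is an empty string (quoted empty field).
raw = 'id,null_col,empty_col\n"1",,""\n'

header, record = list(csv.reader(io.StringIO(raw)))

# The default reader strips the quotes, so both fields come back as ''.
null_value, empty_value = record[1], record[2]
print(repr(null_value), repr(empty_value))
```

Once the quoting information is dropped, nothing downstream can tell the two values apart, which is why the workarounds above either avoid CSV (unload) or use a reader that keeps track of quoting.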

Cursor Behavior Summary

| Cursor | Data Source | Empty String | NULL | Distinguishes? |
|---|---|---|---|---|
| Cursor | Athena API | '' | None | ✅ Yes |
| DictCursor | Athena API | '' | None | ✅ Yes |
| PandasCursor | CSV | NaN | NaN | ❌ No |
| PandasCursor + unload | Parquet | '' | None | ✅ Yes |
| ArrowCursor | CSV | '' | '' | ❌ No |
| ArrowCursor + unload | Parquet | '' | null | ✅ Yes |
| PolarsCursor | CSV | '' | null | ✅ Yes |
| S3FSCursor (AthenaCSVReader) | CSV | '' | None | ✅ Yes |
| S3FSCursor (DefaultCSVReader) | CSV | None | None | ❌ No |

S3FSCursor CSV Readers

S3FSCursor supports two CSV readers:

  • AthenaCSVReader (default): Properly distinguishes NULL from empty strings
  • DefaultCSVReader: Kept for backward compatibility; NULL and empty string both become None
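
To illustrate how a quote-aware reader can keep NULL and empty string apart, here is a minimal sketch. It is not the AthenaCSVReader implementation; `split_csv_line` is a hypothetical helper written for this example, and it assumes the (,,) versus (,"",) convention described in the summary.

```python
from typing import List, Optional


def split_csv_line(line: str) -> List[Optional[str]]:
    """Split one CSV line, returning None for unquoted empty fields.

    Hypothetical helper for illustration only. Handles quoted fields,
    commas inside quotes, and doubled quotes ("") inside quoted fields.
    It does not handle newlines embedded in quoted fields.
    """
    fields: List[Optional[str]] = []
    buf: List[str] = []
    in_quotes = False
    was_quoted = False  # did the current field contain a quoted section?

    def finish() -> Optional[str]:
        value = "".join(buf)
        # An empty field is NULL only if it was never quoted.
        return value if (value or was_quoted) else None

    line = line.rstrip("\r\n")
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    buf.append('"')  # escaped quote inside a quoted field
                    i += 1
                else:
                    in_quotes = False
            else:
                buf.append(ch)
        elif ch == '"':
            in_quotes = True
            was_quoted = True
        elif ch == ",":
            fields.append(finish())
            buf, was_quoted = [], False
        else:
            buf.append(ch)
        i += 1
    fields.append(finish())
    return fields


print(split_csv_line('"1",,""'))
```

The only state needed beyond an ordinary CSV parser is the `was_quoted` flag, which is the information a generic reader throws away.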

Test plan

  • Run make chk - all quality checks pass
  • Run cursor-specific tests to verify NULL/empty string handling
  • Build documentation with make docs to verify rst file renders correctly

Closes #118
Closes #148
Closes #168

🤖 Generated with Claude Code

Add comprehensive documentation explaining NULL and empty string behavior
across different cursor types:
- Default Cursor and DictCursor properly distinguish NULL from empty string
- PandasCursor with CSV treats both as NaN, but works correctly with unload
- ArrowCursor with CSV treats both as empty string, but works correctly with unload
- PolarsCursor properly distinguishes in both CSV and Parquet modes
- S3FSCursor properly distinguishes using custom AthenaCSVReader

This documentation helps users understand the CSV format limitation and
provides workarounds (unload option, PolarsCursor, S3FSCursor) for
applications that need to distinguish NULL from empty string values.

Unify test method names to `test_null_vs_empty_string` across all cursor types:
- TestCursor.test_null_vs_empty_string
- TestDictCursor.test_null_vs_empty_string
- TestPandasCursor.test_null_vs_empty_string (renamed from test_empty_and_null_string)
- TestArrowCursor.test_null_vs_empty_string
- TestPolarsCursor.test_null_vs_empty_string
- TestS3FSCursor.test_null_vs_empty_string (consolidated with parametrize)

Closes #118
Closes #148
Closes #168

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
laughingman7743 force-pushed the feature/null-handling-documentation branch from 7c2c2ca to 7cc41cc on January 6, 2026 04:47
laughingman7743 marked this pull request as ready for review on January 6, 2026 05:17
laughingman7743 merged commit e029eec into master on Jan 6, 2026
5 checks passed
laughingman7743 deleted the feature/null-handling-documentation branch on January 6, 2026 05:20