Add S3FS Cursor for lightweight CSV-based result reading #630
Merged
laughingman7743 merged 7 commits into master on Jan 1, 2026
Implements Issue #272: Add a new cursor type that reads CSV results from S3 using Python's standard csv module and PyAthena's S3FileSystem, without requiring pandas or pyarrow dependencies.

New features:
- S3FSCursor: Synchronous cursor for reading CSV/TXT results from S3
- AsyncS3FSCursor: Asynchronous cursor using concurrent.futures
- AthenaS3FSResultSet: Streaming CSV reader with type conversion
- DefaultS3FSTypeConverter: Type converter for CSV-based results
- SQLAlchemy dialect: awsathena+s3fs:// connection URL support

Also adds rowcount property to WithResultSet mixin for CTAS support, benefiting all cursor types (base, pandas, arrow, s3fs).

Closes #272

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Due to Python MRO, CursorIterator.rowcount was taking precedence over WithResultSet.rowcount. The base Cursor class already has its own rowcount property that delegates to result_set.rowcount. This commit adds the same pattern to ArrowCursor, PandasCursor, and S3FSCursor.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
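The precedence problem can be reproduced in isolation. The sketch below uses simplified stand-in classes (the class bodies, `FakeResultSet`, `BrokenCursor`, and `FixedCursor` are hypothetical and not PyAthena's actual code) to show how the first base class in the MRO wins the property lookup, and how a property defined on the cursor class itself restores delegation to the result set.

```python
class FakeResultSet:
    """Hypothetical stand-in for a result set that knows its row count."""
    rowcount = 42


class CursorIterator:
    """Simplified stand-in: reports -1 because it has no result set."""

    @property
    def rowcount(self):
        return -1


class WithResultSet:
    """Simplified stand-in for the mixin that delegates to result_set."""
    result_set = None

    @property
    def rowcount(self):
        return self.result_set.rowcount if self.result_set else -1


class BrokenCursor(CursorIterator, WithResultSet):
    """CursorIterator comes first in the MRO, so its rowcount wins."""


class FixedCursor(CursorIterator, WithResultSet):
    """Defining rowcount on the class itself takes precedence over both bases."""

    @property
    def rowcount(self):
        return self.result_set.rowcount if self.result_set else -1


broken = BrokenCursor()
broken.result_set = FakeResultSet()

fixed = FixedCursor()
fixed.result_set = FakeResultSet()

print([cls.__name__ for cls in BrokenCursor.__mro__])
print(broken.rowcount, fixed.rowcount)
```

Reordering the base classes would also change the lookup, but an explicit property on each cursor class keeps the behavior independent of base-class order.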
Replace list comprehension pattern with simpler random.choices(k=10).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
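For illustration, the two patterns look roughly like this. The function names and the lowercase-letter population are assumptions; the commit message does not say what the test code samples from.

```python
import random
import string


def random_suffix_comprehension(length: int = 10) -> str:
    # Older pattern: one random.choice call per character.
    return "".join([random.choice(string.ascii_lowercase) for _ in range(length)])


def random_suffix_choices(length: int = 10) -> str:
    # Simpler pattern: a single random.choices call with k set to the length.
    return "".join(random.choices(string.ascii_lowercase, k=length))


print(random_suffix_comprehension(), random_suffix_choices())
```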
Test that the S3FS cursor correctly handles data containing tab and newline characters, which are special characters in CSV/TSV parsing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
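The kind of data this test targets can be illustrated with the standard csv module alone. This is a sketch of the round-trip property, not the PR's actual test: a field containing a tab or a newline must come back unchanged after being written and re-parsed. The sample rows are made up.

```python
import csv
import io

# Sample rows with a tab inside one field and a newline inside another.
rows = [
    ["id", "text"],
    ["1", "a\tb"],
    ["2", "line1\nline2"],
]

# newline="" leaves line endings untouched so the csv module controls them.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)

buffer.seek(0)
parsed = list(csv.reader(buffer))

print(parsed == rows)
```

The field with the embedded newline is written inside quotes, which is what lets the reader treat the two physical lines as one logical record.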
- Add docs/s3fs.rst with comprehensive S3FSCursor and AsyncS3FSCursor documentation
- Add docs/api/s3fs.rst with API reference
- Update docs/index.rst to include s3fs in toctree
- Update docs/api.rst to include s3fs API reference

The documentation covers:
- Basic usage and connection examples
- Type conversion mappings
- Custom converter implementation
- Limitations compared to Arrow/Pandas cursors
- Use cases and recommendations
- AsyncS3FSCursor for asynchronous operations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add AthenaCSVReader (default): Custom parser that distinguishes NULL (unquoted empty) from empty string (quoted empty "")
- Add DefaultCSVReader: Python's standard csv module wrapper for backward compatibility (both NULL and empty string become empty string)
- Support multi-line quoted fields in AthenaCSVReader with optimized incremental quote state tracking (O(n) complexity)
- Add csv_reader parameter to S3FSCursor and AsyncS3FSCursor
- Refactor result_set.py to remove unnecessary instance variables
- Move header skipping to _init_csv_reader() for cleaner initialization
- Update documentation with CSV reader options and NULL handling details
- Add comprehensive unit tests for both CSV readers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
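The NULL versus empty-string distinction can be sketched with a small single-line parser. This is an illustrative sketch only, not the AthenaCSVReader implementation: `parse_line` and `_finish_field` are hypothetical names, and multi-line quoted fields are deliberately left out. It tracks whether a field was ever quoted, so that an unquoted empty field becomes None while a quoted empty field stays "".

```python
import csv
from typing import List, Optional


def _finish_field(chars: List[str], was_quoted: bool) -> Optional[str]:
    # Unquoted and empty means NULL; anything quoted is a real string.
    text = "".join(chars)
    if not text and not was_quoted:
        return None
    return text


def parse_line(line: str, delimiter: str = ",") -> List[Optional[str]]:
    fields: List[Optional[str]] = []
    chars: List[str] = []
    in_quotes = False
    was_quoted = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    chars.append('"')  # doubled quote is an escaped quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                chars.append(ch)
        elif ch == '"':
            in_quotes = True
            was_quoted = True
        elif ch == delimiter:
            fields.append(_finish_field(chars, was_quoted))
            chars = []
            was_quoted = False
        else:
            chars.append(ch)
        i += 1
    fields.append(_finish_field(chars, was_quoted))
    return fields


line = '"a",,""'
print(parse_line(line))           # keeps NULL and empty string apart
print(next(csv.reader([line])))   # standard csv module collapses both to ""
```

The second print shows the behavior the commit attributes to DefaultCSVReader: with the standard csv module, both the unquoted and the quoted empty field come back as an empty string.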
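Summary
Implements Issue #272: Add a new cursor type that reads CSV results from S3 using Python's standard csv module and PyAthena's S3FileSystem, without requiring pandas or pyarrow dependencies.
New Features
- S3FSCursor: Synchronous cursor for reading CSV/TXT results from S3
- AsyncS3FSCursor: Asynchronous cursor using concurrent.futures
- AthenaS3FSResultSet: Streaming CSV reader with type conversion
- DefaultS3FSTypeConverter: Type converter for CSV-based results
- SQLAlchemy dialect: awsathena+s3fs:// connection URL support
Additional Changes
- Adds rowcount property to WithResultSet mixin for CTAS support, benefiting all cursor types (base, pandas, arrow, s3fs)
Usage Example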
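A minimal sketch of direct cursor usage follows. The import path `pyathena.s3fs.cursor` is an assumption modeled on the layout of the existing pandas and arrow cursors, and `fetch_rows` with its parameters is a hypothetical helper; `connect(..., cursor_class=...)` is PyAthena's existing way of selecting a cursor type. Running it needs PyAthena installed and valid AWS credentials.

```python
def fetch_rows(s3_staging_dir: str, region_name: str, sql: str):
    """Run `sql` through an S3FS cursor and return all rows."""
    # Imported lazily so this sketch can be loaded without PyAthena installed.
    from pyathena import connect
    # Assumed module path, mirroring pyathena.pandas.cursor and pyathena.arrow.cursor.
    from pyathena.s3fs.cursor import S3FSCursor

    conn = connect(
        s3_staging_dir=s3_staging_dir,
        region_name=region_name,
        cursor_class=S3FSCursor,
    )
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()
        conn.close()
```

A call might look like `fetch_rows("s3://YOUR_BUCKET/path/", "us-west-2", "SELECT 1")`, where the bucket and region are placeholders.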
With SQLAlchemy:
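As a sketch of the connection URL, the helper below builds an `awsathena+s3fs://` string. The scheme comes from this PR; the rest of the URL shape follows the form PyAthena documents for its other dialects, and `s3fs_url` and its arguments are hypothetical. Credentials are left empty here on the assumption that they are resolved from the environment.

```python
from urllib.parse import quote_plus


def s3fs_url(region_name: str, schema_name: str, s3_staging_dir: str) -> str:
    """Build a SQLAlchemy URL for the awsathena+s3fs dialect."""
    # The staging dir is URL-encoded because it contains ':' and '/'.
    return (
        f"awsathena+s3fs://:@athena.{region_name}.amazonaws.com:443/"
        f"{schema_name}?s3_staging_dir={quote_plus(s3_staging_dir)}"
    )


url = s3fs_url("us-west-2", "default", "s3://YOUR_BUCKET/path/")
print(url)
```

The resulting string can be passed to `sqlalchemy.create_engine(url)` in an environment where both SQLAlchemy and PyAthena are installed.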
Closes #272
🤖 Generated with Claude Code