Add S3FS Cursor for lightweight CSV-based result reading #630

Merged
laughingman7743 merged 7 commits into master from feature/s3fs-cursor on Jan 1, 2026

Conversation

@laughingman7743
Member

Summary

Implements Issue #272: Add a new cursor type that reads CSV results from S3 using Python's standard csv module and PyAthena's S3FileSystem, without requiring pandas or pyarrow dependencies.

New Features

  • S3FSCursor: Synchronous cursor for reading CSV/TXT results from S3
  • AsyncS3FSCursor: Asynchronous cursor using concurrent.futures
  • AthenaS3FSResultSet: Streaming CSV reader with type conversion
  • DefaultS3FSTypeConverter: Type converter for CSV-based results
  • SQLAlchemy dialect: awsathena+s3fs:// connection URL support
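Because CSV results arrive as plain strings, a cursor of this kind has to map each column's Athena type to a Python value. The following is a minimal sketch of that idea, not the DefaultS3FSTypeConverter implementation: the mapping table, the function names, and the timestamp format are all assumptions made for illustration.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

# Hypothetical mapping from Athena type names to converters.
# The converters shipped with PyAthena may cover more types and differ in detail.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "boolean": lambda v: v.lower() == "true",
    "integer": int,
    "bigint": int,
    "double": float,
    "decimal": Decimal,
    "date": date.fromisoformat,
    # Assumed text format for timestamps in the CSV output.
    "timestamp": lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M:%S.%f"),
    "varchar": str,
}


def convert_value(athena_type: str, value: Optional[str]) -> Any:
    """Convert one CSV field; None (NULL) passes through unchanged."""
    if value is None:
        return None
    # Unknown types fall back to the raw string.
    return CONVERTERS.get(athena_type, str)(value)


def convert_row(types: Sequence[str], row: Sequence[Optional[str]]) -> Tuple[Any, ...]:
    """Convert a whole row using the per-column type names."""
    return tuple(convert_value(t, v) for t, v in zip(types, row))
```

A caller would pass the column types from the query metadata together with each parsed CSV row, for example `convert_row(["integer", "varchar"], ["42", "abc"])`.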

Additional Changes

  • Added rowcount property to WithResultSet mixin for CTAS support, benefiting all cursor types (base, pandas, arrow, s3fs)
  • Added CTAS tests for base, pandas, arrow, and s3fs cursors

Usage Example

from pyathena import connect
from pyathena.s3fs.cursor import S3FSCursor

conn = connect(s3_staging_dir="s3://bucket/path")
cursor = conn.cursor(S3FSCursor)
cursor.execute("SELECT * FROM my_table")
rows = cursor.fetchall()

With SQLAlchemy:

from sqlalchemy import create_engine
engine = create_engine(
    "awsathena+s3fs://:@athena.us-east-1.amazonaws.com/database"
    "?s3_staging_dir=s3://bucket/path"
)

Closes #272

🤖 Generated with Claude Code

laughingman7743 and others added 6 commits January 1, 2026 14:28
Implements Issue #272: Add a new cursor type that reads CSV results from S3
using Python's standard csv module and PyAthena's S3FileSystem, without
requiring pandas or pyarrow dependencies.

New features:
- S3FSCursor: Synchronous cursor for reading CSV/TXT results from S3
- AsyncS3FSCursor: Asynchronous cursor using concurrent.futures
- AthenaS3FSResultSet: Streaming CSV reader with type conversion
- DefaultS3FSTypeConverter: Type converter for CSV-based results
- SQLAlchemy dialect: awsathena+s3fs:// connection URL support

Also adds rowcount property to WithResultSet mixin for CTAS support,
benefiting all cursor types (base, pandas, arrow, s3fs).

Closes #272

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Due to Python MRO, CursorIterator.rowcount was taking precedence over
WithResultSet.rowcount. The base Cursor class already has its own
rowcount property that delegates to result_set.rowcount. This commit
adds the same pattern to ArrowCursor, PandasCursor, and S3FSCursor.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
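
The MRO problem described in the commit above can be reproduced with a few stand-in classes. This is a sketch under assumptions: the classes below only borrow the names CursorIterator and WithResultSet, their bodies are invented, and the base-class order is assumed to put the iterator first. It shows why a property on the mixin is shadowed and how an explicit property on the concrete cursor restores the delegation.

```python
class FakeResultSet:
    """Hypothetical result set reporting three affected rows."""
    rowcount = 3


class CursorIterator:
    """Stand-in for an iterator base class with a default rowcount."""

    @property
    def rowcount(self) -> int:
        return -1


class WithResultSet:
    """Stand-in mixin that delegates rowcount to the result set."""

    result_set = None

    @property
    def rowcount(self) -> int:
        return self.result_set.rowcount if self.result_set else -1


class ShadowedCursor(CursorIterator, WithResultSet):
    """CursorIterator comes first in the MRO, so its rowcount wins."""

    def __init__(self) -> None:
        self.result_set = FakeResultSet()


class FixedCursor(CursorIterator, WithResultSet):
    """Defining rowcount on the class itself takes precedence over both bases."""

    def __init__(self) -> None:
        self.result_set = FakeResultSet()

    @property
    def rowcount(self) -> int:
        return self.result_set.rowcount if self.result_set else -1


print([c.__name__ for c in ShadowedCursor.__mro__])
print(ShadowedCursor().rowcount, FixedCursor().rowcount)
```

Attribute lookup walks the MRO left to right, so the mixin's property is never reached in ShadowedCursor, while the override in FixedCursor is found before either base.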
Replace list comprehension pattern with simpler random.choices(k=10).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
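
For reference, the two patterns named in the commit above look roughly like this. The helper names and the character population are assumptions; the commit message does not say what the random value is used for.

```python
import random
import string


def random_suffix_with_loop(length: int = 10) -> str:
    """List comprehension pattern: one random.choice call per character."""
    return "".join([random.choice(string.ascii_lowercase) for _ in range(length)])


def random_suffix(length: int = 10) -> str:
    """Simpler pattern: random.choices draws `length` items with replacement."""
    return "".join(random.choices(string.ascii_lowercase, k=length))


print(random_suffix_with_loop(), random_suffix())
```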
Test that the S3FS cursor correctly handles data containing tab and
newline characters, which are special characters in CSV/TSV parsing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
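
The kind of data this test targets can be illustrated with the standard csv module alone. The sketch below is not the PR's test; it only shows that fields containing tabs and newlines survive a write/read round trip when they are quoted. The function name and sample rows are made up for the example.

```python
import csv
import io
from typing import List


def round_trip(rows: List[List[str]]) -> List[List[str]]:
    """Write rows as fully quoted CSV, then parse them back."""
    # newline="" keeps embedded line breaks untranslated, as the csv docs recommend.
    buf = io.StringIO(newline="")
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


sample = [
    ["id", "text"],
    ["1", "tab\there"],            # embedded tab
    ["2", "line one\nline two"],   # embedded newline inside a quoted field
]
print(round_trip(sample) == sample)
```

A reader that split the file on line breaks before parsing would break the second data row in two, which is why multi-line quoted fields need explicit handling.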
- Add docs/s3fs.rst with comprehensive S3FSCursor and AsyncS3FSCursor documentation
- Add docs/api/s3fs.rst with API reference
- Update docs/index.rst to include s3fs in toctree
- Update docs/api.rst to include s3fs API reference

The documentation covers:
- Basic usage and connection examples
- Type conversion mappings
- Custom converter implementation
- Limitations compared to Arrow/Pandas cursors
- Use cases and recommendations
- AsyncS3FSCursor for asynchronous operations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add AthenaCSVReader (default): Custom parser that distinguishes NULL
  (unquoted empty) from empty string (quoted empty "")
- Add DefaultCSVReader: Python's standard csv module wrapper for
  backward compatibility (both NULL and empty string become empty string)
- Support multi-line quoted fields in AthenaCSVReader with optimized
  incremental quote state tracking (O(n) complexity)
- Add csv_reader parameter to S3FSCursor and AsyncS3FSCursor
- Refactor result_set.py to remove unnecessary instance variables
- Move header skipping to _init_csv_reader() for cleaner initialization
- Update documentation with CSV reader options and NULL handling details
- Add comprehensive unit tests for both CSV readers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
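
The NULL versus empty-string distinction described in the commit above can be sketched as follows. This is an illustrative parser, not the AthenaCSVReader code: the function name parse_record is hypothetical, it handles a single complete and well-formed record, and it omits the multi-line buffering the commit mentions. The standard csv module is shown alongside for comparison.

```python
import csv
import io
from typing import List, Optional


def parse_record(record: str) -> List[Optional[str]]:
    """Parse one CSV record; unquoted empty fields become None, quoted "" stays ""."""
    fields: List[Optional[str]] = []
    i, n = 0, len(record)
    while True:
        if i < n and record[i] == '"':
            # Quoted field: read until the closing quote, unescaping "" to ".
            i += 1
            chars: List[str] = []
            while i < n:
                if record[i] == '"':
                    if i + 1 < n and record[i + 1] == '"':
                        chars.append('"')
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(record[i])
                i += 1
            fields.append("".join(chars))
        else:
            # Unquoted field: runs to the next comma; empty means NULL.
            j = record.find(",", i)
            if j == -1:
                j = n
            raw = record[i:j]
            fields.append(raw if raw else None)
            i = j
        if i < n and record[i] == ",":
            i += 1
            continue
        return fields


line = '"a",,""'
print(parse_record(line))                     # NULL and "" are kept apart
print(next(csv.reader(io.StringIO(line))))    # stdlib: both become ""
```

Each character is visited once, so the scan is linear in the record length. The trade-off against the standard module is the loss of its dialect options in exchange for keeping the quoting information.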
@laughingman7743 laughingman7743 marked this pull request as ready for review January 1, 2026 10:07
@laughingman7743 laughingman7743 merged commit a64dfbd into master Jan 1, 2026
5 checks passed
@laughingman7743 laughingman7743 deleted the feature/s3fs-cursor branch January 1, 2026 13:41
Development

Successfully merging this pull request may close these issues.

Impl s3fs cursor