Align PandasCursor with PolarsCursor and optimize DataFrame operations by laughingman7743 · Pull Request #639 · pyathena-dev/PyAthena

laughingman7743 · 2026-01-04T07:05:05Z

Summary

This PR aligns the Pandas and Polars cursor implementations for consistency and adds performance optimizations for DataFrame operations.

API Alignment

Add as_pandas() method to PandasDataFrameIterator for collecting all chunks into a single DataFrame (mirrors as_polars())
Add iter_chunks() method to AthenaPandasResultSet for explicit iterator access
Refactor PandasCursor.iter_chunks() to delegate to ResultSet while preserving gc.collect() optimization
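
The collection behavior described for as_pandas() can be pictured with a small stand-in. This is a minimal sketch, not PyAthena's implementation: `ChunkIterator` is a hypothetical class, and it assumes that collecting chunks amounts to `pd.concat` with a fresh index.

```python
from typing import Iterable, Iterator

import pandas as pd


class ChunkIterator:
    """Hypothetical stand-in for a chunked DataFrame iterator (not PyAthena code)."""

    def __init__(self, chunks: Iterable[pd.DataFrame]) -> None:
        self._chunks = iter(chunks)

    def __iter__(self) -> Iterator[pd.DataFrame]:
        return self

    def __next__(self) -> pd.DataFrame:
        return next(self._chunks)

    def as_pandas(self) -> pd.DataFrame:
        # Assumption: the remaining chunks are concatenated and re-indexed from 0.
        frames = list(self)
        if not frames:
            return pd.DataFrame()
        return pd.concat(frames, ignore_index=True)


chunks = [pd.DataFrame({"id": [1, 2]}), pd.DataFrame({"id": [3]})]
combined = ChunkIterator(chunks).as_pandas()
```

Iterating the object yields one DataFrame per chunk, while as_pandas() trades the memory savings of chunking for a single combined frame.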

Bug Fixes

Fix iterrows() to maintain continuous row indices across chunks (was resetting to 0 for each chunk)
Fix column names cache initialization order for unload queries (was using stale metadata)
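
The iterrows() fix can be illustrated with a running offset. This is only a sketch under assumptions: `iterrows_continuous` is a hypothetical helper, and the (index, dict) row shape is chosen for illustration rather than taken from the PR.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

import pandas as pd


def iterrows_continuous(
    chunks: Iterable[pd.DataFrame],
) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (row_index, row_dict) with an index that keeps counting across chunks."""
    offset = 0
    for chunk in chunks:
        columns = list(chunk.columns)
        # itertuples() produces one row at a time instead of a full list of dicts.
        for position, values in enumerate(chunk.itertuples(index=False, name=None)):
            yield offset + position, dict(zip(columns, values))
        # Carry the row count forward so the next chunk does not restart at 0.
        offset += len(chunk)
```

Without the `offset` bookkeeping, the second chunk would start again at index 0, which is the behavior the fix removes.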

Refactoring

Rename DataFrameIterator to PandasDataFrameIterator and PolarsDataFrameIterator for clarity
Remove unused _closed flag from PolarsDataFrameIterator
Update close() to properly close generator resources
Update documentation (docs/pandas.rst, docs/polars.rst, docs/api/pandas.rst, docs/api/polars.rst)

Performance Optimizations

Pandas iterrows(): Use itertuples() instead of to_dict("records") to avoid loading all rows into memory at once
Pandas _trunc_date(): Cache time column names in __init__ to avoid repeated list comprehension on each chunk
Polars iterrows(): Replace inline lambda x: x with module-level _identity() function to avoid creating new function objects in hot path
Polars fetchone(): Cache column names in __init__ to avoid repeated _get_column_names() calls
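
The reasoning behind the `_identity()` change can be shown in isolation. In this sketch only the name `_identity` comes from the PR; `convert_row` and the two factory functions are hypothetical helpers used for illustration.

```python
from typing import Any, Callable, Dict


def _identity(value: Any) -> Any:
    """Module-level no-op converter, created once at import time."""
    return value


def converter_with_lambda() -> Callable[[Any], Any]:
    # Evaluating a lambda expression builds a new function object on every call.
    return lambda x: x


def converter_with_identity() -> Callable[[Any], Any]:
    # Returns the same pre-built function object each time.
    return _identity


def convert_row(
    row: Dict[str, Any], converters: Dict[str, Callable[[Any], Any]]
) -> Dict[str, Any]:
    # Columns without a registered converter fall back to the shared no-op.
    return {key: converters.get(key, _identity)(value) for key, value in row.items()}
```

For example, `convert_row({"n": "1", "s": "x"}, {"n": int})` converts `n` and passes `s` through unchanged, without allocating a fallback function per row.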

Closes #638

🤖 Generated with Claude Code

- Add as_pandas() method to DataFrameIterator for collecting all chunks into a single DataFrame (mirrors PolarsCursor's as_polars() method)
- Add iter_chunks() method to AthenaPandasResultSet for explicit iterator access
- Refactor PandasCursor.iter_chunks() to delegate to ResultSet while preserving gc.collect() optimization for memory management
- Add comprehensive docstrings with Google-style documentation
- Update docs/pandas.rst with DataFrameIterator.as_pandas() examples

This aligns the Pandas and Polars cursor implementations for consistency, making it easier for users to switch between them.

Closes #638

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Previously, row indices reset to 0 for each chunk when iterating. Now row indices are continuous across all chunks, consistent with PolarsCursor's DataFrameIterator behavior. This is the expected behavior since chunking is an optimization detail that should be transparent to the caller.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…meIterator

Rename the DataFrameIterator classes to include their respective module prefix for clarity and to avoid confusion when importing from both modules.

- pyathena.pandas.result_set.DataFrameIterator → PandasDataFrameIterator
- pyathena.polars.result_set.DataFrameIterator → PolarsDataFrameIterator

Also updates all documentation and test references.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Pandas iterrows(): Use itertuples() instead of to_dict("records") to avoid loading all rows into memory at once
- Pandas _trunc_date(): Cache time column names in __init__ to avoid repeated list comprehension on each DataFrame chunk
- Polars iterrows(): Replace inline lambda with module-level _identity function to avoid creating new function objects in hot path
- Polars fetchone(): Cache column names in __init__ to avoid repeated _get_column_names() calls

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
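
The caching of time column names can be sketched as follows. `ChunkProcessorSketch` is a hypothetical class, and the (name, type) description format and the `"time"` type name are assumptions made for illustration, not details confirmed by the commit.

```python
from typing import List, Sequence, Tuple


class ChunkProcessorSketch:
    """Hypothetical per-chunk processor; not PyAthena's actual class."""

    def __init__(self, description: Sequence[Tuple[str, str]]) -> None:
        # Computed once here instead of inside every per-chunk call.
        self._time_columns: List[str] = [
            name for name, type_name in description if type_name == "time"
        ]

    def time_columns_for_chunk(self) -> List[str]:
        # Each chunk reuses the cached list; the comprehension is not re-run.
        return self._time_columns
```

The saving per chunk is small, but it applies to every chunk of a large result, which is why the list is built in `__init__`.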

The _column_names_cache was being set before _create_dataframe_iterator() was called, but for unload queries, _as_polars() updates _metadata with the Parquet schema. This caused fetchone() to use stale column names that didn't match the actual DataFrame columns.

Fix by moving cache initialization after _create_dataframe_iterator(), and using _get_column_names() directly in methods called during init.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
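
The ordering problem can be reproduced in a simplified model. `ResultSetSketch`, `_create_iterator`, and `first_row` are hypothetical names, and returning a dict per row is a simplification; only the idea of caching after the iterator is created comes from the commit message.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


class ResultSetSketch:
    """Hypothetical result set; names are illustrative, not PyAthena's."""

    def __init__(self, unload: bool) -> None:
        # Metadata as first reported for the query.
        self._metadata: List[Tuple[str, str]] = [("_col0", "varchar")]
        # Build the iterator first: for unload queries this step replaces
        # the metadata with the schema read from the output files.
        self._rows = self._create_iterator(unload)
        # Only now is it safe to cache the column names.
        self._column_names_cache = self._get_column_names()

    def _create_iterator(self, unload: bool) -> Iterator[Tuple[Any, ...]]:
        if unload:
            self._metadata = [("user_id", "bigint")]
            return iter([(1,)])
        return iter([("a",)])

    def _get_column_names(self) -> List[str]:
        return [name for name, _ in self._metadata]

    def first_row(self) -> Optional[Dict[str, Any]]:
        row = next(self._rows, None)
        if row is None:
            return None
        return dict(zip(self._column_names_cache, row))
```

If the cache line were moved above the `_create_iterator()` call, the unload case would keep `_col0` as its column name even though the data carries `user_id`.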

- Remove _closed flag that was not providing any real functionality
- Update close() to properly close generator if reader is a generator
- Align with PandasDataFrameIterator which doesn't use _closed flag

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
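
Closing a generator-backed reader might look like the sketch below. `IteratorSketch` and `read_chunks` are hypothetical; the sketch assumes the reader may be either a generator or some other iterable, and only generators need an explicit `close()`.

```python
import types
from typing import Any, Iterator, List


def read_chunks(log: List[str]) -> Iterator[str]:
    """Hypothetical chunk source that releases a resource when it finishes."""
    try:
        yield "chunk-1"
        yield "chunk-2"
    finally:
        log.append("released")


class IteratorSketch:
    """Hypothetical iterator wrapper; not PyAthena's actual class."""

    def __init__(self, reader: Any) -> None:
        self._reader = reader

    def __iter__(self) -> "IteratorSketch":
        return self

    def __next__(self) -> Any:
        return next(self._reader)

    def close(self) -> None:
        # generator.close() raises GeneratorExit at the paused yield, so the
        # generator's finally block runs and its resources are released.
        if isinstance(self._reader, types.GeneratorType):
            self._reader.close()
```

No separate closed flag is needed in this model: once the generator is closed, any further `next()` call simply raises `StopIteration`.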

Document the PolarsDataFrameIterator class and its as_polars() method for consistency with pandas.rst documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Added missing Polars entry to the installation extra packages table, documenting the pip install command and version requirement (>=1.0.0).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

laughingman7743 and others added 4 commits January 4, 2026 16:00

laughingman7743 changed the title ~~Align PandasCursor chunk processing with PolarsCursor implementation~~ Align PandasCursor with PolarsCursor and optimize DataFrame operations Jan 4, 2026

laughingman7743 and others added 3 commits January 4, 2026 18:54

laughingman7743 marked this pull request as ready for review January 4, 2026 10:14

laughingman7743 merged commit a2d9c20 into master Jan 4, 2026
5 checks passed

laughingman7743 deleted the feature/pandas-cursor-alignment branch January 4, 2026 10:47

laughingman7743 merged 8 commits into master from feature/pandas-cursor-alignment
