Commit c022c26

Update CLAUDE.md to reflect current codebase state
- Add missing cursor types: PolarsCursor and S3FSCursor
- Reframe AsyncCursor as a variant pattern rather than a separate cursor type
- Fix parameter formatting section: remove incorrect qmark style reference, add Presto/Hive escaping details and UNLOAD wrapping
- Expand project structure tree with all modules (polars/, s3fs/, sqlalchemy compiler/types/preparer, top-level files)
- Add all SQLAlchemy dialects (polars, s3fs) to the listing
- Update release process to reflect hatch-vcs version management
- Include polars and s3fs in the "consider impact" cursor list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent fdfdb24 commit c022c26

1 file changed

+79
-25
lines changed

CLAUDE.md

Lines changed: 79 additions & 25 deletions
@@ -16,11 +16,14 @@ PyAthena is a Python DB API 2.0 (PEP 249) compliant client library for Amazon At
1616

1717
### 2. Multiple Cursor Types
1818
The project supports different cursor implementations for various use cases:
19-
- **Standard Cursor** (`pyathena.cursor.Cursor`): Basic DB API cursor
20-
- **Async Cursor** (`pyathena.async_cursor.AsyncCursor`): For asynchronous operations
21-
- **Pandas Cursor** (`pyathena.pandas.cursor.PandasCursor`): Returns results as DataFrames
19+
- **Standard Cursor** (`pyathena.cursor.Cursor`): Basic DB API cursor returning tuples
20+
- **Pandas Cursor** (`pyathena.pandas.cursor.PandasCursor`): Returns results as pandas DataFrames
2221
- **Arrow Cursor** (`pyathena.arrow.cursor.ArrowCursor`): Returns results in Apache Arrow format
23-
- **Spark Cursor** (`pyathena.spark.cursor.SparkCursor`): For PySpark integration
22+
- **Polars Cursor** (`pyathena.polars.cursor.PolarsCursor`): Returns results as Polars DataFrames
23+
- **S3FS Cursor** (`pyathena.s3fs.cursor.S3FSCursor`): Lightweight CSV-based cursor using S3 filesystem (no pandas/arrow dependency)
24+
- **Spark Cursor** (`pyathena.spark.cursor.SparkCursor`): For PySpark integration with Athena Spark workgroups
25+
26+
Each cursor type (except Spark) has a corresponding async variant (e.g., `AsyncCursor`, `AsyncPandasCursor`, `AsyncArrowCursor`, `AsyncPolarsCursor`, `AsyncS3FSCursor`).
2427

2528
### 3. Type System and Conversion
2629
- Data type conversion is handled in `pyathena/converter.py`
@@ -193,7 +196,7 @@ def test_find_maxdepth(self, fs):
193196

194197
#### Adding a New Feature
195198
1. Check if it aligns with DB API 2.0 specifications
196-
2. Consider impact on all cursor types (standard, pandas, arrow, spark)
199+
2. Consider impact on all cursor types (standard, pandas, arrow, polars, s3fs, spark)
197200
3. Update type hints and ensure mypy passes
198201
4. Add comprehensive tests
199202
5. Update documentation if adding public APIs
@@ -214,30 +217,76 @@ def test_find_maxdepth(self, fs):
214217

215218
```
216219
pyathena/
217-
├── {cursor_type}/ # Cursor-specific implementations
218-
│ ├── __init__.py
219-
│ ├── cursor.py # Cursor implementation
220-
│ ├── converter.py # Type converters
221-
│ └── result_set.py # Result handling
220+
├── __init__.py # DB API 2.0 globals, connect() entry point
221+
├── connection.py # Connection class
222+
├── cursor.py # Standard Cursor
223+
├── async_cursor.py # Standard AsyncCursor
224+
├── common.py # Base cursor classes (BaseCursor, CursorIterator)
225+
├── converter.py # Type conversion utilities
226+
├── formatter.py # SQL parameter formatting, UNLOAD wrapping
227+
├── result_set.py # Base result set handling
228+
├── model.py # Data models and enums
229+
├── error.py # Exception hierarchy
230+
├── util.py # Utility functions
231+
232+
├── pandas/ # Pandas cursor implementation
233+
│ ├── cursor.py # PandasCursor
234+
│ ├── async_cursor.py # AsyncPandasCursor
235+
│ ├── converter.py # Pandas type converters
236+
│ └── result_set.py # Pandas result set handling
237+
238+
├── arrow/ # Arrow cursor implementation
239+
│ ├── cursor.py # ArrowCursor
240+
│ ├── async_cursor.py # AsyncArrowCursor
241+
│ ├── converter.py # Arrow type converters
242+
│ └── result_set.py # Arrow result set handling
243+
244+
├── polars/ # Polars cursor implementation
245+
│ ├── cursor.py # PolarsCursor
246+
│ ├── async_cursor.py # AsyncPolarsCursor
247+
│ ├── converter.py # Polars type converters
248+
│ └── result_set.py # Polars result set handling
249+
250+
├── s3fs/ # S3FS cursor implementation (lightweight CSV reader)
251+
│ ├── cursor.py # S3FSCursor
252+
│ ├── async_cursor.py # AsyncS3FSCursor
253+
│ ├── reader.py # CSV reader implementation
254+
│ ├── converter.py # S3FS type converters
255+
│ └── result_set.py # S3FS result set handling
256+
257+
├── spark/ # Spark cursor implementation
258+
│ ├── cursor.py # SparkCursor
259+
│ ├── async_cursor.py # AsyncSparkCursor
260+
│ └── common.py # Spark utilities
222261
223262
├── sqlalchemy/ # SQLAlchemy dialect implementations
224-
│ ├── base.py # Base dialect
225-
│ ├── {dialect}.py # Specific dialects (rest, pandas, arrow)
226-
│ └── requirements.py # SQLAlchemy requirements
263+
│ ├── base.py # Base AthenaDialect
264+
│ ├── rest.py # AthenaRestDialect (standard cursor)
265+
│ ├── pandas.py # AthenaPandasDialect
266+
│ ├── arrow.py # AthenaArrowDialect
267+
│ ├── polars.py # AthenaPolarsDialect
268+
│ ├── s3fs.py # AthenaS3FSDialect
269+
│ ├── compiler.py # SQL compiler for Athena
270+
│ ├── types.py # SQLAlchemy type mappings
271+
│ ├── preparer.py # SQL identifier preparer
272+
│ ├── constants.py # Dialect constants
273+
│ ├── util.py # Dialect utilities
274+
│ └── requirements.py # SQLAlchemy compatibility requirements
227275
228-
└── filesystem/ # S3 filesystem abstractions
229-
    ├── s3.py # S3FileSystem implementation (fsspec compatible)
230-
    └── s3_object.py # S3 object representations
276+
└── filesystem/ # S3 filesystem abstractions
277+
    ├── s3.py # S3FileSystem implementation (fsspec compatible)
278+
    └── s3_object.py # S3 object representations
231279
```
232280

233281
### Important Implementation Details
234282

235283
#### Parameter Formatting
236-
- Two parameter styles supported: `pyformat` (default) and `qmark`
237-
- Parameter formatting logic in `formatter.py`
238-
- PyFormat: `%(name)s` style
239-
- Qmark: `?` style
284+
- Parameter style: `pyformat` (`%(name)s` style) as declared in DB API 2.0 globals
285+
- Parameter formatting logic in `formatter.py` (`DefaultParameterFormatter`)
286+
- Uses Presto-style escaping (single quote doubling) for SELECT/WITH/INSERT/UPDATE/MERGE statements
287+
- Uses Hive-style escaping (backslash-based) for DDL statements (CREATE, DROP, etc.)
240288
- Always escape special characters in parameter values
289+
- `Formatter.wrap_unload()` wraps SELECT/WITH queries with UNLOAD for high-performance Parquet/ORC result retrieval
241290

242291
#### Result Set Handling
243292
- Results are typically staged in S3 (configured via `s3_staging_dir`)
@@ -289,11 +338,16 @@ When implementing filesystem methods:
289338
4. Don't forget to close cursors and connections to clean up resources
290339
5. Be aware of Athena service quotas and rate limits
291340

292-
### Release Process
293-
1. Update version in `pyathena/__init__.py`
294-
2. Ensure all tests pass
295-
3. Create a git tag for the release
296-
4. Build and publish to PyPI
341+
### Build System and Release Process
342+
343+
**Build System**: Hatchling with hatch-vcs for version control system integration.
344+
345+
**Version Management**: Versions are automatically derived from git tags via `hatch-vcs`. The generated version file is `pyathena/_version.py` (auto-generated, do not edit manually).
346+
347+
**Release Process**:
348+
1. Ensure all tests pass
349+
2. Create a git tag for the release (version is derived from the tag)
350+
3. Build and publish to PyPI
297351

298352
## Contact and Resources
299353
- **Repository**: https://github.com/laughingman7743/PyAthena
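
Note on the Parameter Formatting hunk: the added bullets describe `pyformat` substitution with two escaping strategies chosen by statement type. The following is a minimal sketch of that idea only. The names `escape_presto`, `escape_hive`, and `format_query` are hypothetical and are not taken from `formatter.py`, and the Hive branch here is assumed to handle just backslashes and single quotes.

```python
# Hypothetical illustration; this is NOT PyAthena's DefaultParameterFormatter.

def escape_presto(value: str) -> str:
    """Quote a string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def escape_hive(value: str) -> str:
    """Quote a string literal using backslash escapes (assumed subset)."""
    # Escape backslashes first so the backslash added for quotes is not doubled.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def format_query(operation: str, parameters: dict[str, str]) -> str:
    """Choose an escaper from the leading keyword, then apply %(name)s substitution."""
    presto_keywords = ("SELECT", "WITH", "INSERT", "UPDATE", "MERGE")
    is_presto = operation.lstrip().upper().startswith(presto_keywords)
    escape = escape_presto if is_presto else escape_hive
    escaped = {name: escape(val) for name, val in parameters.items()}
    return operation % escaped


query = format_query(
    "SELECT * FROM t WHERE name = %(name)s", {"name": "O'Brien"}
)
ddl = format_query(
    "ALTER TABLE t ADD PARTITION (dt=%(dt)s)", {"dt": "it's"}
)
```

In this sketch the SELECT statement gets quote doubling, while the ALTER TABLE statement falls through to the backslash branch because its leading keyword is not in the Presto-style list.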
