Releases: ranaroussi/pystore
1.0.1
What's Changed
- Update numpy dependency to support numpy 3.0 by @ranaroussi in #80
Full Changelog: 1.0.0...1.0.1
1.0.0
Change Log
1.0.0 (2025 Release)
Major Release with Performance Optimizations and Advanced Features
Modernization & Dependencies:
- Updated to Python 3.8+ only (dropped Python 2.7/3.5/3.6/3.7 support)
- Migrated from fastparquet to PyArrow as the sole Parquet engine
- Updated all dependencies to modern versions (Pandas 2.0+, PyArrow 10.0+, Dask 2023.1+)
- Fixed all pandas deprecation warnings (replaced is_*_dtype with isinstance checks)
- Removed legacy code and deprecated features
New Features - Data Types & Storage:
- MultiIndex Support: Full support for storing and retrieving DataFrames with pandas MultiIndex
- Complex Data Types: Support for Timedelta, Period, Interval, and Categorical dtypes
- Nested Objects: Support for storing lists, dicts, and nested DataFrames as columns
- Timezone-Aware Operations: Proper handling of timezone data with UTC storage
- Schema Evolution: Flexible strategies for handling schema changes over time
- Data Validation Framework: Extensible validation rules with built-in validators
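The validation framework above can be sketched in plain Python. This is an illustrative mock, not pystore's implementation: the `Collection` class, `set_validator`, and the `no_nulls` rule are hypothetical names standing in for the real API, and a real write would persist to Parquet rather than return a tuple.

```python
# Minimal sketch of extensible validation rules, assuming a validator is any
# callable that takes a record dict and returns a list of error strings.
# All names (Collection, set_validator, no_nulls) are illustrative only.

class Collection:
    def __init__(self):
        self._validators = []

    def set_validator(self, fn):
        """Register a validation rule applied on every write."""
        self._validators.append(fn)

    def write(self, item, record):
        errors = [e for fn in self._validators for e in fn(record)]
        if errors:
            raise ValueError(f"validation failed for {item!r}: {errors}")
        return (item, record)  # a real implementation would persist to Parquet


def no_nulls(record):
    """Built-in-style rule: reject records containing None values."""
    return [f"null in column {k!r}" for k, v in record.items() if v is None]


c = Collection()
c.set_validator(no_nulls)
c.write("AAPL", {"open": 1.0, "close": 2.0})      # passes all rules
try:
    c.write("AAPL", {"open": None, "close": 2.0})  # rejected by no_nulls
except ValueError as e:
    print(e)
```

Because rules are plain callables, new validators can be added without touching storage code.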
New Features - API & Operations:
- Async/Await Support: Non-blocking I/O operations via async_pystore
- Transaction Support: Atomic operations with rollback capabilities
- Context Managers: Transaction and batch operation context managers
- Validation Hooks: Set validators at collection level with custom rules
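The transaction semantics described above (atomic operations with rollback) can be sketched with a generator-based context manager. This is a toy model: an in-memory dict stands in for the collection's on-disk items, and while `transaction()` borrows its name from the API Additions below, its real signature may differ.

```python
from contextlib import contextmanager

# Sketch of atomic writes with rollback: snapshot the state on entry,
# restore it if anything inside the `with` block raises.

@contextmanager
def transaction(store):
    snapshot = dict(store)          # shallow copy of the current state
    try:
        yield store                 # the caller mutates the store freely
    except Exception:
        store.clear()
        store.update(snapshot)      # roll back to the snapshot on any error
        raise


store = {"AAPL": [1, 2]}
try:
    with transaction(store) as s:
        s["MSFT"] = [3, 4]
        raise RuntimeError("disk full")  # simulate a mid-transaction failure
except RuntimeError:
    pass

print(store)  # {'AAPL': [1, 2]} -- the partial write was rolled back
```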
Performance Optimizations:
- Streaming Operations: Memory-efficient append for datasets larger than RAM (90% memory reduction)
- Batch Operations: 5-10x faster parallel read/write for multiple items
- Intelligent Partitioning: Automatic time-based and size-based partitioning
- Memory Management: 70% memory reduction with automatic DataFrame optimization
- Metadata Caching: 100x faster metadata access with TTL cache
- Query Optimization: Column selection and predicate pushdown at storage level
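The metadata-caching idea above boils down to a TTL (time-to-live) cache: reuse a loaded value until it expires, then reload. The sketch below is a generic stand-in, not pystore's `pystore.memory` code; the class name and the 60-second default are illustrative.

```python
import time

# Sketch of a TTL cache: values are served from memory until their
# expiry timestamp passes, after which the loader runs again.

class TTLCache:
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._data = {}  # key -> (expiry timestamp, value)

    def get(self, key, loader):
        now = time.monotonic()
        hit = self._data.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                      # fresh: skip the expensive load
        value = loader()                       # stale or missing: reload
        self._data[key] = (now + self.ttl, value)
        return value


calls = []
def load_metadata():
    calls.append(1)                            # stands in for a disk read
    return {"rows": 1000}

cache = TTLCache(ttl=60.0)
cache.get("item/metadata", load_metadata)
cache.get("item/metadata", load_metadata)      # second call hits the cache
print(len(calls))  # 1 -- the loader ran only once within the TTL window
```

The speedup comes from replacing repeated filesystem reads with a dict lookup; the TTL bounds how stale cached metadata can get.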
Bug Fixes & Improvements:
- Fixed append method to properly handle duplicates and schema evolution
- Fixed MultiIndex dtype preservation during storage operations
- Fixed timezone handling to ensure consistency across operations
- Fixed Period dtype frequency conversion issues (ME -> M)
- Fixed nested object serialization with proper null handling
- Improved error messages and validation throughout
- Added comprehensive test coverage for all new features
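The first fix in the list, duplicate handling plus schema evolution during append, can be illustrated with a toy function. Lists of dicts stand in for DataFrames here; the `append` name, the `timestamp` key, and the backfill-with-`None` behavior are all illustrative, not pystore's actual semantics.

```python
# Sketch of duplicate-aware append with schema evolution: skip rows whose
# index key already exists, take the union of old and new columns, and
# backfill missing columns so every row shares the evolved schema.

def append(existing, new_rows, key="timestamp"):
    seen = {row[key] for row in existing}
    columns = {c for row in existing for c in row}
    out = list(existing)
    for row in new_rows:
        if row[key] in seen:
            continue                      # drop duplicate index entries
        columns |= row.keys()             # schema evolution: union of columns
        out.append(row)
        seen.add(row[key])
    return [{c: row.get(c) for c in sorted(columns)} for row in out]


old = [{"timestamp": 1, "close": 10.0}]
new = [{"timestamp": 1, "close": 10.0},                  # duplicate -> skipped
       {"timestamp": 2, "close": 11.0, "volume": 500}]   # new column -> added
print(append(old, new))
```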
API Additions:
- `collection.append_stream()` - Streaming append for large datasets
- `collection.write_batch()` - Parallel write of multiple items
- `collection.read_batch()` - Efficient read of multiple items
- `collection.set_validator()` - Set data validation rules
- `collection.enable_schema_evolution()` - Enable flexible schema handling
- `async_pystore.store()` - Async store context manager
- `transaction()` - Single transaction context manager
- `batch_transaction()` - Batch operation context manager
- Memory management utilities in the `pystore.memory` module
- Partitioning utilities in the `pystore.partition` module
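The idea behind a streaming append is worth spelling out: consume an iterator in fixed-size chunks so the full dataset never sits in memory at once. The sketch below is a generic stand-in; `append_stream`'s real signature in pystore may differ, and the `write` callback here replaces the actual Parquet append.

```python
from itertools import islice

# Sketch of streaming append: pull chunk_size rows at a time from an
# iterator and hand each chunk to a write callback, so peak memory is
# bounded by the chunk size rather than the dataset size.

def append_stream(rows, chunk_size=10_000, write=print):
    it = iter(rows)
    written = 0
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return written
        write(chunk)                  # a real implementation appends to Parquet
        written += len(chunk)


chunks = []
total = append_stream(range(5), chunk_size=2, write=chunks.append)
print(total, chunks)  # 5 [[0, 1], [2, 3], [4]]
```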
Breaking Changes:
- Removed Python 2.7 and Python < 3.8 support
- Removed Fastparquet support (PyArrow only)
- Changed some internal APIs for better consistency
Full Changelog: 0.1.24...1.0.0