
Releases: ranaroussi/pystore

1.0.1

22 Jul 10:25
80af5fe


What's Changed

Full Changelog: 1.0.0...1.0.1

1.0.0

20 Jul 23:34
b52428e


Change Log

1.0.0 (2025 Release)

Major Release with Performance Optimizations and Advanced Features

Modernization & Dependencies:

  • Updated to Python 3.8+ only (dropped Python 2.7/3.5/3.6/3.7 support)
  • Migrated from deprecated Fastparquet to PyArrow as the sole Parquet engine
  • Updated all dependencies to modern versions (Pandas 2.0+, PyArrow 10.0+, Dask 2023.1+)
  • Fixed all pandas deprecation warnings (replaced the deprecated is_*_dtype helpers with isinstance checks; see the sketch after this list)
  • Removed legacy code and deprecated features
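
For reference, a minimal sketch of the dtype-check pattern mentioned in the list above: pandas 2.x deprecates several is_*_dtype helpers, and the fix is to check the dtype object directly with isinstance. This is illustrative only; the actual call sites inside pystore may differ.

```python
import pandas as pd

def describe_dtype(series: pd.Series) -> str:
    """Classify a Series dtype without the deprecated pd.api.types.is_*_dtype helpers."""
    dtype = series.dtype
    if isinstance(dtype, pd.CategoricalDtype):    # was: is_categorical_dtype(series)
        return "categorical"
    if isinstance(dtype, pd.DatetimeTZDtype):     # was: is_datetime64tz_dtype(series)
        return "tz-aware datetime"
    if isinstance(dtype, pd.PeriodDtype):         # was: is_period_dtype(series)
        return "period"
    if isinstance(dtype, pd.IntervalDtype):       # was: is_interval_dtype(series)
        return "interval"
    return str(dtype)
```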

New Features - Data Types & Storage:

  • MultiIndex Support: Full support for storing and retrieving DataFrames with pandas MultiIndex (see the example after this list)
  • Complex Data Types: Support for Timedelta, Period, Interval, and Categorical dtypes
  • Nested Objects: Support for storing lists, dicts, and nested DataFrames as columns
  • Timezone-Aware Operations: Proper handling of timezone data with UTC storage
  • Schema Evolution: Flexible strategies for handling schema changes over time
  • Data Validation Framework: Extensible validation rules with built-in validators
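
As a hedged illustration of the MultiIndex and complex-dtype support, the sketch below writes and reads back a DataFrame with a MultiIndex plus Categorical and Timedelta columns using the existing store/collection/item API; the store, collection, and item names are placeholders.

```python
import pandas as pd
import pystore

pystore.set_path("~/pystore")            # where the datastore lives on disk
store = pystore.store("demo_datastore")
collection = store.collection("demo")

# DataFrame with a MultiIndex plus Categorical and Timedelta columns
idx = pd.MultiIndex.from_product(
    [pd.date_range("2025-01-01", periods=3, tz="UTC"), ["AAPL", "MSFT"]],
    names=["timestamp", "symbol"],
)
df = pd.DataFrame(
    {
        "price": [100.0, 250.0, 101.5, 251.2, 99.8, 249.9],
        "sector": pd.Categorical(["tech"] * 6),
        "holding_period": pd.to_timedelta([1, 2, 3, 4, 5, 6], unit="D"),
    },
    index=idx,
)

collection.write("PORTFOLIO", df, metadata={"source": "example"}, overwrite=True)
restored = collection.item("PORTFOLIO").to_pandas()
print(restored.index.names)   # expected: ['timestamp', 'symbol']
print(restored.dtypes)
```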

New Features - API & Operations:

  • Async/Await Support: Non-blocking I/O operations via async_pystore
  • Transaction Support: Atomic operations with rollback capabilities
  • Context Managers: Transaction and batch operation context managers (see the sketch after this list)
  • Validation Hooks: Set validators at collection level with custom rules
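
A hedged sketch of how the new transaction support might be used, based only on the names listed above; the import path and the argument passed to transaction() are assumptions and may differ from the released API.

```python
import pandas as pd
import pystore

store = pystore.store("demo_datastore")
collection = store.collection("demo")
df = pd.DataFrame(
    {"price": [1.0, 2.0]},
    index=pd.date_range("2025-01-01", periods=2, tz="UTC"),
)

# Assumed usage: transaction() groups writes so a failure rolls all of them back.
# Whether it takes the collection, the store, or no argument is an assumption here.
with pystore.transaction(collection):
    collection.write("ITEM_A", df, overwrite=True)
    collection.write("ITEM_B", df * 2, overwrite=True)
    # Any exception raised inside this block should undo both writes.
```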

Performance Optimizations:

  • Streaming Operations: Memory-efficient append for datasets larger than RAM (90% memory reduction)
  • Batch Operations: 5-10x faster parallel read/write for multiple items
  • Intelligent Partitioning: Automatic time-based and size-based partitioning
  • Memory Management: 70% memory reduction with automatic DataFrame optimization
  • Metadata Caching: 100x faster metadata access with TTL cache
  • Query Optimization: Column selection and predicate pushdown at storage level (see the sketch below)
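
To make the query-optimization point concrete, here is a sketch of reading only selected columns with a predicate pushed down to the Parquet layer; passing columns and PyArrow-style filters to collection.item() is an assumption about the exact released signature.

```python
import pystore

store = pystore.store("demo_datastore")
collection = store.collection("demo")

# Read only the 'price' column and push the symbol predicate down to storage,
# so row groups that do not match are never loaded into memory.
item = collection.item(
    "PORTFOLIO",
    columns=["price"],
    filters=[("symbol", "==", "AAPL")],  # PyArrow-style predicate (assumed format)
)
df = item.to_pandas()
```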

Bug Fixes & Improvements:

  • Fixed append method to properly handle duplicates and schema evolution
  • Fixed MultiIndex dtype preservation during storage operations
  • Fixed timezone handling to ensure consistency across operations
  • Fixed Period dtype frequency conversion issues (ME -> M)
  • Fixed nested object serialization with proper null handling
  • Improved error messages and validation throughout
  • Added comprehensive test coverage for all new features

API Additions:

  • collection.append_stream() - Streaming append for large datasets (see the usage sketch after this list)
  • collection.write_batch() - Parallel write of multiple items
  • collection.read_batch() - Efficient read of multiple items
  • collection.set_validator() - Set data validation rules
  • collection.enable_schema_evolution() - Enable flexible schema handling
  • async_pystore.store() - Async store context manager
  • transaction() - Single transaction context manager
  • batch_transaction() - Batch operation context manager
  • Memory management utilities in pystore.memory module
  • Partitioning utilities in pystore.partition module
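
A hedged sketch of how a few of these additions might fit together; the names come from the list above, but the exact signatures (argument names, return types) are assumptions.

```python
import pandas as pd
import pystore

store = pystore.store("demo_datastore")
collection = store.collection("demo")

frames = {
    "AAPL": pd.DataFrame({"close": [101.0, 102.5]},
                         index=pd.date_range("2025-01-01", periods=2, tz="UTC")),
    "MSFT": pd.DataFrame({"close": [250.0, 251.2]},
                         index=pd.date_range("2025-01-01", periods=2, tz="UTC")),
}

# Assumed signatures for the new batch helpers listed above.
collection.write_batch(frames, overwrite=True)      # parallel write of multiple items
items = collection.read_batch(["AAPL", "MSFT"])     # parallel read of multiple items

# Assumed validator hook: reject frames containing NaNs before they are written.
collection.set_validator(lambda df: not df.isnull().any().any())

# Assumed streaming append: feed an iterator of chunks without loading everything into RAM.
def chunks():
    for day in pd.date_range("2025-01-03", periods=5, tz="UTC"):
        yield pd.DataFrame({"close": [100.0]}, index=[day])

collection.append_stream("AAPL", chunks())
```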

Breaking Changes:

  • Removed Python 2.7 and Python < 3.8 support
  • Removed Fastparquet support (PyArrow only)
  • Changed some internal APIs for better consistency

Full Changelog: 0.1.24...1.0.0

0.1.24

10 Jul 10:12
db73c64


Thanks to @gnzsnz!

  • Deprecated Fastparquet
  • Enabled PyArrow as the default engine

NOTE: Existing stores created with fastparquet will require a manual update to metadata.json.
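
A minimal sketch of what such a manual update could look like, assuming each item's metadata.json carries an engine-related field; the key name ("engine" here), the expected values, and the on-disk layout are assumptions, so inspect an actual metadata.json before scripting anything like this.

```python
import json
from pathlib import Path

STORE_PATH = Path("~/pystore/demo_datastore").expanduser()   # hypothetical store location

# Walk every item's metadata.json and rewrite a hypothetical 'engine' field.
for meta_file in STORE_PATH.rglob("metadata.json"):
    meta = json.loads(meta_file.read_text())
    if meta.get("engine") == "fastparquet":      # assumed key/value pair
        meta["engine"] = "pyarrow"
        meta_file.write_text(json.dumps(meta))
```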

0.1.23

11 Feb 11:36
f3e94d4


  • Fixed deprecated 'in' operator usage for compatibility with pandas 1.2.0 onwards (PR #58)
  • Added an argument to append() to control duplicate handling (PR #57)