Releases: man-group/ArcticDB
v6.2.0+man1
🚀 Features
# stage now returns a StageResult, a structure that can later be passed to finalize_staged_data to specify explicitly which data to finalize
stage_result1 = lib.stage("sym", df1)
stage_result2 = lib.stage("sym", df2)
lib.finalize_staged_data("sym", stage_results=[stage_result2])  # if stage_results is omitted, compacts everything
lib.read("sym").data  # returns df2; df1 is left in the staged index for later finalization
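The selective finalization described above can be pictured with a small in-memory model. This is only a sketch: `ToyStagingArea`, its integer tokens, and its `data` attribute are hypothetical names invented for illustration and are not part of ArcticDB. It also assumes that finalizing replaces the visible data, which the release note does not state.

```python
import itertools


class ToyStagingArea:
    """Hypothetical in-memory model of selective finalization (not ArcticDB code)."""

    def __init__(self):
        self._tokens = itertools.count()
        self.staged = {}   # token -> staged rows, kept in staging order
        self.data = None   # what a read would return after finalization

    def stage(self, rows):
        """Stage rows and return a token identifying exactly this staged chunk."""
        token = next(self._tokens)
        self.staged[token] = list(rows)
        return token

    def finalize(self, tokens=None):
        """Finalize only the given tokens; with tokens=None, finalize everything staged."""
        chosen = list(self.staged) if tokens is None else list(tokens)
        rows = []
        for token in chosen:
            # Finalized chunks leave the staging area; the rest stay behind.
            rows.extend(self.staged.pop(token))
        self.data = rows  # assumption: finalization replaces the visible data


area = ToyStagingArea()
token1 = area.stage([1, 2])
token2 = area.stage([3, 4])
area.finalize(tokens=[token2])
```

After this runs, `area.data` holds only the second chunk and the first chunk is still in `area.staged`, mirroring how `df1` stays staged in the example above.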
🐛 Fixes
- Make copy_index_key_recursively parallelizable (#2622)
- Add docstring and warning for disparity between `will_item_be_pickled` and `is_symbol_pickle` (#2548)
- Testing improvements
- Remove dependency on numpy.typing (#2626) (#2628)
The wheels are on PyPI. Below are for debugging:
v6.1.2+man0
🚀 Features
- Added new test utilities (#2593)
🐛 Fixes
- Remove signaled threads on TaskScheduler destruction on Windows with timeout (#2582)
- Fix linux debug symbols not showing (#2584)
- GCP installation tests fail on first versions when GCP was introduced (#2592)
- Resilient start of simulated storages (#2583)
- Ability to turn on/off storages during local testing (#2594)
v6.1.1
🐛 Fixes
- Mem leaks - increase mem limits so that we have time to create ASV tests (#2558)
- [Bugfix 9754509632] Fix use-after-stack-free (#2569)
- Add missing type hints to Library and fix append return type (#2574)
- Azure tests added (#2498)
- Disable operation on objects of mismatched types leading to corruption [Bugfix 9754433454] (#2572)
- Fix ASV test failures and Installation Test Failures on Master (#2578)
- Fix test_type_promotion_int64_and_float64_up_to_float64 (#2557)
- Fix workflow problems for Persistence test executions and ASV AWS S3 tests execution (#2585)
- Remove signaled threads on TaskScheduler destruction on Windows with … (#2588)
- Continue on error if macos wheel removal fails (#2599)
v6.1.0
This is the first externally published release in the v6 major series. As such, it includes some breaking changes to the type system.
⚠️ Breaking Changes
- Groupby using dynamic schema now produces a stable dtype. Previously, the dtype depended on the segments being processed; now the dtype will always be the same dtype able to represent the column across all segments on disk.
- Return type for min/max aggregations in `groupby` for libraries using dynamic schema now is the type able to represent the column across all segments, while in earlier versions it was `float64` regardless of the type of the column across different segments.
- Using `sum` aggregation in `groupby` on a `bool` column now returns the count of rows containing `True` as a `uint64`.
- Using `mean` aggregation in `groupby` on `timestamp` columns now returns a `timestamp` (instead of `float64`). It is computed by taking the integer part of `(1/n) * Σ ts[i]`, for `i = 1` to `n`.
- The return type of a projection operation involving a floating point number will always be of type `float64` regardless of the types involved in the computation.
- Performing a `filter` or `projection` on an empty DataFrame might throw an exception depending on the dtype of the columns, while older versions always returned an empty DataFrame. The dtype of the columns of an empty DataFrame depends on the version of Pandas that is used to write them. Some versions use `float64` by default; others use `object`. Filtering like `query_builder = query_builder[query_builder["col"] < 5]` will throw an exception if the DataFrame is empty and the type of an empty column is `object`.
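The timestamp mean rule in the list above can be worked through by hand. The sketch below is not ArcticDB's implementation. It assumes timestamps are represented as integer nanoseconds since the epoch and reads "integer part" as truncation toward zero; `mean_timestamp_ns` is a hypothetical helper name.

```python
def mean_timestamp_ns(timestamps_ns):
    """Integer part of (1/n) * sum(ts[i]), computed with exact integer arithmetic."""
    n = len(timestamps_ns)
    if n == 0:
        raise ValueError("mean of an empty group is undefined in this sketch")
    total = sum(timestamps_ns)
    # Truncate toward zero. Python's // floors, which differs for negative totals,
    # so divide the magnitude and restore the sign afterwards.
    magnitude = abs(total) // n
    return magnitude if total >= 0 else -magnitude


# Three timestamps, in nanoseconds since the epoch, belonging to one group.
group = [1_000_000_000, 1_000_000_001, 1_000_000_003]
mean_ns = mean_timestamp_ns(group)  # exact mean is 1_000_000_001.33..., truncated
```

Integer arithmetic is used here rather than floating point because a `float64` cannot represent every nanosecond timestamp exactly, so a float-based mean could drift by a few nanoseconds.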
🚀 Features
- Resampling for libraries that use dynamic schema
- Add `batch_delete` APIs and functionality (#2463)
- Support open ended `row_range` on `QueryBuilder` and `read` methods (#2550)
🐛 Fixes
- Fix handling of segments contained only of None values in sort merge (#2536)
- Fix OverflowError from to_json() call (#2556)
- Storage lock increase wait time and add artificial slow writes (#2497)
- Remove signaled threads on TaskScheduler destruction on Windows (#2544)
- Add utilities to let the storage failure simulator simulate high latency conditions (#2407)
- Extend testing for the index names returned by get_info (#2448)
- Fix flaky segfault in concat testing (#2458)
- conda-build: Use clang and clang++ 18 for osx (#2476)
- fix: Remove use after free (#2459)
- Fix segfault during encoding sparse data (#2475)
- maint: Replace Folly's ranges with the standard library's (#2479)
- conda-build: Use macos-14 (#2482)
- Bugfix 8083916814: Respect pickle_on_failure kwarg (#2474)
- Remove numpy pin (#2487)
- Update_batch additional tests (#2437)
- Make append_batch and update_batch noop with empty dataframes when there's an existing version (#2507)
- Release the GIL when logging from the Python API (#2486)
v6.1.1+man0
🐛 Fixes
- [Bugfix 9754509632] Fix use-after-stack-free (#2569)
- Add missing type hints to Library and fix append return type (#2574)
- Disable operation on objects of mismatched types leading to corruption [Bugfix 9754433454] (#2572)
- Remove signaled threads on TaskScheduler destruction on Windows with … (#2588)
v6.1.0+man1
🚀 Features
- Fix get_backing_store could check non-primary storage (#2534)
- Support open ended row_range on QueryBuilder and read methods (#2550)
🐛 Fixes
- Fix installation tests (#2549)
- Fix handling of segments contained only of None values in sort merge (#2536)
- Fix OverflowError from to_json() call (#2556)
- Storage lock increase wait time and add artificial slow writes (#2497)
- Add utilities to let the storage failure simulator simulate high latency conditions (#2407)
v6.0.0+man2
⚠️ Breaking Changes
Detailed description and code examples of the breaking changes can be found in (#2440)
Short summary of the breaking changes:
- Groupby using dynamic schema now produces a stable dtype. Previously, the dtype depended on the segments being processed; now the dtype will always be the same dtype able to represent the column across all segments on disk.
- Return type for min/max aggregations in `groupby` for libraries using dynamic schema now is the type able to represent the column across all segments, while in earlier versions it was `float64` regardless of the type of the column across different segments.
- Using `sum` aggregation in `groupby` on a `bool` column now returns the count of rows containing `True` as a `uint64`.
- Using `mean` aggregation in `groupby` on `timestamp` columns now returns a `timestamp` (instead of `float64`). It is computed by taking the integer part of $$\left( \frac{1}{n} \sum_{i=1}^{n} ts[i] \right)$$
- The return type of a projection operation involving a floating point number will always be of type `float64` regardless of the types involved in the computation.
- Performing a `filter` or `projection` on an empty DataFrame might throw an exception depending on the dtype of the columns, while older versions always returned an empty DataFrame. The dtype of the columns of an empty DataFrame depends on the version of Pandas that is used to write them. Some versions use `float64` by default; others use `object`. Filtering like `query_builder = query_builder[query_builder["col"] < 5]` will throw an exception if the DataFrame is empty and the type of an empty column is `object`.
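The `sum`-on-`bool` rule in the summary above amounts to counting the `True` rows in each group. The following pure-Python sketch illustrates that arithmetic only. `groupby_sum_bool` is a hypothetical helper, not an ArcticDB function, and it returns plain Python integers rather than `uint64` values.

```python
from collections import defaultdict


def groupby_sum_bool(keys, flags):
    """Count the True values per group key, as a sum over a bool column would."""
    counts = defaultdict(int)
    for key, flag in zip(keys, flags):
        # Adding 0 still registers the group, so all-False groups report 0.
        counts[key] += 1 if flag else 0
    return dict(counts)


keys = ["a", "b", "a", "b", "a"]
flags = [True, False, True, False, False]
result = groupby_sum_bool(keys, flags)  # {"a": 2, "b": 0}
```

Because such counts can never be negative, an unsigned integer type is a natural fit for the result column.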
🚀 Features
🐛 Fixes
- Extend testing for the index names returned by get_info (#2448)
- Fix flaky segfault in concat testing (#2458)
- conda-build: Use clang and clang++ 18 for osx (#2476)
- fix: Remove use after free (#2459)
- Fix segfault during encoding sparse data (#2475)
- maint: Replace Folly's ranges with the standard library's (#2479)
- Upgrade to sparrow==1.0.0 (#2484)
- conda-build: Use macos-14 (#2482)
- Bugfix 8083916814: Respect pickle_on_failure kwarg (#2474)
- Remove numpy pin (#2487)
- fix for MAC OS - tests should not halt anymore (#2506)
- Update_batch additional tests (#2437)
- Make append_batch and update_batch noop with empty dataframes when there's an existing version (#2507)
- Library tool/read segment to dataframe (#2477)
- Release the GIL when logging from the Python API (#2486)
- Do not crash when recursively normalizing dictionaries containing non-str keys (#2525)
- conda-build: Remove workarounds in specification (#2512)
- maint: Add support for libprotobuf 6 (#2455)
- Faster and error free ASV benchmarks (#2538)
- Fix installation tests (#2511)
- One line change to make getting library tool for Native Mongoose libraries easier (#2541)
v6.0.0+man1
⚠️ Breaking Changes
- V6.0.0 - implementation of resampling with dynamic schema and API breaking changes (#2440)
🚀 Features
🐛 Fixes
- Extend testing for the index names returned by get_info (#2448)
- Fix flaky segfault in concat testing (#2458)
- conda-build: Use clang and clang++ 18 for osx (#2476)
- fix: Remove use after free (#2459)
- Fix segfault during encoding sparse data (#2475)
- maint: Replace Folly's ranges with the standard library's (#2479)
- Upgrade to sparrow==1.0.0 (#2484)
- conda-build: Use macos-14 (#2482)
- Bugfix 8083916814: Respect pickle_on_failure kwarg (#2474)
- Remove numpy pin (#2487)
- fix for MAC OS - tests should not halt anymore (#2506)
- Update_batch additional tests (#2437)
- Make append_batch and update_batch noop with empty dataframes when there's an existing version (#2507)
- Library tool/read segment to dataframe (#2477)
- Release the GIL when logging from the Python API (#2486)
- Rebase v6.0.0 with latest master (#2532)
v5.10.0
Performance
- Reduce memory overhead when reading dataframes from ArcticDB by @alexowens90 in #2435
Fixes
- Fix recursive normalizers issue by @poodlewars in #2451
- Use a less strict pin for sparrow by @IvoDD in #2473
- Query Builder regex filter support and upgrade PCRE to PCRE2 V2 by @phoebusm in #2466
- Relax numpy pin by @poodlewars in #2509
Full Changelog: v5.9.3...v5.10.0