Releases: man-group/ArcticDB

v4.2.0rc0

13 Dec 18:01

v4.2.0rc0 Pre-release

🚀 Features

  • Remove useless python deps (#1005)
  • feat: Allow row_range to be treated as a clause (#864)
>>> from arcticdb import Arctic
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"col1": np.arange(10), "col2": np.arange(100, 110)}, index=np.arange(10))
>>> ac = Arctic("lmdb://test")
>>> lib = ac.get_library("test_lib", create_if_missing=True)
>>> lib.write("test_symbol", df)
>>> lib.read("test_symbol", row_range=(3,7)).data
   col1  col2
3     3   103
4     4   104
5     5   105
6     6   106

🐛 Fixes

  • Symbol list refactor (#796).
    Relying on timestamps in the symbol list being accurate has proved troublesome. Instead, we now use the most recent version id known to a client as an indication of that client's view of the world at the time a symbol list entry is written. That way, we can identify and correct symbol list entries that refer to conflicting writes.
  • Fixed aggregation on sparse grouping columns (#1068)

Notebooks

  • Added AWS blockchain notebook (#1040)
  • Added AWS blockchain to docs index (#1043)
  • Add Snapshot + Equity Notebooks (#1071)

Uncategorized

  • 744 extend real storage tests to run with large lifelike data and all api methods (#989)
  • Update BSL table for 4.1 (#1023)
  • Centralise the pytest marks (#1024)
  • Document the S3 backends that we have tested against and "un-beta" LMDB on Windows (#1016)
  • Sparse aggregation (#1007)
  • Docs versioning (#1008)
  • set-default after deploy so that 'latest' alias can be created first (#1029)
  • build: Remove old Cython configuration and adaptation (#1028)
  • Docs workflow fixes. (#1030)
  • build: Replace emilib with robin_hood (#995)
  • hot-fix: Use previous build of libmongocxx to avoid missing symbols (#1050)
  • Remove unused C++ Wangle dep (#1047)
  • Bugfix 1055: Unflake test_read_batch_time_stamp (#1058)
  • Change tag format in docs build (#1062)
  • Snapshot notebook typos (#1088)
  • Update vcpkg dep (#1091)
  • Enhancement/732/processing unit ecs model (#960)
  • Add xfail to flaky tests (#1087)
  • Add a mechanism to extend storage transaction lifetime to lifetime of… (#975)
  • Tweak release docs (#1019)
  • Change tmpdir to tmp_path (#1093)
  • Remove unnecessary xfails (#1097)
  • Add checks to see whether we should be validating version entries during compaction (#1099)
  • 941 self hosted runners for ci (#997)
  • Make dependency of pymongo optional in running (#1027)
  • Add preliminary change for slowdown error test (#1064)
  • Add mutex to ensure only single thread at pybind->c++ layer (#973)
  • Issue #1017 Only warn if the "base" LMDB env is opened twice (#1022)
  • Fix run-cmake action (#1034)
  • Add a fallback to free GH runners, when there is a problem with the self-hosted ones (#1063)
  • fix: Empty column handling improvements (#1049)
  • Pin all our Github actions deps (#1090)

The wheels are on PyPI. Below are for debugging:

v4.0.3

23 Nov 16:17

This is a patch release to version 4.0 that backports some changes from master.

🚀 Features

  • Add preliminary change for slowdown error test (#1070)

🐛 Fixes

  • Empty column handling improvements (#1079)
  • Use previous build of libmongocxx to avoid missing symbols (#1083)
  • Remove docs publish step so we don't overwrite the docs (#1025)
  • Remove Black and pre-commit setup (#1085)

The wheels are on PyPI. Below are for debugging:

v4.1.0

01 Nov 17:54

⭐ New APIs

In-memory Backend

You can now open ArcticDB with an in-memory backend,

from arcticdb import Arctic
ac = Arctic("mem://")
ac.create_library("test")
assert ac.list_libraries() == ["test"]
# Create libraries as normal. Each `Arctic` object manages its own in-memory storage, so the lifetime
# of your libraries and data is the same as the lifetime of the `Arctic` instance that owns them.

ac2 = Arctic("mem://")
assert ac2.list_libraries() == []  # ac2 is backed by different memory to ac so the "test" library is not returned

Query Builder

We now support a new "count" aggregator. You can invoke it with:

from arcticdb import QueryBuilder

q = QueryBuilder()
q = q.groupby("grouping_column").agg({"a": "count"})
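
As a minimal usage sketch (assuming a pre-existing library "test_lib" containing a symbol "test_symbol" with columns "grouping_column" and "a"), the query is passed to read via query_builder and the aggregation runs inside ArcticDB:

from arcticdb import Arctic, QueryBuilder

ac = Arctic("lmdb://test")
lib = ac.get_library("test_lib")

q = QueryBuilder()
q = q.groupby("grouping_column").agg({"a": "count"})

# One row per group, with the count of "a" values in that group.
result = lib.read("test_symbol", query_builder=q).data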

⚠️ Breaking and API Changes

LMDB Backend

This release includes a fix for issue #850: Ensure that LMDB libraries are readable after being moved to a different location. The fix means that LMDB libraries created with arcticdb>=4.1.0 will not be readable by older clients, and those clients must update.

This is because the fix stops us from serializing the LMDB library path (instead we always prefer the one in the Arctic URI), but older clients still expect to see the LMDB path serialized. Older clients reading a new LMDB library will therefore ignore the path passed to the Arctic constructor and instead use the current working directory.

When you exceed the LMDB map size, we now raise a custom exception arcticdb.exceptions.LmdbMapFullError that explains how to re-open LMDB with a larger map size, whereas previously we raised a less helpful arcticdb.exceptions.InternalException.
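
A minimal sketch of handling the new exception, assuming an LMDB library "test_lib" under "lmdb://test" and a DataFrame df that exceeds the current map size; the map_size URI parameter is the one added in #811, and 10GB is an arbitrary choice:

from arcticdb import Arctic
from arcticdb.exceptions import LmdbMapFullError

ac = Arctic("lmdb://test")
lib = ac.get_library("test_lib")

try:
    lib.write("test_symbol", df)
except LmdbMapFullError:
    # Re-open the LMDB store with a larger map size and retry; existing data is kept.
    ac = Arctic("lmdb://test?map_size=10GB")
    lib = ac.get_library("test_lib")
    lib.write("test_symbol", df)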

🚀 Features

  • Support count aggregator with groupby (#948)
  • Warning for LMDB when two Arctic instances open over the same storage (#1000)
  • Small LMDB Fixes: 2GiB map size for Windows, Validation before delete (#918)
  • Custom exception when LMDB map is full (#1006)
  • Memory backed API (#860)
  • Allow ampersand in symbol names (#900)
  • Add querybuilder notebook demo into the docs (#875)
  • Extended testing against real cloud storages (#789)
  • ASV Benchmarking published here (#962 #970)
  • Preparatory work for RocksDB backend (#945)

🐛 Fixes

  • Fix LZ4 decoding error issues that occurred with a mix of empty and non-empty columns (#964)
  • Performance improvement for read_batch when called with many symbols (4-5x improvement) (#870)
  • Convert semimap to switch which has resolved some segmentation fault issues (#912)
  • Upgrade cUrl to 8.4.0 (#977)
  • Cache open libraries in the LibraryManager (#990)
  • Enhancement 914: Improve error messaging when string column encoding fails due to the presence of a non-string object (#933)
  • Fix storage lock mutex implementation (#966)
  • Make azure sdk stick to winhttp if possible (#851)
  • Extra update checks (#539)
  • maint: Indicate the non-support of PyArrow (#882)
  • Add pymongo to the list of install dependencies (#891)
  • conda-build: Run python tests for macos-latest (#873)
  • fix: Change comparison in test_hypothesis_{sum,mean}_agg (#931)

Uncategorized

  • Fixes IFNDR issue due to mismatching inline/non-inline functions depending on the translation unit (fixes #943) (#949)
  • docs: Post 4.0.0 release documentation (#967)
  • Remove Black and pre-commit setup (#972)
  • Improve string writing performance (#969)
  • Skip docs in build.yml (#984)
  • Move api docs from sphinx to mkdocs (#897)
  • Document support for Mac on intel (#982)
  • Update PAT for publish to master (#988)
  • Fix ccache's non-existence exiting the workflow (#996)
  • Refactor get_descriptions lib methods to be more consistent (#994)
  • Fail docs build on Sphinx failure (#883)
  • docs: Better document publishing release candidates on conda-forge (#901)
  • conda-build: Unpin some dependencies (#888)
  • docs: Reword conda-forge section mentioning libevent-2.1.10 (#905)
  • Update readme to reflect supported/beta status of Windows PyPi/MacOS conda-forge builds (#907)
  • Skip flaky Mac test (#924)
  • docs: Improve section "Building using mamba and conda-forge" (#917)
  • fix interface in ManualClockVersionStore getter (#925)
  • docs: Add high-level documentation of abstractions (#628)
  • Update storage compatibility (#916)
  • Update copyright notice (#939)
  • Use new get_library argument in ArcticDB_demo_lmdb.ipynb (#930)

The wheels are on PyPI. Below are for debugging:

v4.0.2

02 Nov 12:51

This is a patch release to version 4.0 that backports some changes from master.

🐛 Fixes

  • Bugfix for a deadlock issue when using Python multithreading and batch_read (#1021)

The wheels are on PyPI. Below are for debugging:

v4.0.1

02 Nov 16:16

This is a patch release to version 4.0 that backports some fixes from master.

🚀 Features

  • Allow ampersand in symbol names (#952)

🐛 Fixes

  • Backport minor fixes to 4.0.x (#954); not user-facing

The wheels are on PyPI. Below are for debugging:

v1.6.2

04 Oct 10:51

This is a patch release that backports bug fixes on top of v1.6.1.

🐛 Fixes

  • Fixed a bug in key data retrieval, which could lead to incorrect behavior and segmentation faults (#912)

The wheels are on PyPI. Below are for debugging:

v4.0.0

27 Sep 07:37

⚠️ API changes

For Library.get_description_batch, Library.read_metadata_batch and Library.write_batch, a DataError object will now be returned at the position in the returned list corresponding to any symbol/version pair for which there was an issue reading or writing. Note this may require code changes to support the new error handling behaviour; as a result it is considered a breaking change. A sketch of handling these results follows the list below.

  • get description batch method: method rationalisation (#814)
  • read metadata batch method: method rationalisation (#814)
  • Write batch method: method rationalisation (#814)
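
A minimal sketch of checking for DataError results, assuming a pre-existing library "test_lib"; DataError is importable from arcticdb, and results are positional, so results[i] corresponds to symbols[i]:

from arcticdb import Arctic, DataError

ac = Arctic("lmdb://test")
lib = ac.get_library("test_lib")

symbols = ["symbol_that_exists", "symbol_that_does_not"]
results = lib.read_metadata_batch(symbols)

for symbol, result in zip(symbols, results):
    if isinstance(result, DataError):
        # No exception is raised for the failing symbol; the DataError describes what went wrong.
        print(f"{symbol} failed: {result}")
    else:
        print(f"{symbol} metadata: {result.metadata}")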

🚀 Features

  • Pandas 2.0 support (#343) (#540) (#804) (#846)
    • Modifications have been made to the normalisation and denormalisation processes for pandas.Series and pandas.DataFrame to match the new defaults in pandas 2.0.
    • Handling of 0-row DataFrames for improved correctness and usability.
    • Empty columns are now properly handled, especially regarding the change of defaults for empty collections between Pandas 1.X and Pandas 2.X.
    • Extended the tests to reflect changes in behaviour due to pandas 2.0's new defaults.
    • Please note, PyArrow remains unsupported in this integration.
  • conda-build: Bring support for Azure Blob Storage (#840) (#854) (#853) (#857)
  • Add uri support for mongodb (#761)
  • Code coverage analysis and report workflow (#783) (#784)
  • Add documentation with doxygen (#736)

🐛 Fixes

  • Update support status: Pandas DataFrame and Series backed by PyArrow are not supported (#882)
  • Added pymongo to the list of installation dependencies (#891)
  • Resolved dependency issues for the mergeability check step (#822)
  • Fixed issue where AWS authentication wasn't used, even though the option was enabled (#843)
  • Resolved issue of early read termination in 'has_symbol' (#836)
  • Test: Ensured that QueryBuilder is pickleable with all possible clauses (#861)
  • Fixed issue with the 'latest_only' option for the 'list_versions' method (#839)
  • Added the ability for users to specify LMDB map size in the Arctic URI (#811)
  • Issue #767: Segfault in batch write with string columns has been resolved (#827) (#874)
  • Renamed ArcticNativeNotYetImplemented in a way that maintains backward compatibility, to fix issue #774 (#821)
  • Modified Azure SDK to favour winhttp over libcurl on Windows for improved SSL verification (#851)
  • Updated the maximum batch size for Azure operations (#878)

Uncategorized

  • Maintenance: Added a minimal Security Policy (#823)
  • Fixed documentation following an exception renaming (#824)
  • Resolved issues in the publish step (#825)
  • Added documentation for setting LMDB map size (#826)
  • Incorporated notebooks into the documentation (#844)
  • Maintenance: Removed unused definitions from protocol buffers (#856)
  • Enhanced error handling to fail document build on Sphinx errors (#883)
  • Maintenance: Replaced deprecated ZSTD_getDecompressedSize function (#855)
  • Refactored non-functional library manager, addressing Issue #812 (#828)
  • Made minor improvements to the documentation (#841)
  • Improved handling of the deprecated S3 option "force_uri_lib_config" (#833)
  • Corrected the release date of version 3.0.0 in README.md (#858)

The wheels are on PyPI. Below are for debugging:

v3.0.0

13 Sep 13:40

🔒 Security + Forwards Incompatible Change

  • S3 and Azure: Do not save sensitive or ephemeral config in the config library (#803)

This fixes a security issue with ArcticDB where creds were kept in storage for:

  • Azure
  • AWS if the access keys are supplied in the URI instead of aws_auth=True.

These instructions explain how to upgrade your storage to remove the credentials. See also issue #802.

Compatibility matrix

  • S3 with aws_auth=True
    • Library created with < v3, accessed with >= v3: Continues to work.
    • Library created with or upgraded to >= v3, accessed with < v3: Raises InternalException: E_INVALID_ARGUMENT S3 Endpoint must be specified. Will work again if access=_RBAC_&secret=_RBAC_&force_uri_lib_config=true is in the URI passed to Arctic().
  • S3 with access and secret
    • Library created with < v3, accessed with >= v3: Will now use the creds passed to Arctic(), but should continue to work if the creds are sufficient. A future release might print a warning with instructions to upgrade.
    • Library created with or upgraded to >= v3, accessed with < v3: Raises InternalException: E_INVALID_ARGUMENT S3 Endpoint must be specified. Will work if force_uri_lib_config=true is in the URI passed to Arctic().
  • Azure: Operations on the library will fail with various internal error messages.

Full details:

What's happened?

Whilst reviewing our codebase we discovered a way that access-keys for ArcticDB storage backends could be saved into the storage in clear text.

This behavior was by design, but there is a chance that this has happened for some third-party users without being obvious.
This depends on the backend used and how you connect to the storage.

What is the exact scope of the issue?

If you created an ArcticDB library, either with an S3 bucket and passed the access-keys as part of the URI, or with Azure Blob Storage with the access-keys as part of the connection-string, then the credentials were saved into the storage account as part of the ArcticDB library config.
If you then shared that storage account with others using different roles or access-keys, then those users would in theory have been able to access the credentials used to create the library.

What have you done to address this?

We've updated ArcticDB so that all new libraries do not do this, even if the credentials are passed in with the URI/connection-string.
We've prepared a storage-update script which you can run to see if the credentials are there, and then remove them if they are.

What is the impact if I am affected?

If you have shared that storage account with anyone else using different roles/credentials, then your original credentials have also been accessible to those users.
It's possible those users recorded the credentials, and because those credentials must have had write-access to create the library, they could have made changes to the data or otherwise used those credentials.

What can I do to check if I'm affected?

See these instructions.

If needed you can check on previous versions of ArcticDB using the code referenced on github:
#802 (comment)

What should I do if I am affected?

Follow these instructions.

This change is not forwards compatible, so users on earlier clients may need to upgrade:

  • S3 libraries created with 3.0.0 will not be readable by earlier ArcticDB versions unless force_uri_lib_config=true is in their connection string.
  • Azure libraries created with 3.0.0 will not be readable by earlier ArcticDB versions.

Then,

  • Rotate your credentials.
  • If you've shared access to that storage account then please also check the integrity of your data and anything else accessible via those credentials.

What was the cause?

Previous use cases of ArcticDB had split storage accounts. One account was used to configure libraries and other accounts held the data for those libraries. Credentials to read those data-libraries were then stored into the configuration account and passed to users as needed for access to the data. This code was not caught during our review, and so was not disabled or removed when we made ArcticDB available to others. When we added Azure Blob storage support subsequently, the side-effect of saving anything in the connection-string to storage was not anticipated.

Having reviewed the codebase again we are confident that this was the only way that credentials could be saved into storage using our public API.

We plan to continue supporting our split storage solution for some users, but it should always be very clear when access-keys are being stored and what the risks are for that.

🚀 Features

  • Conda-forge build now supports Azure Blob Storage
  • Enhancement/728/make iclause responsible for processing structure (#752)
  • Add more info in the CI readme; Prepare var for real storage tests (#663)
  • Enhancement 702: Add option to create library if it does not exist when calling get_library (#775)
  • Enhancement 714: Expose library methods to list symbols with staged data, and to delete staged data (#778)
  • Enhancement 737: Support empty-type columns in QueryBuilder operations (#794)
  • conda-build: Adapt C++ test suite for Linux (#713)

🐛 Fixes

  • conda-build: Use default compilers for macOS (#662)
  • Bugfix/nativeversionstore write metadata batch should never return dataerror objects (#782)
  • Add handling of unspecified ca path in azure uri (#771)
  • Add dep. on packaging (#795)
  • Fix get_num_rows for NativeVersionStore (#800)

Uncategorized

  • First version of AWS S3 setup guide (#708)
  • fix(docs): central docs URL from API docs homepage (#755)
  • Add none type (#646)
  • Azure getting started guide (#749)
  • Docs fixes (#762)
  • Decouple storage headers from implementations & storage.hpp (#763)
  • Bugfix 554: Remove unused argument from write_batch (#769)
  • Partially revert #763 for consistency (#766)
  • Make it clear to not commit directly to ArcticDB feedstock but use PRs instead (#741)
  • maint: pandas 2.0 forward compatible changes (#540)
  • test: Test the absence of implace modification on datetime64 normalization for pandas 2.0 (#801)
  • Update README.md (#799)
  • test: Remove test for fallback to pickle (#805)
  • Docs - update release number (#816)
  • conda-build: Pin cmake (#815)
  • Update releasing.md (#817)
  • ArcticDB 3.0.0 update BSL table (#820)

v2.0.0

29 Aug 13:07

This version contains breaking changes to the ArcticDB API. As per the SemVer versioning scheme, we have bumped the major version.

⚠️ API changes

  • Write batch metadata method: method rationalisation (#476)

  • Append batch metadata method: method rationalisation (#548)

For Library.write_metadata_batch and Library.append_batch, a DataError object will now be returned at the position in the returned list corresponding to any symbol/version pair for which there was an issue writing. Note this may require code changes to support the new error handling behaviour; as a result it is considered a breaking change.

See the docs for read_batch, which uses the same exception return mechanism.

  • (Minor) The internal protobuf field arcticc.pb2.descriptors_pb2.TypeDescriptor.MICROS_UTC has been renamed to NANOSECONDS_UTC. This is only visible through the Arctic API as a string in the dtype attributes returned by get_description & get_description_batch, so external users are only affected if they parse these strings.

🚀 Features

  • Projections, group-by, and aggregations added to the processing framework (#712)
  • Reduce memory footprint of head and tail methods (#583)
  • Per symbol parallelisation for write batch metadata method (#476)
    • This can result in significant performance improvements when using this method over many symbols.
  • Per symbol parallelisation for append batch metadata method (#548)
    • This can result in significant performance improvements when using this method over many symbols.

🐛 Fixes

  • Ensure content hash is copied during restore version + fixing timestamp-uniqueness-related flaky tests (#600)
  • Restrict supported string types to type equality rather than isinstance checks (#704)
  • Incorrect initialisation of LoadParameter::load_from_time_ (#697)
  • Ensure compact_incomplete and recursive normalization obey the library setting for pruning (#705)

Uncategorized

  • Update release process to detail the process for pre-releases (#688)
  • Unify release and pre-release hotfixing (#725)
  • Skip test_diff_long_stream_descriptor_mismatch on MacOS (#693)
  • maint: Remove VariantStorage (#695)
  • maint: Rename datetime64[ns]-related fields and datatypes (#592)
  • run C++ tests for conda build / ci (#486)

The wheels are on PyPI. Below are for debugging:

v1.6.1

09 Aug 09:08

🐛 Fixes

  • Add a more strict check for chars in the symbol names (#627)
  • Fix as_of with timestamp reading entire version chain rather than just reading up-to the required version (#596)
    • as_of=<timestamp> reads will be significantly faster for symbols with many versions
  • Fix to ensure batch prune previous methods clean up index and data keys as well as version keys (#623)
  • Only log ErrorCategory::INTERNAL errors (#676)
  • Enable importing DataError from arcticdb (#657)
  • Refactor underlying segment write scheduling (#532)
    • This can result in a significant performance improvement for large writes

Uncategorized

  • Remove SemVer version validation as it doesn't support version strings such as 1.6.1rc0 (#678)
  • Docs 672: Add docstring for DataError class (#686)
  • Black output diff rather than just erroring in build (#685)
  • Correct docs for read_batch return type (#664)

The wheels are on PyPI. Below are for debugging: