Installation tests v3 #2330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Conversation
cache-environment: true
post-cleanup: 'all'

- name: Add arcticdb from conda-forge
We'll want to run these using the Linux Conda build too, not just Mac
Yes, I have thought about it too and would propose adding it as one of the combinations (currently we have 3 Linuxes + PyPI; we can make one of the Linuxes use conda instead of PyPI).
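A minimal sketch of what that combination could look like as a workflow matrix; the key names and runner images are illustrative assumptions, not the actual values in this PR:

```yaml
# Hypothetical matrix: three Linux runners, one of which installs arcticdb from
# conda-forge instead of PyPI, plus the existing Mac conda entry.
strategy:
  matrix:
    include:
      - os: ubuntu-22.04
        package_source: pypi
      - os: ubuntu-22.04
        package_source: conda
      - os: ubuntu-20.04
        package_source: pypi
      - os: macos-14
        package_source: conda
```

Steps can then branch on `matrix.package_source` to install either the wheel or the conda-forge package.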
def lmdb_storage(tmp_path) -> Generator[LmdbStorageFixture, None, None]:
    with LmdbStorageFixture(tmp_path) as f:
        yield f
def lmdb_storage(request, tmp_path) -> Generator[LmdbStorageFixture, None, None]:
I think it will be clearer to have a dedicated test suite for these installability tests because they will have to keep working with possibly very old ArcticDB versions, so will in a sense be "frozen" - whereas the core tests will change more over time.
I also don't think we should add this complexity to the core lmdb_storage fixture; a dedicated, single-purpose fixture would be clearer to me.
I understand why you've done it like this: it is quite appealing to just reuse the existing tests, and the way you've done it is really neat. I just think it is best to be painfully obvious here (especially for the comprehensibility of new joiners, etc.).
I have tested so far with version 4.4.7 (the lowest) and there were no problems. But I do understand what you mean. The approach you suggest was the original one - #2316
The idea there was to reuse as little as possible: only shared_tests.py was shared between the current tests and the new tests. Once a test started to diverge it would no longer be shared and would have two versions, one for the original suite and one for the installation tests (which was fine, since the test files/suites were different).
Regarding the open questions in the description,
> Finally, it will be important to have some brief docs explaining this setup (and why we have it) for the benefit of future developers.
> I think an important part of this will be having a way to constrain the dependencies used by earlier ArcticDB versions - we don't want tests to fail on ArcticDB vPREHISTORIC when numpy 3 is released, for example
> Do we think it's valuable to do this on PyPi (where we bundle all our C++ deps) or perhaps only doing this on Conda would suffice?
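One way to address the dependency-constraint question is to pin each version's runtime dependencies in the matrix; the sketch below is only an assumption about how that could look (the `dep_pins` key and the specific pins are hypothetical; the version numbers are ones exercised elsewhere in this PR):

```yaml
# Hypothetical job fragment: each ArcticDB version under test carries its own
# dependency pins, so a future numpy/pandas release cannot break the old versions.
jobs:
  installation-tests:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        include:
          - arcticdb_version: "1.6.2"
            dep_pins: "numpy<2 pandas<2"
          - arcticdb_version: "4.5.1"
            dep_pins: "numpy<2"
    steps:
      - name: Install pinned ArcticDB with constrained dependencies
        run: pip install "arcticdb==${{ matrix.arcticdb_version }}" ${{ matrix.dep_pins }}
```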
name: Run Installation Tests v3

on:
  push:
It looks like we are planning to run those tests on branch push. I think it's much better to run them periodically. We rarely need to update older branches, but we still need to keep the older versions workable.
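A sketch of that suggestion as a trigger block; the cron time is an illustrative assumption, not the schedule used in this PR:

```yaml
# Hypothetical triggers: run the installation tests nightly and keep a manual
# trigger for on-demand runs, rather than running on every branch push.
on:
  schedule:
    - cron: "0 2 * * *"   # nightly at 02:00 UTC (illustrative)
  workflow_dispatch: {}
```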
steps:

  - name: Checkout code
    uses: actions/checkout@v3
Why do we need to check out? Is it for checking out the tests? If we rely on the latest tests with older versions of arcticdb, the older versions might not have the up-to-date functionality those tests exercise.
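If the intent is to keep the test code aligned with the version under test, one option (an assumption, not something this PR does) is to check out the release tag that matches the installed version:

```yaml
# Hypothetical checkout step: fetch the test sources at the tag matching the
# ArcticDB version under test, so old versions run against their own tests.
- name: Checkout tests for the version under test
  uses: actions/checkout@v3
  with:
    ref: v${{ matrix.arcticdb_version }}   # tag naming ("v4.5.1") is an assumption
```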
#### What does this implement or fix?

Successful executions:
- 5.2.6: https://github.com/man-group/ArcticDB/actions/runs/14641126753/job/41083591802
- 5.1.2: https://github.com/man-group/ArcticDB/actions/runs/14637571996
- 4.5.1: https://github.com/man-group/ArcticDB/actions/runs/14639124835/job/41077126258
- 1.6.2: https://github.com/man-group/ArcticDB/actions/runs/14701046721/job/41250511273

The PR contains a workflow definition to execute tests against an installed arcticdb. It is a combination of the approaches in #2330 and #2316.

- Installation tests now live in a separate folder (python/installation_tests) and are not part of tests. They have their own fixtures, making them independent from the rest of the code base.
- The tests are direct copies of the originals, with one modified to use the v2 API. If the API changes, each test in the installation set can be adapted independently.
- As the tests run very fast, there is no need to use simulators; real S3 storage is used directly.

The tests are executed by a workflow. Currently each test is executed against LMDB and real S3. A moto-simulated version is not available at the moment, due to tight coupling with protobufs, which differ for each version, and tight coupling with the rest of the existing test code.

The workflow has two triggers:
- manual trigger - allowing tests to be executed on demand
- on schedule - the scheduled execution runs overnight; tests for each arcticdb version run with a one-hour offset from the others, because executing all of them at once is likely to generate errors with real storages

Co-authored-by: Georgi Rusev <Georgi Rusev>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
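A condensed sketch of what one scheduled run roughly amounts to, based on the description above; the step names, the pinned version shown, and the storage-selection variable are assumptions for illustration:

```yaml
# Hypothetical condensed job: install one released arcticdb version and run the
# self-contained installation tests against LMDB and real S3.
jobs:
  run-installation-tests:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Install released ArcticDB
        run: pip install "arcticdb==5.2.6" pytest   # one of the versions listed above
      - name: Run installation tests
        run: python -m pytest python/installation_tests
        env:
          ARCTICDB_STORAGE: "lmdb_and_real_s3"   # illustrative switch; the real env vars may differ
```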
Reference Issues/PRs
What does this implement or fix?
These are installation tests intended to run against different released versions of arcticdb from conda and PyPI, without building the project.
GitHub workflow (not yet finished)
Allows matrix execution of selected combinations of OSes and Python versions. For each combination it installs arcticdb from conda (macOS) or PyPI (the other OSes), along with our test dependencies. We also build the protobufs and run our tests.
Not finished: the ability to select the arcticdb version, and the ability to run on demand with user-selected options (one possible shape for the version selection is sketched after the run links below).
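For illustration only, here is a minimal sketch of the matrix idea described above; the job name, test path and step layout are assumptions, not the actual workflow added in this PR:

```yaml
# Minimal sketch of the matrix installation idea (names and paths are
# illustrative assumptions, not the workflow added in this PR).
name: installation-tests-sketch

on:
  workflow_dispatch: {}

jobs:
  install-and-test:
    strategy:
      fail-fast: false
      matrix:
        # A macOS leg would install the conda-forge package instead of the wheel.
        os: [ubuntu-22.04, windows-latest]
        python: ["3.8", "3.11"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
      # Install a published build instead of compiling the project.
      - name: Install arcticdb and pytest from PyPI
        shell: bash
        run: pip install arcticdb pytest
      - name: Run the installation tests
        shell: bash
        run: python -m pytest python/tests
```

The real workflow additionally builds the protobufs and installs the full set of test dependencies before running the suite.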
Links to successful runs with LMDB tests only:
latest 5.3.3: https://github.com/man-group/ArcticDB/actions/runs/14494364845
5.1.2: https://github.com/man-group/ArcticDB/actions/runs/14509109094/job/40703855715
4.4.7: https://github.com/man-group/ArcticDB/actions/runs/14509267618
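One possible shape for the not-yet-finished version selection, purely as a sketch (the input name and install step below are hypothetical, not part of this PR): the version becomes a workflow_dispatch input and the install step pins to it, which is roughly how runs against 5.3.3, 5.1.2 or 4.4.7 would differ.

```yaml
# Hypothetical sketch of on-demand version selection; the input name and
# install step are assumptions, not part of this PR.
on:
  workflow_dispatch:
    inputs:
      arcticdb_version:
        description: "arcticdb version to test (empty = latest release)"
        required: false
        default: ""

jobs:
  install-and-test:
    runs-on: ubuntu-22.04
    steps:
      - name: Install the requested arcticdb release from PyPI
        shell: bash
        run: |
          if [ -n "${{ inputs.arcticdb_version }}" ]; then
            pip install "arcticdb==${{ inputs.arcticdb_version }}"
          else
            pip install arcticdb
          fi
```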
Added:
Modified:
Open Questions:
Any other comments?
Checklist
Checklist for code changes...