#### Reference Issues/PRs
<!--Example: Fixes#1234. See also #3456.-->
#### What does this implement or fix?
Reproduced problem:
https://github.com/man-group/ArcticDB/actions/runs/15417513896/job/43384819172
The log shows that not all processes could be spawned because of a lack of memory.
To help analyse flaky tests, the workflow is also extended so that fully
custom pytest commands can be run. For example, the following runs one
test on all VMs until it fails, times out (6 hrs), or passes 100 times
in a row:
```
pytest -n auto -v --count=100 -x python/tests/integration/arcticdb/test_storage_lock.py
```
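The dispatch rule for the extended `pytest_args` input could look roughly like the sketch below. It is not taken from the workflow itself: `build_pytest_command` and the `python/tests` default target are hypothetical names used only for illustration.

```python
import shlex


def build_pytest_command(pytest_args: str, default_target: str = "python/tests") -> list:
    """Turn the workflow input into an argv list (hypothetical helper).

    - input starts with "pytest": use it verbatim as the full command
    - other non-empty input: treat it as the tests/arguments to run
    - empty input: fall back to the assumed default target
    """
    args = shlex.split(pytest_args)
    if args and args[0] == "pytest":
        return args
    return ["pytest", *(args or [default_target])]


print(build_pytest_command("pytest -n auto -v --count=50 -x python/tests/compat"))
print(build_pytest_command(""))
```

Note that `-n auto` comes from the pytest-xdist plugin and `--count` from pytest-repeat, so a custom command like the one above only works where both plugins are installed.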
Try 1 - determine the number of actual processes using the is_alive()
method:
Fix:
ce50eb2
Log:
https://github.com/man-group/ArcticDB/actions/runs/15419424266/job/43391336706
Outcome: there are many errors on Windows and a few on Linux, so this
approach is probably not optimal.
On Windows in particular it is a disaster:
```
DataFrame.iloc[:, 0] (column name="col") values are different (100.0 %)
[index]: [0]
[left]: [63]
[right]: [100]
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 2 failures !!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!
```
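For reference, counting live workers with `is_alive()` can be sketched as below. This is only an illustration, not the code from ce50eb2: it uses `threading.Thread` so that it runs anywhere, but `multiprocessing.Process` exposes the same `is_alive()` method. The helper name `count_alive` is an assumption.

```python
import threading


def count_alive(workers):
    """Snapshot of how many workers report is_alive() right now."""
    return sum(1 for w in workers if w.is_alive())


stop = threading.Event()
# Each worker simply blocks until it is told to stop.
workers = [threading.Thread(target=stop.wait) for _ in range(4)]
for w in workers:
    w.start()

alive_during = count_alive(workers)  # all workers are blocked on the event

stop.set()
for w in workers:
    w.join()

alive_after = count_alive(workers)   # every worker has finished
print(alive_during, alive_after)
```

Such a count is only a snapshot: a worker can exit or be killed right after it was counted, which may be one reason the expected and actual values diverged in this attempt.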
Try 2 - Attempt to fix by having each worker create a unique symbol
after the shared, lock-protected counter is incremented and before the
lock is released. The fix relies on the assumption that as many unique
symbols are created as there are running, live threads. The shared
counter should therefore equal the number of created symbols in the
ideal case, and must never be lower than it.
Fix:
c85adf6
Log:
https://github.com/man-group/ArcticDB/actions/runs/15434747123/job/43439581916
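The idea behind Try 2 can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: a `threading.Lock` replaces the storage lock, a dict replaces the stored counter, and a set replaces the library's symbols; it is not the code from c85adf6.

```python
import threading

lock = threading.Lock()      # stand-in for the storage lock
store = {"counter": 0}       # stand-in for the shared counter symbol
symbols = set()              # stand-in for the unique symbols in the library


def worker(worker_id: int) -> None:
    with lock:
        # Increment the shared counter while holding the lock ...
        store["counter"] += 1
        # ... and record a unique symbol before the lock is released.
        symbols.add(f"unique_{worker_id}")


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Only workers that actually ran leave a symbol behind, so the number of
# symbols is the expected counter value instead of a fixed process count.
assert store["counter"] >= len(symbols)
print(store["counter"], len(symbols))
```

Comparing against the number of symbols rather than the number of requested workers means a worker that never started does not cause a false failure.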
Analysis: we see many errors on Windows, such as:
```
https://github.com/man-group/ArcticDB/actions/runs/15434747123/job/43439415186
The hosted runner lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
https://github.com/man-group/ArcticDB/actions/runs/15434747123/job/43439450923
Process completed with exit code -1073741571.
The exit code -1073741571 in pytest usually indicates a stack overflow or memory exhaustion issue, particularly on Windows systems. This corresponds to the Windows error code 0xC00000FD, which means the process ran out of stack space.
https://github.com/man-group/ArcticDB/actions/runs/15434747123/job/43439404387
Error: Process completed with exit code 127.
https://github.com/man-group/ArcticDB/actions/runs/15434747123/job/43439551901
Out of memory
```
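The mapping from the signed exit code to the Windows status code quoted above can be checked with a short conversion. The helper name `to_ntstatus` is only illustrative.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status code."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741571))  # 0xC00000FD
```

0xC00000FD is the stack overflow status code, which matches the explanation in the log excerpt.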
This means 100 processes is too much for Windows; we need to reduce the
number so that no host problems occur.
Try 3 - on Windows the maximum number of processes is 30, not 100
Log:
https://github.com/man-group/ArcticDB/actions/runs/15444657461/job/43471434052
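A platform-dependent cap like this one might be expressed as in the sketch below. The function name `max_processes` is hypothetical; only the values 30 and 100 come from the description above.

```python
import sys


def max_processes(platform: str = sys.platform) -> int:
    """Return the worker cap: 30 on Windows, 100 elsewhere (values from this PR)."""
    return 30 if platform.startswith("win") else 100


print(max_processes())
```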
Analysis: no errors caused by lack of memory were seen. However, there
are still failures on Windows that cannot be explained; see
https://github.com/man-group/ArcticDB/actions/runs/15444657461/job/43472314667
Note that all failures are off by 1 (expected vs. actual counter). This
means the logic is working, but it raises the suspicion that there is
some other Windows-related issue, possibly a bug.
```
2025-06-04 14:49:28,647 - tests.integration.arcticdb.test_storage_lock - INFO - Process 3380: start read
2025-06-04 14:49:31,226 - tests.integration.arcticdb.test_storage_lock - INFO - Process 3380: previous value 2
20250604 14:49:31.368214 8236 E arcticdb | Unexpectedly lost the lock in heartbeating thread. Maybe lock timeout is too small.
E20250604 14:49:31.368520 8236 FunctionScheduler.cpp:507] Error running the scheduled function <Extend lock>: struct arcticdb::lock::LostReliableLock: Unknown exception
2025-06-04 14:49:31,445 - tests.integration.arcticdb.test_storage_lock - INFO - Process 7544: start read
2025-06-04 14:49:31,773 - tests.integration.arcticdb.test_storage_lock - INFO - Process 7544: previous value 2
2025-06-04 14:49:36,074 - tests.integration.arcticdb.test_storage_lock - INFO - Process 3380: incrementing and saving value 3
2025-06-04 14:49:36,105 - tests.integration.arcticdb.test_storage_lock - INFO - Process 7544: incrementing and saving value 3
```
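The off-by-one is visible in the excerpt: both processes read the previous value 2 and both save 3, so one increment is lost. A small, illustrative log scan (not part of the test suite; `find_lost_updates` is a hypothetical helper) can flag this pattern:

```python
import re
from collections import defaultdict

# The two "incrementing" lines from the excerpt above.
LOG = """\
2025-06-04 14:49:36,074 - tests.integration.arcticdb.test_storage_lock - INFO - Process 3380: incrementing and saving value 3
2025-06-04 14:49:36,105 - tests.integration.arcticdb.test_storage_lock - INFO - Process 7544: incrementing and saving value 3
"""

PATTERN = re.compile(r"Process (\d+): incrementing and saving value (\d+)")


def find_lost_updates(log_text: str) -> dict:
    """Map each saved value to the processes that wrote it, keeping only duplicates."""
    writers = defaultdict(list)
    for pid, value in PATTERN.findall(log_text):
        writers[int(value)].append(int(pid))
    return {value: pids for value, pids in writers.items() if len(pids) > 1}


print(find_lost_updates(LOG))  # {3: [3380, 7544]}
```

Any value written by more than one process indicates that two processes held the lock at the same time, which is consistent with the "Unexpectedly lost the lock in heartbeating thread" message in the log.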
-------------------------
Try 4 - After discussion with Ivo we determined this is not a bug, but
we will increase the default timeout of the storage lock to 20 seconds
(even that is too low for practical usage scenarios, but it is OK for tests).
Log: https://github.com/man-group/ArcticDB/actions/runs/15464091500
Analysis: all tests successful, we have a winner!
#### Any other comments?
#### Checklist
<details>
<summary>
Checklist for code changes...
</summary>
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>
<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->
---------
Co-authored-by: Georgi Rusev <Georgi Rusev>
File: .github/workflows/build.yml (1 addition, 1 deletion)
@@ -35,7 +35,7 @@ on:
 type: string
 default: arcticdb-dev-clang:latest
 pytest_args:
-description: Rewrite what tests will run
+description: Rewrite what tests will run or do your own pytest line if string starts with pytest ... (Example -- pytest -n auto -v --count=50 -x python/tests/compat)
 type: string
 default: ""
 run-name: Building ${{github.ref_name}} on ${{github.event_name}} by ${{github.actor}}