Commit 873ae6c

Extend update docs to mention possible data fragmentation (#2164)
1 parent 1afd029 commit 873ae6c

1 file changed, +28 -4 lines changed

python/arcticdb/version_store/library.py
@@ -966,9 +966,9 @@ def append(
 metadata
 Optional metadata to persist along with the new symbol version. Note that the metadata is
 not combined in any way with the metadata stored in the previous version.
-prune_previous_versions, default=False
+prune_previous_versions
 Removes previous (non-snapshotted) versions from the database.
-validate_index: bool, default=True
+validate_index
 If True, verify that the index of `data` supports date range searches and update operations.
 This tests that the data is sorted in ascending order, using Pandas DataFrame.index.is_monotonic_increasing.

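The `validate_index` check described in the hunk above is a non-decreasing test on the index. As a rough illustration only, the same test can be written in plain Python. The helper name `is_sorted_ascending` is hypothetical and is not part of ArcticDB or pandas. Like `is_monotonic_increasing`, it accepts repeated values.

```python
from datetime import datetime
from typing import Sequence


def is_sorted_ascending(index: Sequence[datetime]) -> bool:
    """Return True when every timestamp is >= the one before it (hypothetical helper)."""
    return all(a <= b for a, b in zip(index, index[1:]))


sorted_index = [datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 2)]
unsorted_index = [datetime(2024, 1, 2), datetime(2024, 1, 1)]

print(is_sorted_ascending(sorted_index))    # True: repeated timestamps are allowed
print(is_sorted_ascending(unsorted_index))  # False: this index would fail the check
```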
@@ -1099,6 +1099,9 @@ def update(
 If dynamic schema is used then data will override everything in storage for the entire index of ``data``. Update
 will not keep columns from storage which are not in ``data``.

+The update will split the first and last segments in the storage that intersect with 'data'. Therefore, frequent
+calls to update might lead to data fragmentation (see the example below).
+
 Parameters
 ----------
 symbol
@@ -1151,6 +1154,25 @@ def update(
 2018-01-01 400
 2018-01-03 40
 2018-01-04 4
+
+Update will split the first and the last segment intersecting with ``data``
+>>> index = pd.date_range(pd.Timestamp("2024-01-01"), pd.Timestamp("2024-02-01"))
+>>> df = pd.DataFrame({f"col_{i}": range(len(index)) for i in range(1)}, index=index)
+>>> lib.write("test", df)
+>>> lt = lib._dev_tools.library_tool()
+>>> print(lt.read_index("test"))
+start_index  end_index                      version_id  stream_id  creation_ts          content_hash         index_type  key_type  start_col  end_col  start_row  end_row
+2024-01-01   2024-02-01 00:00:00.000000001  0           b'test'    1738599073224386674  9652922778723941392  84          2         1          2        0          32
+>>> update_index = pd.date_range(pd.Timestamp("2024-01-10"), freq="ns", periods=200000)
+>>> update = pd.DataFrame({f"col_{i}": [1] * len(update_index) for i in range(1)}, index=update_index)
+>>> lib.update("test", update)
+>>> print(lt.read_index("test"))
+start_index                 end_index                      version_id  stream_id  creation_ts          content_hash          index_type  key_type  start_col  end_col  start_row  end_row
+2024-01-01 00:00:00.000000  2024-01-09 00:00:00.000000001  1           b'test'    1738599073268200906  13838161946080117383  84          2         1          2        0          9
+2024-01-10 00:00:00.000000  2024-01-10 00:00:00.000100000  1           b'test'    1738599073256354553  15576483210589662891  84          2         1          2        9          100009
+2024-01-10 00:00:00.000100  2024-01-10 00:00:00.000200000  1           b'test'    1738599073256588040  12429442054752910013  84          2         1          2        100009     200009
+2024-01-11 00:00:00.000000  2024-02-01 00:00:00.000000001  1           b'test'    1738599073268493107  5975110026983744452   84          2         1          2        200009     200031
+
 """
 return self._nvs.update(
 symbol=symbol,
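The splitting behaviour documented in the `update` docstring can be illustrated with a toy model. This is a sketch under stated assumptions, not ArcticDB's implementation: segments are plain lists of integer nanosecond timestamps, column slicing is ignored, and the 100,000-row chunk size is inferred from the row counts in the `read_index` output. The names `model_update`, `DAY_NS` and `ROWS_PER_SEGMENT` are hypothetical.

```python
from typing import List

DAY_NS = 86_400 * 10**9      # one day in nanoseconds
ROWS_PER_SEGMENT = 100_000   # assumption, inferred from the docstring output

Segment = List[int]          # sorted index values (ns) held by one segment


def model_update(segments: List[Segment], new_index: List[int]) -> List[Segment]:
    """Toy model: overwrite the closed range [new_index[0], new_index[-1]]."""
    lo, hi = new_index[0], new_index[-1]
    before: List[Segment] = []
    after: List[Segment] = []
    for seg in segments:
        head = [i for i in seg if i < lo]  # rows kept from a segment cut at lo
        tail = [i for i in seg if i > hi]  # rows kept from a segment cut at hi
        if head:
            before.append(head)
        if tail:
            after.append(tail)
    # New rows are written in chunks of at most ROWS_PER_SEGMENT.
    new_segments = [
        new_index[k:k + ROWS_PER_SEGMENT]
        for k in range(0, len(new_index), ROWS_PER_SEGMENT)
    ]
    return before + new_segments + after


# One segment with a daily index from 2024-01-01 to 2024-02-01 (32 rows).
segments = [[day * DAY_NS for day in range(32)]]

# 200,000 rows at nanosecond frequency starting on 2024-01-10 (day 9).
update_index = list(range(9 * DAY_NS, 9 * DAY_NS + 200_000))
segments = model_update(segments, update_index)
sizes_after_big_update = [len(s) for s in segments]
print(sizes_after_big_update)  # [9, 100000, 100000, 22]

# Two further single-row updates each cut one more segment in two.
for day in (3, 20):
    segments = model_update(segments, [day * DAY_NS])
sizes_after_small_updates = [len(s) for s in segments]
print(sizes_after_small_updates)  # [3, 1, 5, 100000, 100000, 10, 1, 11]
```

In this model the first update reproduces the 9 / 100000 / 100000 / 22 row split shown in the docstring, and two single-row updates then take the symbol from four segments to eight while the total stays at 200,031 rows.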
@@ -1168,13 +1190,15 @@ def update_batch(
 prune_previous_versions: bool = False,
 ) -> List[Union[VersionedItem, DataError]]:
 """
-Perform an update operation on a list of symbols in parallel.
+Perform an update operation on a list of symbols in parallel. All constraints on
+[update](/api/library/#arcticdb.version_store.library.Library.update) apply to this call as well.

 Parameters
 ----------
 update_payloads: List[UpdatePayload]
 List `arcticdb.library.UpdatePayload`. Each element of the list describes an update operation for a
-particular symbol. Providing the symbol name, data, etc.
+particular symbol. Providing the symbol name, data, etc. The same symbol should not appear twice in this
+list.
 prune_previous_versions: bool, default=False
 Removes previous (non-snapshotted) versions from the library.
 upsert: bool, default=False
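Since the `update_batch` docstring now says the same symbol should not appear twice in the payload list, a caller might want to check for that before submitting a batch. The following is a minimal sketch of such a check. `PayloadStub` is a hypothetical stand-in that only assumes each payload exposes a `symbol` attribute, and `duplicated_symbols` is not an ArcticDB function.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, List


@dataclass
class PayloadStub:
    """Hypothetical stand-in for an update payload; only `symbol` matters here."""
    symbol: str
    data: Any = None


def duplicated_symbols(payloads: List[PayloadStub]) -> List[str]:
    """Return the symbols that occur more than once, in first-seen order."""
    counts = Counter(p.symbol for p in payloads)
    return [symbol for symbol, n in counts.items() if n > 1]


payloads = [PayloadStub("a"), PayloadStub("b"), PayloadStub("a")]
print(duplicated_symbols(payloads))  # ['a']
```

A caller could refuse to submit the batch, or merge the offending payloads, whenever the returned list is non-empty.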
