docs/docs/icechunk-python/performance.md (34 additions & 3 deletions)
@@ -9,6 +9,36 @@
Icechunk is designed to be cloud native, making it able to take advantage of the horizontal scaling of cloud providers. To learn more, check out [this blog post](https://earthmover.io/blog/exploring-icechunk-scalability) which explores just how well Icechunk can perform when matched with AWS S3.

## Cold buckets and repos

Modern object stores usually reshard their buckets on the fly, based on perceived load. The
strategies they use are not published and are very hard to discover. The details are not critical;
the important takeaway is that on new buckets, and even on new repositories, the object store
may not scale well from the start. You are expected to ramp up load gradually as you
write data to the repository.

Once you have applied consistently high read/write load to a repository for a few minutes, the object
store will usually reshard your bucket, allowing for more load. While this resharding happens, different
object stores respond in different ways. For example, S3 returns 5xx errors with a "SlowDown"
indication, while GCS returns 429 responses.

Icechunk helps with this process by retrying failed requests with exponential backoff. In our
experience, the default configuration is enough to ingest into a fresh bucket using around 100 machines.
If this is not the case for you, you can tune the retry configuration using [StorageRetriesSettings](https://icechunk.io/en/latest/icechunk-python/reference/#icechunk.StorageRetriesSettings).
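To build intuition for what tuning a retry policy changes, here is a minimal, standalone sketch of a capped exponential backoff schedule. It is not Icechunk's implementation: the function `backoff_delays_ms` and its parameters (`max_tries`, `initial_backoff_ms`, `max_backoff_ms`) are illustrative assumptions, so check the `StorageRetriesSettings` reference for the actual fields and defaults. The sketch shows how more tries and a larger cap lengthen the total time a writer keeps waiting while the object store reshards.

```python
def backoff_delays_ms(max_tries, initial_backoff_ms, max_backoff_ms):
    """Return the wait (in ms) before each retry of a hypothetical policy.

    Assumption: the delay doubles after every failed attempt and is
    capped at max_backoff_ms. The first attempt has no wait, so a policy
    with max_tries attempts has max_tries - 1 delays.
    """
    delays = []
    delay = initial_backoff_ms
    for _ in range(max_tries - 1):
        delays.append(min(delay, max_backoff_ms))
        delay *= 2
    return delays


# Hypothetical values, chosen only for illustration.
schedule = backoff_delays_ms(max_tries=6, initial_backoff_ms=100, max_backoff_ms=1000)
total_wait_ms = sum(schedule)
print(schedule, total_wait_ms)
```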

To learn more about how Icechunk manages object store prefixes, read our
@@ -75,8 +106,8 @@ Options for specifying how to split along a specific axis or dimension are:
2. [`ManifestSplitDimCondition.DimensionName`](./reference.md#icechunk.ManifestSplitDimCondition.DimensionName) takes a regular expression used to match the dimension names of the array;
3. [`ManifestSplitDimCondition.Any`](./reference.md#icechunk.ManifestSplitDimCondition.Any) matches any _remaining_ dimension name or axis.

For example, for an array with dimensions `time, latitude, longitude`, the following config
@@ -86,8 +117,8 @@ from icechunk import ManifestSplitDimCondition
ManifestSplitDimCondition.Any(): 1,
}
```

will result in splitting manifests so that each manifest contains (3 longitude chunks x 2 latitude chunks x 1 time chunk) = 6 chunks per manifest file.
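The arithmetic above can be checked with a short standalone sketch. It does not call Icechunk. The split sizes come from the sentence above, while the chunk grid shape and the helper names are hypothetical, and the manifest count assumes the splits tile the chunk grid independently along each dimension.

```python
from math import ceil, prod


def chunks_per_manifest(split_sizes):
    """Maximum number of chunks referenced by one manifest file."""
    return prod(split_sizes.values())


def manifest_count(chunk_grid, split_sizes):
    """Number of manifest files, assuming splits tile the grid per dimension."""
    return prod(ceil(chunk_grid[dim] / split_sizes[dim]) for dim in chunk_grid)


# Split sizes from the text: 3 longitude, 2 latitude, 1 time chunk per manifest.
split_sizes = {"time": 1, "latitude": 2, "longitude": 3}

# Hypothetical chunk grid: number of chunks along each dimension of the array.
chunk_grid = {"time": 10, "latitude": 4, "longitude": 9}

print(chunks_per_manifest(split_sizes))
print(manifest_count(chunk_grid, split_sizes))
```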