Fix multipart upload #217

Closed

Conversation

wietzesuijker (Contributor)

This PR optimizes Azure multipart uploads by ensuring that only non-empty, strictly ordered block IDs are committed. A short sketch of the intended ID/commit scheme follows the list below.

  • Fixed block IDs: Changed block IDs in AzMultiPartUpload.write_part to use a zero‐padded format (block-{part:05d}) so that all IDs have uniform length.
  • Sorted block list: In AzMultiPartUpload.finalise, the parts list is now sorted by PartNumber before creating the BlobBlock list, ensuring correct ordering.
  • Improved merge logic: In MPUChunk.merge, the new nextPartId is set to one greater than the maximum PartNumber in the merged parts, ensuring monotonically increasing part numbers.
  • Final flush handling: In MPUChunk.flush, when flushing leftover data in left_data, if finalising and the leftover is smaller than write.min_write_sz, the block is still written without asserting its size.
  • Data Conversion: In MPUChunk.maybe_write, the spill data is explicitly converted to bytes before being passed to the writer.
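As a reference for the first two bullets, here is a minimal sketch of fixed-width block IDs and commit ordering by PartNumber (illustrative helper names, not the PR's actual code; Azure requires every block ID within a blob to have the same length):

```python
# Illustrative only: fixed-width block IDs plus commit ordering by PartNumber.
def block_id(part: int) -> str:
    # Zero-padded so every ID has the same length within the blob.
    return f"block-{part:05d}"

def commit_order(parts: list[dict]) -> list[str]:
    # Sort by PartNumber before building the BlobBlock list so the committed
    # order matches the upload order.
    return [p["BlockId"] for p in sorted(parts, key=lambda p: p["PartNumber"])]

parts = [
    {"PartNumber": 2, "BlockId": block_id(2)},
    {"PartNumber": 1, "BlockId": block_id(1)},
]
assert commit_order(parts) == ["block-00001", "block-00002"]
```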

However, this error still persists: HttpResponseError: The specified block list is invalid / ErrorCode:InvalidBlockList.

Next steps
I suspect a 0-byte block (or an out-of-order/duplicate block) is still being staged. I will try the following next:

  • Adjust maybe_write to skip writing if the spill data is empty (see the sketch after this list).
  • Verify that all part numbers are strictly increasing across merges.
  • Ensure that no empty block (i.e. 0 bytes) ever gets sent to Azure.
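A sketch of the first item in the list above (a hypothetical guard, not the actual MPUChunk.maybe_write; the writer signature is assumed):

```python
def maybe_write(write, spill_data, part: int) -> int:
    data = bytes(spill_data)  # explicit conversion, as in the PR
    if not data:
        # A 0-byte block staged here would end up in the final block list
        # and trigger InvalidBlockList on commit, so skip the write entirely.
        return 0
    write(data, part)  # assumed writer signature: (data: bytes, part: int)
    return len(data)
```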

@wietzesuijker (Contributor, Author)

@Kirill888 fyi, WiP after #212

@Kirill888 (Member)

@wietzesuijker I don't think this is the correct way to approach this; it looks like we are trying to debug by changing code, which is no good.

We need a proper test suite for chunk merging and its interaction with chunk flushing.
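For context, here is a sketch of the kind of invariant such a test suite could assert after each merge/flush combination, using only attributes that appear in the diff excerpts below (parts with PartNumber entries, nextPartId); it is an illustration, not the library's actual test code:

```python
def assert_merge_invariants(chunk) -> None:
    numbers = [p["PartNumber"] for p in chunk.parts]
    # Parts must already be in strictly increasing order "by construction".
    assert numbers == sorted(numbers)
    assert len(numbers) == len(set(numbers))
    # nextPartId must stay ahead of every part that has been written.
    assert all(n < chunk.nextPartId for n in numbers)
```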

@wietzesuijker (Contributor, Author)

@Kirill888 Thanks! I added quite a bit of debugging to work out how to make this work. With this code, I wrote a set of sizeable COGs to ABS without errors, about as quickly as you'd expect. The style of the code is still a bit out of sync with the rest of the library, but I'm curious to get your thoughts on the implementation.

  • Block IDs: Uses unique, Base64-encoded block IDs containing a zero-padded part number (e.g., prefix-p000001).
  • Part Numbering: Assigns part numbers sequentially. During merges, the next part number is based on the highest number found in the combined parts list, using a Dask lock for safety.
  • Final Commit: Builds the final block list by querying Azure for uncommitted blocks matching the upload's prefix, decoding part numbers from the block IDs, sorting numerically, and then committing that sorted list (see the sketch after this list).
  • Data Handling: Ensures data is converted to bytes before being written.
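A sketch of the block-ID scheme and commit ordering described above (the prefix format and helper names are assumptions, not the PR's actual code):

```python
import base64

def encode_block_id(prefix: str, part: int) -> str:
    # Unique, Base64-encoded ID embedding a zero-padded part number,
    # e.g. base64 of "prefix-p000001".
    return base64.b64encode(f"{prefix}-p{part:06d}".encode()).decode()

def decode_part_number(block_id: str) -> int:
    # Recover the numeric part from an ID produced by encode_block_id.
    return int(base64.b64decode(block_id).decode().rsplit("-p", 1)[1])

# Final commit: decode part numbers from the staged block IDs and sort
# numerically before committing the list.
staged = [encode_block_id("prefix", p) for p in (3, 1, 2)]
commit_order = sorted(staged, key=decode_part_number)
assert [decode_part_number(b) for b in commit_order] == [1, 2, 3]
```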

wietzesuijker requested a review from Kirill888, April 6, 2025 22:50
@@ -1,3 +1,3 @@
"""version information only."""

__version__ = "0.4.10"
Kirill888 (Member)

please revert, version is handled differently now

@Kirill888 (Member)

@wietzesuijker it's too much for me to review in a hurry, but we certainly don't need global locking during the merge or part-id sorting; parts MUST be sorted by construction. See my comments from a few days back.
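One way to read "sorted by construction" (an assumption about the intended design, not the library's actual code): each chunk owns a disjoint, ascending range of part ids handed out via its nextPartId and write_credits, so merging needs no lock and no sort, only concatenation:

```python
def merge_parts(lhs_parts: list[dict], rhs_parts: list[dict]) -> list[dict]:
    # lhs's part-id range ends before rhs's begins, so plain concatenation
    # preserves ordering; any violation indicates a construction bug.
    if lhs_parts and rhs_parts:
        assert lhs_parts[-1]["PartNumber"] < rhs_parts[0]["PartNumber"]
    return lhs_parts + rhs_parts
```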

Comment on lines +16 to 34
logger = logging.getLogger("odc.geo.cog._az")
# Set default level (can be overridden by application config)
logger.setLevel(logging.INFO)
# Avoid adding handlers multiple times if this module is reloaded
if not logger.hasHandlers():
    # Configure console handler
    handler = logging.StreamHandler(sys.stdout)
    formatter = logging.Formatter(
        "%(asctime)s [%(levelname)s] %(name)s (%(funcName)s): %(message)s"
    )
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    # Prevent logs from propagating to the root logger if handlers are added here
    logger.propagate = False

# Example: Increase log level for debugging
# logger.setLevel(logging.DEBUG)


Kirill888 (Member)

no logging setup like that on import please, do it inside your test app if needed for debugging
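For reference, a minimal way to do that from a test or debug script instead of at import time (standard library only):

```python
import logging

# Configure logging at the application/test entry point, not in the library module.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s (%(funcName)s): %(message)s",
)
logging.getLogger("odc.geo.cog._az").setLevel(logging.DEBUG)
```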

Comment on lines +403 to +406
logging.getLogger(__name__).debug(
    "Leftover data size (%s bytes) is less than min_write_sz; deferring flush.",
    len(self.left_data),
)
Kirill888 (Member), Apr 7, 2025

NO: it's a non-recoverable error and a sign of either misconfiguration or a mistake in the code

Comment on lines +408 to +410
for part in self.parts:
    if "PartNumber" not in part or "BlockId" not in part:
        raise ValueError(f"Malformed part metadata: {part}")
Kirill888 (Member)

NO, it's up to the implementation to decide what is malformed or not.

@@ -86,6 +92,7 @@ def __init__(
        self.observed: list[tuple[int, Any]] = [] if observed is None else observed
        self.is_final = is_final
        self.lhs_keep = lhs_keep
        self._global_counter = 0
Kirill888 (Member)

NO, can't have that

Comment on lines +155 to 174
            # Use a Dask distributed lock to ensure the nextPartId update is atomic
            lock = Lock("mpu_merge_lock")
            with lock:
                all_parts = lhs.parts + rhs.parts
                all_parts.sort(key=lambda p: int(p["PartNumber"]))
                new_nextPartId = (
                    all_parts[-1]["PartNumber"] + 1
                    if all_parts
                    else max(lhs.nextPartId, rhs.nextPartId)
                )
                return MPUChunk(
-                   rhs.nextPartId,
+                   new_nextPartId,
                    rhs.write_credits,
                    rhs.data,
                    lhs.left_data,
-                   lhs.parts + rhs.parts,
+                   all_parts,
                    lhs.observed + rhs.observed,
                    rhs.is_final,
                    lhs.lhs_keep,
                )
Kirill888 (Member)

NO, I commented on this before: parts must be ordered by construction, and this logic for next-part selection is bogus.

Kirill888 (Member)

and no lock is needed, nor will this work in a distributed setting

@Kirill888 (Member)

This is too much change and it contradicts the original design; I think there is a misunderstanding of how this is supposed to work.

Kirill888 closed this Apr 7, 2025
@wietzesuijker (Contributor, Author)

Thanks, I've opened a clean alternative in #223

wietzesuijker deleted the fix/blocklist branch, April 9, 2025 22:43