Fix multipart upload #217

Closed

Conversation

wietzesuijker (Contributor)

This PR optimizes Azure multipart uploads by ensuring that only non-empty, strictly ordered block IDs are committed. A short sketch of the intended ID/commit scheme follows the list below.

  • Fixed block IDs: Changed block IDs in AzMultiPartUpload.write_part to use a zero‐padded format (block-{part:05d}) so that all IDs have uniform length.
  • Sorted block list: In AzMultiPartUpload.finalise, the parts list is now sorted by PartNumber before creating the BlobBlock list, ensuring correct ordering.
  • Improved merge logic: In MPUChunk.merge, the new nextPartId is set to one greater than the maximum PartNumber in the merged parts, ensuring monotonically increasing part numbers.
  • Final flush handling: In MPUChunk.flush, when flushing leftover data in left_data, if finalising and the leftover is smaller than write.min_write_sz, the block is still written without asserting its size.
  • Data Conversion: In MPUChunk.maybe_write, the spill data is explicitly converted to bytes before being passed to the writer.
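As a reference for the first two bullets, here is a minimal sketch of fixed-width block IDs and commit ordering by PartNumber (illustrative helper names, not the PR's actual code; Azure requires every block ID within a blob to have the same length):

```python
# Illustrative only: fixed-width block IDs plus commit ordering by PartNumber.
def block_id(part: int) -> str:
    # Zero-padded so every ID has the same length within the blob.
    return f"block-{part:05d}"

def commit_order(parts: list[dict]) -> list[str]:
    # Sort by PartNumber before building the BlobBlock list so the committed
    # order matches the upload order.
    return [p["BlockId"] for p in sorted(parts, key=lambda p: p["PartNumber"])]

parts = [
    {"PartNumber": 2, "BlockId": block_id(2)},
    {"PartNumber": 1, "BlockId": block_id(1)},
]
assert commit_order(parts) == ["block-00001", "block-00002"]
```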

However, this error still persists: HttpResponseError: The specified block list is invalid / ErrorCode:InvalidBlockList.

Next steps
I suspect a 0-byte block (or an out-of-order/duplicate block) is still being staged. I will try the following next:

  • Adjust maybe_write to skip writing if the spill data is empty (see the sketch after this list).
  • Verify that all part numbers are strictly increasing across merges.
  • Ensure that no empty block (i.e. 0 bytes) ever gets sent to Azure.
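A sketch of the first item in the list above (a hypothetical guard, not the actual MPUChunk.maybe_write; the writer signature is assumed):

```python
def maybe_write(write, spill_data, part: int) -> int:
    data = bytes(spill_data)  # explicit conversion, as in the PR
    if not data:
        # A 0-byte block staged here would end up in the final block list
        # and trigger InvalidBlockList on commit, so skip the write entirely.
        return 0
    write(data, part)  # assumed writer signature: (data: bytes, part: int)
    return len(data)
```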

@wietzesuijker (Contributor, Author)

@Kirill888 fyi, WiP after #212

@Kirill888 (Member)

@wietzesuijker I don't think this is the correct way to approach this; it looks like we are trying to debug by changing code, which is no good.

We need a proper test suite for chunk merging and its interaction with chunk flushing.
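For context, here is a sketch of the kind of invariant such a test suite could assert after each merge/flush combination, using only attributes that appear in the diff excerpts below (parts with PartNumber entries, nextPartId); it is an illustration, not the library's actual test code:

```python
def assert_merge_invariants(chunk) -> None:
    numbers = [p["PartNumber"] for p in chunk.parts]
    # Parts must already be in strictly increasing order "by construction".
    assert numbers == sorted(numbers)
    assert len(numbers) == len(set(numbers))
    # nextPartId must stay ahead of every part that has been written.
    assert all(n < chunk.nextPartId for n in numbers)
```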

@wietzesuijker (Contributor, Author)

@Kirill888 Thanks! I added quite a bit of debugging to work out how to make this work. With this code, I wrote a set of sizeable COGs to ABS without errors, about as quickly as you'd expect. The style of the code is still a bit out of sync with the rest of the library, but I'm curious to get your thoughts on the implementation.

  • Block IDs: Uses unique, Base64-encoded block IDs containing a zero-padded part number (e.g., prefix-p000001).
  • Part Numbering: Assigns part numbers sequentially. During merges, the next part number is based on the highest number found in the combined parts list, using a Dask lock for safety.
  • Final Commit: Builds the final block list by querying Azure for uncommitted blocks matching the upload's prefix, decoding part numbers from the block IDs, sorting numerically, and then committing that sorted list (see the sketch after this list).
  • Data Handling: Ensures data is converted to bytes before being written.
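A sketch of the block-ID scheme and commit ordering described above (the prefix format and helper names are assumptions, not the PR's actual code):

```python
import base64

def encode_block_id(prefix: str, part: int) -> str:
    # Unique, Base64-encoded ID embedding a zero-padded part number,
    # e.g. base64 of "prefix-p000001".
    return base64.b64encode(f"{prefix}-p{part:06d}".encode()).decode()

def decode_part_number(block_id: str) -> int:
    # Recover the numeric part from an ID produced by encode_block_id.
    return int(base64.b64decode(block_id).decode().rsplit("-p", 1)[1])

# Final commit: decode part numbers from the staged block IDs and sort
# numerically before committing the list.
staged = [encode_block_id("prefix", p) for p in (3, 1, 2)]
commit_order = sorted(staged, key=decode_part_number)
assert [decode_part_number(b) for b in commit_order] == [1, 2, 3]
```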

wietzesuijker requested a review from Kirill888, April 6, 2025 22:50
@@ -1,3 +1,3 @@
"""version information only."""

__version__ = "0.4.10"
Kirill888 (Member)

please revert, version is handled differently now

@Kirill888 (Member)

@wietzesuijker it's too much for me to review in a hurry, but we certainly don't need global locking during the merge or part-id sorting; parts MUST be sorted by construction. See my comments from a few days back.
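One way to read "sorted by construction" (an assumption about the intended design, not the library's actual code): each chunk owns a disjoint, ascending range of part ids handed out via its nextPartId and write_credits, so merging needs no lock and no sort, only concatenation:

```python
def merge_parts(lhs_parts: list[dict], rhs_parts: list[dict]) -> list[dict]:
    # lhs's part-id range ends before rhs's begins, so plain concatenation
    # preserves ordering; any violation indicates a construction bug.
    if lhs_parts and rhs_parts:
        assert lhs_parts[-1]["PartNumber"] < rhs_parts[0]["PartNumber"]
    return lhs_parts + rhs_parts
```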

Comment on lines +16 to 34
logger = logging.getLogger("odc.geo.cog._az")
# Set default level (can be overridden by application config)
logger.setLevel(logging.INFO)
# Avoid adding handlers multiple times if this module is reloaded
if not logger.hasHandlers():
    # Configure console handler
    handler = logging.StreamHandler(sys.stdout)
    formatter = logging.Formatter(
        "%(asctime)s [%(levelname)s] %(name)s (%(funcName)s): %(message)s"
    )
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    # Prevent logs from propagating to the root logger if handlers are added here
    logger.propagate = False

# Example: Increase log level for debugging
# logger.setLevel(logging.DEBUG)


Kirill888 (Member)

no logging setup like that on import please, do it inside your test app if needed for debugging
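For reference, a minimal way to do that from a test or debug script instead of at import time (standard library only):

```python
import logging

# Configure logging at the application/test entry point, not in the library module.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s (%(funcName)s): %(message)s",
)
logging.getLogger("odc.geo.cog._az").setLevel(logging.DEBUG)
```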

Comment on lines +403 to +406
logging.getLogger(__name__).debug(
    "Leftover data size (%s bytes) is less than min_write_sz; deferring flush.",
    len(self.left_data),
)
Kirill888 (Member), Apr 7, 2025

NO: it's a non-recoverable error and a sign of either misconfiguration or a mistake in the code

Comment on lines +408 to +410
for part in self.parts:
    if "PartNumber" not in part or "BlockId" not in part:
        raise ValueError(f"Malformed part metadata: {part}")
Kirill888 (Member)

NO, it's up to the implementation to decide what is malformed or not.

@@ -86,6 +92,7 @@ def __init__(
        self.observed: list[tuple[int, Any]] = [] if observed is None else observed
        self.is_final = is_final
        self.lhs_keep = lhs_keep
        self._global_counter = 0
Kirill888 (Member)

NO, can't have that

Comment on lines +155 to 174
            # Use a Dask distributed lock to ensure the nextPartId update is atomic
            lock = Lock("mpu_merge_lock")
            with lock:
                all_parts = lhs.parts + rhs.parts
                all_parts.sort(key=lambda p: int(p["PartNumber"]))
                new_nextPartId = (
                    all_parts[-1]["PartNumber"] + 1
                    if all_parts
                    else max(lhs.nextPartId, rhs.nextPartId)
                )
                return MPUChunk(
-                   rhs.nextPartId,
+                   new_nextPartId,
                    rhs.write_credits,
                    rhs.data,
                    lhs.left_data,
-                   lhs.parts + rhs.parts,
+                   all_parts,
                    lhs.observed + rhs.observed,
                    rhs.is_final,
                    lhs.lhs_keep,
                )
Kirill888 (Member)

NO, I commented on this before: parts must be ordered by construction, and this logic for next-part selection is bogus.

Kirill888 (Member)

and no lock is needed, nor will this work in a distributed setting

@Kirill888 (Member)

This is too much change and it contradicts the original design; I think there is a misunderstanding of how this is supposed to work.

Kirill888 closed this Apr 7, 2025
@wietzesuijker (Contributor, Author)

Thanks, I've opened a clean alternative in #223

wietzesuijker deleted the fix/blocklist branch, April 9, 2025 22:43