Reading append blobs while they're being modified without triggering a ResourceModifiedError exception #39817

Open
@kyrre

We have Azure Container App Jobs that read JSON files containing security log events from a storage account and then insert the events into Delta tables in another storage account.

These are append blobs containing Advanced Hunting events, and they are written to every X minutes. We do not have any control over how they are written.

Between invocations we keep track of how much of the file we have already processed (the offset), and we stream the next range into a PyArrow buffer:

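import pyarrow as pa

# some offset and length calculation (length must be known before allocating)

# fixed-size buffer that will receive the downloaded range
buffer = pa.allocate_buffer(length)
output = pa.output_stream(buffer)

download = client.download_blob(
    offset=offset,
    length=length,
    progress_hook=progress_callback,
)

# stream the requested range into the PyArrow buffer
bytes_read = download.readinto(output)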

If we want to download 2GB and leave max_single_get_size at its default value of 32MB, the Python SDK downloads the rest of the range as multiple chunks of 4MB each. Unfortunately, if the blob is written to while we are downloading one of these chunks, the SDK notices that the ETag has changed and throws a ResourceModifiedError.
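To give a sense of how large the window for a concurrent append is, here is a rough back-of-the-envelope sketch. It assumes the behaviour described above (one initial GET of up to 32MB, followed by 4MB chunks); the function name `count_requests` and its parameters are made up for illustration and are not part of the SDK.

```python
MiB = 1024 * 1024


def count_requests(total_bytes, first_get=32 * MiB, chunk=4 * MiB):
    """Estimate how many GET requests one ranged download needs.

    Assumption: one initial request of up to `first_get` bytes, then the
    remainder is fetched in pieces of `chunk` bytes each.
    """
    if total_bytes <= first_get:
        return 1
    remaining = total_bytes - first_get
    # Ceiling division: a partial final chunk still costs one request.
    return 1 + -(-remaining // chunk)


if __name__ == "__main__":
    # A 2 GiB range with the default sizes mentioned in this issue.
    print(count_requests(2048 * MiB))
```

Under these assumptions a 2GB range means several hundred follow-up requests, and an append landing during any of them would change the ETag.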

This thread explains in detail what is going on #30233 (comment).

While this makes sense for a regular blob, why is this behaviour necessary for an append blob, where existing data is never rewritten? Is there any match condition that will allow us to ignore the ETag change?

The only workarounds I can think of are:

  1. Increasing the max_single_get_size. However, even when running in ACA, large single downloads are unstable.
  2. Doing the chunking ourselves.
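For option 2, a minimal sketch of what doing the chunking ourselves could look like. The idea is to issue one download_blob call per piece, with each piece no larger than max_single_get_size, so that (assuming the SDK serves such a range with a single request) there is no follow-up chunk whose ETag could mismatch. The helper name `download_in_pieces` and the `piece_size` default are hypothetical; `client` is anything exposing `download_blob(offset=..., length=...)` whose result has `readinto(stream)`, as in the snippet above.

```python
def download_in_pieces(client, offset, length, sink, piece_size=16 * 1024 * 1024):
    """Copy the byte range [offset, offset + length) into `sink`.

    Each piece is fetched with its own download_blob call. piece_size is
    an assumed value chosen to stay below the 32MB default of
    max_single_get_size mentioned in this issue.
    Returns the number of bytes actually written to `sink`.
    """
    end = offset + length
    position = offset
    while position < end:
        piece_len = min(piece_size, end - position)
        downloader = client.download_blob(offset=position, length=piece_len)
        written = downloader.readinto(sink)
        if written <= 0:
            # Nothing came back; stop instead of looping forever.
            break
        position += written
    return position - offset
```

With the PyArrow setup from above this would be called as `bytes_read = download_in_pieces(client, offset, length, output)`. Each piece is still a separate request, so retries and progress reporting would have to be handled per piece.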

Are we doing something wrong here?

Labels: Client, Service Attention, Storage, customer-reported, needs-team-attention, question
