Description
We have Azure Container App Jobs that read JSON files containing security log events from one storage account and then insert the events into Delta tables in another storage account.
These are append blobs containing Advanced Hunting events that are written to every X minutes. We do not have any control over how they are written.
Between invocations we keep track of how much of the file we have processed (the offset) and stream the new bytes into a PyArrow buffer:
```python
import pyarrow as pa

# client is an azure.storage.blob.BlobClient pointing at the append blob

# some offset and length calculation

buffer = pa.allocate_buffer(length)
output = pa.output_stream(buffer)

download = client.download_blob(
    offset=offset,
    length=length,
    progress_hook=progress_callback,
)
bytes_read = download.readinto(output)
```
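For context, the surrounding bookkeeping is roughly of this shape. `load_checkpoint` / `save_checkpoint` are hypothetical stand-ins for however the job persists the processed offset; the real offset/length calculation is more involved:

```python
# Hypothetical shape of the per-invocation bookkeeping, not our exact code.
props = client.get_blob_properties()
offset = load_checkpoint(client.blob_name)   # bytes processed in previous runs
length = props.size - offset                 # bytes appended since the last run
if length > 0:
    # ... download [offset, offset + length) as shown above ...
    save_checkpoint(client.blob_name, offset + length)
```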
If we want to download 2GB and leave `max_single_get_size` at its default value of 32MB, the Python SDK will download the range in multiple 4MB chunks (the default `max_chunk_get_size`). Unfortunately, if the blob is being appended to while we are downloading one of those chunks, the SDK notices that the ETag has changed and throws a `ResourceModifiedError`.
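For clarity, these are the knobs we are talking about; both are set on the client at construction time. The values and names below are illustrative only, not what we run in production:

```python
from azure.storage.blob import BlobClient

# Sketch only: container/blob names, connection_string and the 256 MiB / 64 MiB
# values are placeholders.
client = BlobClient.from_connection_string(
    conn_str=connection_string,             # assumed to come from our config
    container_name="security-logs",         # hypothetical
    blob_name="advanced-hunting.json",      # hypothetical
    max_single_get_size=256 * 1024 * 1024,  # default is 32 MiB
    max_chunk_get_size=64 * 1024 * 1024,    # default is 4 MiB
)
```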
This thread explains in detail what is going on: #30233 (comment).
While this makes sense for a regular blob, why is this behaviour necessary for an append blob? Is there any match condition that would allow us to ignore the ETag change?
The only workarounds I can think of are:

- Increasing `max_single_get_size`. However, even when running in an ACA, large downloads are unstable.
- Doing the chunking ourselves (see the sketch after this list).
Are we doing something wrong here?