
WIP: Implement Efficient Array Slicing and Minimal Requests from the Client #1302

Draft
genematx wants to merge 13 commits into bluesky:main from genematx:array-perf

Conversation

genematx (Contributor) commented Mar 4, 2026

This PR adds the ability to request larger blocks spanning multiple contiguous chunks in a single request. To support this, the ?block= query parameter has been expanded to accept slices in addition to integer indexes.
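To illustrate how a slice-valued block index maps to a region of the array, here is a minimal sketch. It assumes the parameter carries one comma-separated entry per dimension, each either an integer or a start:stop range of block indices (the exact syntax in this PR may differ), and that chunk sizes are given per dimension in the dask-style tuple-of-tuples form. parse_block_param and block_region are hypothetical helpers for illustration, not Tiled API.

```python
from itertools import accumulate
from typing import Sequence, Tuple, Union

# Chunk sizes per dimension, e.g. ((100, 100, 50), (64, 64)).
Chunks = Tuple[Tuple[int, ...], ...]
BlockIndex = Union[int, slice]


def parse_block_param(value: str) -> Tuple[BlockIndex, ...]:
    """Parse a hypothetical ?block= value such as "0:2,1" into (slice(0, 2), 1).

    Assumption: one comma-separated entry per dimension, each either an
    integer block index or an explicit "start:stop" range (no step).
    """
    parsed = []
    for part in value.split(","):
        if ":" in part:
            start, stop = part.split(":")
            parsed.append(slice(int(start), int(stop)))
        else:
            parsed.append(int(part))
    return tuple(parsed)


def block_region(chunks: Chunks, block: Sequence[BlockIndex]) -> Tuple[slice, ...]:
    """Return the array slice covered by a (possibly multi-chunk) block request."""
    if len(block) != len(chunks):
        raise ValueError("need exactly one block index per dimension")
    region = []
    for sizes, idx in zip(chunks, block):
        # Chunk boundary offsets along this dimension: 0, c0, c0 + c1, ...
        edges = [0, *accumulate(sizes)]
        if isinstance(idx, int):
            first, last = idx, idx + 1
        else:
            first, last = idx.start, idx.stop
        if not 0 <= first < last <= len(sizes):
            raise ValueError(f"block index {idx!r} out of range for {len(sizes)} chunks")
        region.append(slice(edges[first], edges[last]))
    return tuple(region)
```

For example, with chunks ((100, 100, 50), (64, 64)), the value "0:2,1" covers rows 0-199 and columns 64-127, so two chunks come back in one response instead of two.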

Additionally, it improves the Python client by allowing it to request only the necessary data when an array is sliced, instead of downloading the entire array and slicing it afterwards.
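The client-side half can be sketched as a mapping from the user's slice to the smallest contiguous run of blocks that contains it, plus the residual slice to apply to the fetched data. blocks_for_selection is a hypothetical helper, not the PR's implementation; it handles only non-empty, step-1 slices and assumes the same tuple-of-tuples chunk layout as above.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Tuple

Chunks = Tuple[Tuple[int, ...], ...]


def blocks_for_selection(
    chunks: Chunks, selection: Tuple[slice, ...]
) -> Tuple[Tuple[slice, ...], Tuple[slice, ...]]:
    """Map a user selection to the smallest contiguous range of blocks covering it.

    Returns (block_ranges, local_selection): block_ranges says which blocks to
    request along each dimension; local_selection trims the fetched region
    back to exactly what the user asked for. Sketch only: step-1 slices.
    """
    if len(selection) != len(chunks):
        raise ValueError("need exactly one slice per dimension")
    block_ranges, local = [], []
    for sizes, sel in zip(chunks, selection):
        start, stop, step = sel.indices(sum(sizes))
        if step != 1 or start >= stop:
            raise ValueError("sketch only supports non-empty, step-1 slices")
        edges = [0, *accumulate(sizes)]
        # Block containing the first requested element...
        first = bisect_right(edges, start) - 1
        # ...and the block containing the last requested element (stop - 1).
        last = bisect_right(edges, stop - 1) - 1
        block_ranges.append(slice(first, last + 1))
        # Re-express the selection relative to the start of the fetched region.
        offset = edges[first]
        local.append(slice(start - offset, stop - offset))
    return tuple(block_ranges), tuple(local)
```

For instance, selecting rows 150:210 of an array chunked as (100, 100, 50) requests blocks 1:3 (rows 100-249) and then keeps local rows 50:110 of that response, rather than transferring all 250 rows.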

Checklist

  • Add a Changelog entry
  • Add the ticket number which this PR closes to the comment section

genematx requested a review from danielballan on March 4, 2026 16:58
# This will be used to determine how to combine multiple requests when fetching
# data in blocks. If set to None, the client will not attempt to combine
# requests and will fetch each chunk separately as determined by the structure.
RESPONSE_BYTESIZE_LIMIT = 250 * 1024 * 1024  # 250 MiB
Member


Without evidence, I would guess that the benefits of parallelism show up before 250 MiB. Some basic performance testing (over LAN and WAN) would be worthwhile, I think. We don't have to optimize it perfectly in this first pass, but I would like some evidence to base this number on initially.
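The trade-off raised here can be made concrete with a small sketch of how a byte limit like RESPONSE_BYTESIZE_LIMIT might group chunks into requests. group_blocks is hypothetical, not the PR's code; it assumes merging happens only along the first axis, that every row has the same byte size, and that a single chunk larger than the limit still gets its own request.

```python
from typing import List, Optional, Tuple


def group_blocks(
    chunks_axis0: Tuple[int, ...],
    row_nbytes: int,
    limit: Optional[int],
) -> List[slice]:
    """Greedily merge consecutive axis-0 chunks into requests of at most `limit` bytes.

    Each returned slice is a range of block indices that could be sent as one
    (hypothetical) ?block=start:stop request. limit=None disables merging,
    mirroring the "fetch each chunk separately" behaviour described above.
    """
    if limit is None:
        return [slice(i, i + 1) for i in range(len(chunks_axis0))]
    groups: List[slice] = []
    start, acc = 0, 0
    for i, n_rows in enumerate(chunks_axis0):
        size = n_rows * row_nbytes
        # Close the current group if adding this chunk would exceed the limit.
        if i > start and acc + size > limit:
            groups.append(slice(start, i))
            start, acc = i, 0
        acc += size
    if chunks_axis0:
        groups.append(slice(start, len(chunks_axis0)))
    return groups
```

For a float64 array of shape (10000, 2048) chunked into 100-row blocks (about 1.56 MiB each, about 156 MiB total), a 250 MiB limit merges everything into a single request, leaving nothing to fetch in parallel, while a 16 MiB limit yields 10 requests that could be issued concurrently. Which setting is faster over LAN or WAN is exactly what the suggested measurements would need to show.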
