Implement `compute(streaming=True)` #1312

**Feature request**

This proposes an alternative way of computing the result that lets users trade performance (`streaming=False`) for lower memory usage and a smaller chance of worker timeouts (`streaming=True`). `.compute(streaming=True)` would do something like:
```python
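import dask.distributed
import nested_pandas as npd
import pandas as pd


def compute(self, streaming: bool = False, **kwargs):
    # Default path: materialize the whole catalog in a single compute call.
    if not streaming:
        return npd.NestedFrame(super().compute(**kwargs))

    # get_client() raises ValueError (it does not return None) when no client is active.
    try:
        client = dask.distributed.get_client()
        # Use at least one partition per chunk, even if no workers have joined yet.
        partitions_per_chunk = max(1, len(client.scheduler_info()["workers"]))
    except ValueError:
        client = None
        partitions_per_chunk = 1

    # Pull the catalog chunk by chunk, skipping empty batches.
    stream = CatalogStream(self, client=client, partitions_per_chunk=partitions_per_chunk, shuffle=False)
    batches = [batch for batch in stream if len(batch) > 0]
    # NOTE: pd.concat raises ValueError on an empty list, so an all-empty result needs separate handling.
    return npd.NestedFrame(pd.concat(batches))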
```
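To make the intended memory trade-off concrete, here is a minimal, library-free sketch of the chunked pattern described above: partitions are grouped into chunks of `partitions_per_chunk`, only one chunk is computed at a time, and empty batches are skipped before the results are combined. The helpers `iter_chunks`, `compute_streaming`, and `compute_chunk` are hypothetical names used only for illustration; they are not part of `CatalogStream` or any existing API, and plain lists stand in for partitions and data frames.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_chunks(items: Iterable, chunk_size: int) -> Iterator[list]:
    """Yield consecutive lists of at most chunk_size items (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(items)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def compute_streaming(
    partitions: Iterable[list],
    compute_chunk: Callable[[List[list]], list],
    partitions_per_chunk: int,
) -> list:
    """Compute partitions one chunk at a time and collect the non-empty results."""
    rows: list = []
    for chunk in iter_chunks(partitions, partitions_per_chunk):
        batch = compute_chunk(chunk)  # only this chunk is being computed right now
        if len(batch) == 0:
            continue  # mirror the empty-batch skip in the proposal
        rows.extend(batch)
    return rows


# Toy stand-in data: each "partition" is a list of rows, and some are empty.
partitions = [[1, 2], [], [3], [], [], [4, 5, 6]]
chunk_sizes = []  # records how many partitions each compute call received


def compute_chunk(chunk: List[list]) -> list:
    chunk_sizes.append(len(chunk))
    return [row for partition in chunk for row in partition]


result = compute_streaming(partitions, compute_chunk, partitions_per_chunk=2)
```

In this sketch no single compute call ever sees more than `partitions_per_chunk` partitions, which is where the lower peak load on workers would come from. The final result is still held fully in memory on the client, as in the proposal.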

**Before submitting**
Please check the following:

- [x] I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
- [ ] I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
- [x] If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.

