
[DISCUSSION] [object_store] New crate with object store combinators / utilities #14

Description

@alamb

Please describe what you are trying to do.

TLDR: let's combine forces rather than all reimplementing caching / chunking / etc in object_store!

The ObjectStore trait is flexible, and it is common to compose a stack of ObjectStore implementations, with each layer wrapping an underlying store.

For example, the ThrottledStore and LimitStore provided with the object_store crate do exactly this:

┌──────────────────────────────┐
│        ThrottledStore        │
│(adds user configured delays) │
└──────────────────────────────┘
                ▲               
                │               
                │               
┌──────────────────────────────┐
│      Inner ObjectStore       │
│   (for example, AmazonS3)    │
└──────────────────────────────┘
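
For instance, stacking the wrappers that ship with the crate looks roughly like this (a minimal sketch; InMemory stands in for a real inner store such as AmazonS3, and the constructor signatures assume a recent object_store release, around 0.11):

```rust
use std::sync::Arc;

use object_store::limit::LimitStore;
use object_store::memory::InMemory;
use object_store::throttle::{ThrottleConfig, ThrottledStore};
use object_store::ObjectStore;

fn main() {
    // Innermost store; in practice this would be AmazonS3, GCS, Azure, etc.
    let inner = InMemory::new();

    // ThrottledStore adds user-configured delays to each request.
    let throttled = ThrottledStore::new(inner, ThrottleConfig::default());

    // LimitStore caps the number of concurrent requests to the stack below it.
    let store: Arc<dyn ObjectStore> = Arc::new(LimitStore::new(throttled, 16));

    // `store` can be handed to anything that expects an `Arc<dyn ObjectStore>`.
    println!("{store}");
}
```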

Many Different Behaviors

There are many types of behaviors that can be implemented this way (a minimal sketch of such a wrapper follows the list). Some examples I am aware of:

  1. The ThrottledStore and LimitStore provided with the object_store crate
  2. Run requests on a different tokio runtime (such as the DeltaIOStorageBackend in delta-rs from @ion-elgreco)
  3. Limit the total size of any individual request (e.g. the LimitedRequestSizeObjectStore from datafusion#15067: Timeouts reading "large" files from object stores over "slow" connections)
  4. Break single large requests into multiple concurrent small requests ("chunking") - I believe @crepererum is working on this in influx
  5. Cache results of requests locally using memory / disk (see ObjectStoreMemCache in influxdb3_core, and this one in slatedb from @criccomini; thanks @ion-elgreco for the pointer)
  6. Collect statistics / traces and report metrics (see ObjectStoreMetrics in influxdb3_core)
  7. Visualize object store requests over time
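
As a concrete (hypothetical) example of what such a combinator looks like, here is a sketch of a CountingStore that counts GET requests and delegates everything else to the wrapped store. This is not an existing implementation, and the method signatures follow object_store 0.11; they may differ slightly in other versions:

```rust
use std::fmt;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

use async_trait::async_trait;
use futures::stream::BoxStream;
use object_store::path::Path;
use object_store::{
    GetOptions, GetResult, ListResult, MultipartUpload, ObjectMeta, ObjectStore,
    PutMultipartOpts, PutOptions, PutPayload, PutResult, Result,
};

/// Hypothetical combinator: counts GET requests made to the inner store.
#[derive(Debug)]
struct CountingStore {
    inner: Arc<dyn ObjectStore>,
    gets: AtomicU64,
}

impl fmt::Display for CountingStore {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "CountingStore({})", self.inner)
    }
}

#[async_trait]
impl ObjectStore for CountingStore {
    async fn get_opts(&self, location: &Path, options: GetOptions) -> Result<GetResult> {
        // The one behavior this combinator adds: count GETs. The default
        // `get`, `get_range`, and `head` implementations all route through here.
        self.gets.fetch_add(1, Ordering::Relaxed);
        self.inner.get_opts(location, options).await
    }

    // The remaining required methods simply delegate to the wrapped store.
    async fn put_opts(
        &self,
        location: &Path,
        payload: PutPayload,
        opts: PutOptions,
    ) -> Result<PutResult> {
        self.inner.put_opts(location, payload, opts).await
    }

    async fn put_multipart_opts(
        &self,
        location: &Path,
        opts: PutMultipartOpts,
    ) -> Result<Box<dyn MultipartUpload>> {
        self.inner.put_multipart_opts(location, opts).await
    }

    async fn delete(&self, location: &Path) -> Result<()> {
        self.inner.delete(location).await
    }

    fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, Result<ObjectMeta>> {
        self.inner.list(prefix)
    }

    async fn list_with_delimiter(&self, prefix: Option<&Path>) -> Result<ListResult> {
        self.inner.list_with_delimiter(prefix).await
    }

    async fn copy(&self, from: &Path, to: &Path) -> Result<()> {
        self.inner.copy(from, to).await
    }

    async fn copy_if_not_exists(&self, from: &Path, to: &Path) -> Result<()> {
        self.inner.copy_if_not_exists(from, to).await
    }
}
```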

Desired behavior is varied and application-specific

Also, depending on the needs of the particular app, the ideal behavior / policy is likely different.

For example,

  1. In the case of datafusion#15067 (Timeouts reading "large" files from object stores over "slow" connections), splitting one large request into several small requests made in series is likely the desired approach (it maximizes the chance that they succeed)
  2. If you are trying to maximize read bandwidth in a cloud server setting, splitting up ("chunking") large requests into several parallel ones may be desired (a sketch of both policies follows this list)
  3. If you are trying to minimize costs (for example doing bulk reorganizations / compactions on historical data that are not latency sensitive), using a single request for large objects (what is done today) might be desired
  4. Maybe you want to adapt more dynamically to network and object store conditions as described in Exploiting Cloud Object Storage for High-Performance Analytics
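
To illustrate how the first two policies reduce to the same mechanism, here is a sketch of a hypothetical chunked_get helper (not part of object_store; signatures assume object_store 0.11, where get_range takes Range<usize>). With concurrency = 1 it behaves like the serial policy from datafusion#15067; with larger values it becomes the parallel "chunking" policy:

```rust
use bytes::Bytes;
use futures::{stream, StreamExt, TryStreamExt};
use object_store::path::Path;
use object_store::ObjectStore;

/// Hypothetical helper: fetch an object of `len` bytes as `chunk`-sized
/// range requests, with at most `concurrency` requests in flight at once.
async fn chunked_get(
    store: &dyn ObjectStore,
    location: &Path,
    len: usize,
    chunk: usize,
    concurrency: usize,
) -> object_store::Result<Vec<Bytes>> {
    // Split [0, len) into fixed-size byte ranges.
    let ranges: Vec<_> = (0..len)
        .step_by(chunk)
        .map(|start| start..(start + chunk).min(len))
        .collect();

    // Issue the range requests, keeping at most `concurrency` in flight;
    // `buffered` preserves the original order of the chunks.
    stream::iter(ranges)
        .map(|range| store.get_range(location, range))
        .buffered(concurrency)
        .try_collect()
        .await
}
```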

So the point is that I don't think any one individual policy will work for all use cases (though we can certainly discuss changing the default policy).

Since ObjectStore is already composable, I already see projects implementing these types of things independently (for example, delta-rs and influxdb_iox both have cross-runtime object stores, and @mildbyte from splitgraph implemented some sort of visualization of object store requests over time).

I believe this is similar to the OpenDAL concept of layers, but @Xuanwo please correct me if I am wrong.

Desired Solution

I would like it to be easier for users of object_store to access such features without having to implement custom wrappers independently in parallel.

Alternatives

New object_store_util crate

One alternative is to make a new crate, named object_store_util or similar (mirroring futures-util and tokio-util), that has a bunch of these ObjectStore combinators.

This could be housed outside of the Apache organization, but I think it would be most valuable for the community if it were inside.

Add additional policies to provided implementations

An alternative is to implement more sophisticated default implementations (for example, add more options to the AmazonS3 implementation).

One upside of this approach is that it could take advantage of implementation-specific features.

One downside is additional code and configuration complexity, especially as the different strategies are all applicable to multiple stores (e.g. GCP, S3, and Azure). Another downside is that specifying the policy might be complex (for example, concurrency along with chunking, and under what circumstances each should be used).
