cloud/fs concurrency for large files #9893

Open
@dberenbaum

Description

@shcheklein in that case our concurrency level will be jobs * jobs, which is generally going to be way too high in the default case. I also considered splitting jobs between the two (so batch_size=max(1, jobs // 2) and the same for max_concurrency), but that will make us perform a lot worse in the cases where you are pushing a large number of files that are smaller than the chunk size.
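
A minimal sketch of that trade-off, assuming a hypothetical helper name (DVC does not expose a function like this today):

```python
def split_jobs(jobs: int) -> tuple[int, int]:
    """Split the user-facing jobs value between file-level batching and
    per-file chunk concurrency, keeping both halves at least 1."""
    batch_size = max(1, jobs // 2)       # files transferred in parallel
    max_concurrency = max(1, jobs // 2)  # chunks per file in parallel
    return batch_size, max_concurrency


jobs = 8
print(split_jobs(jobs))  # (4, 4): at most 16 chunks in flight
print((jobs, jobs))      # (8, 8): at most 64 chunks in flight, usually far too high
```

For many small files (each smaller than one chunk), the split halves the effective file-level parallelism to jobs // 2 while the chunk-level budget goes unused, which is the regression described above.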

I think it will be worth revisiting this to properly determine what level of concurrency we should be using at both the file and chunk levels, but that depends on the number of files being transferred, the total size of those files, and the chunk size for the given cloud. This is all work that we can do at some point, but in the short term I prioritized getting a fix for the worst-case scenario for Azure (pushing a single large file).
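
A rough sketch of what that balancing could look like, with purely hypothetical names and thresholds; nothing like this exists in DVC or dvc-objects today:

```python
import math


def plan_concurrency(jobs: int, file_sizes: list[int], chunk_size: int) -> tuple[int, int]:
    """Return a hypothetical (batch_size, max_concurrency) pair for a transfer."""
    n_files = len(file_sizes)
    # Average number of chunks each file would be split into for this cloud.
    avg_chunks = math.ceil(sum(file_sizes) / max(1, n_files) / chunk_size) if n_files else 1

    if avg_chunks <= 1:
        # Files smaller than one chunk: chunk-level concurrency buys nothing.
        return max(1, min(jobs, n_files)), 1
    if n_files == 1:
        # A single large file: spend the whole budget on chunk-level concurrency.
        return 1, jobs
    # Mixed case: split the budget so the product stays close to jobs.
    batch_size = max(1, min(n_files, jobs // 2))
    max_concurrency = max(1, jobs // batch_size)
    return batch_size, max_concurrency
```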

Also, any work that we do on this right now would only apply to Azure, since adlfs is currently the only underlying fsspec implementation that actually does concurrent chunked/multipart uploads/downloads. It would be better for us to contribute upstream to make the s3/gcs/etc. implementations support chunk/multipart concurrency first, before we get into trying to make DVC optimize the balance between file- and chunk-level concurrency.

Originally posted by @pmrowla in iterative/dvc-objects#218 (comment)
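
For reference, the chunk-level concurrency that adlfs exposes comes from the Azure Storage SDK it builds on, where multipart parallelism is controlled by `max_concurrency` on `upload_blob`. A minimal illustration, with placeholder connection string, container, and blob names:

```python
from azure.storage.blob import BlobClient

client = BlobClient.from_connection_string(
    conn_str="<connection-string>",
    container_name="my-container",
    blob_name="large-file.bin",
)

with open("large-file.bin", "rb") as fobj:
    # The blob is uploaded in chunks, with up to 8 chunks in flight at once.
    client.upload_blob(fobj, overwrite=True, max_concurrency=8)
```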

Labels

    A: data-sync (Related to dvc get/fetch/import/pull/push)
    fs: gs (Related to the Google Cloud Storage filesystem)
    p3-nice-to-have (It should be done this or next sprint)
