Skip to content

Conversation

@sangshuduo
Copy link
Owner

  • Implemented a new tool to download files from an S3 bucket using multiple threads.
  • Added command-line arguments for bucket name, output directory, number of workers, and retries.
  • Included a feature to read file lists from a specified file or list objects directly from the bucket.
  • Utilized libraries such as aws-sdk-s3 and tokio for asynchronous operations.

- Implemented a new tool to download files from an S3 bucket using multiple threads.
- Added command-line arguments for bucket name, output directory, number of workers, and retries.
- Included a feature to read file lists from a specified file or list objects directly from the bucket.
- Utilized libraries such as `aws-sdk-s3` and `tokio` for asynchronous operations.
@mentatbot
Copy link
Contributor

mentatbot bot commented Apr 1, 2025

I'm getting to work. See my steps and track my cost here.

  • Wake on any new activity.
    I always see new activity while actively working. When paused, I'll only respond when tagged unless this box is checked.

@mentatbot
Copy link
Contributor

mentatbot bot commented Apr 1, 2025

I've taken a look at your S3 bucket downloader implementation. This looks like a great addition to the toolkit that complements the existing random_pairs_of_s3file tool nicely!

A few thoughts:

Strengths:

  • Great use of multi-threading with rayon to parallelize downloads
  • The progress bars are well-implemented and provide good user feedback
  • Smart caching of file lists to avoid unnecessary bucket listing operations
  • Solid error handling with retries and exponential backoff

Potential improvements:

  1. The code uses div_ceil which might require a newer Rust version - we should verify this works across all target Rust versions in CI.
  2. Loading entire files into memory with Vec<u8> could be problematic for very large files - maybe consider a streaming approach for large files.
  3. It might be helpful to add some documentation in the README.md about this tool, similar to what was done for random_pairs_of_s3file.

Let's see what the CI says, but this looks like a solid implementation to me!

@sangshuduo sangshuduo merged commit 4601a3d into main Apr 1, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants