A simple Python utility for bulk copying files from source S3 buckets to destination buckets based on YAML configuration.
- Bulk Operations: Copy all files from multiple source buckets to multiple destination buckets
- YAML Configuration: Easy-to-manage bucket mapping via buckets.yaml
- Progress Tracking: Real-time progress bars with tqdm
- Comprehensive Logging: Detailed logs to both file and console
- Validation: Pre-flight checks for bucket access and permissions
- Error Handling: Graceful handling of AWS credential and access issues

The following are not supported:
- Timestamp-based sync (incremental sync)
- "Sync from destination if newer" functionality
- File comparison based on modification dates
- Bidirectional synchronization

All files are copied and overwritten regardless of their timestamps.
- Python 3.7+
- AWS credentials configured via:
  - Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  - AWS credentials file (~/.aws/credentials)
  - IAM roles (if running on EC2)
- Required permissions: s3:ListBucket, s3:GetObject, s3:PutObject on source and destination buckets
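A pre-flight check along these lines can confirm that each bucket is reachable with the configured credentials before any copy starts. This is a minimal sketch, not the utility's actual code: `check_bucket_access` is a hypothetical helper, and `client` is assumed to be a boto3 S3 client such as `boto3.client("s3")`, whose `head_bucket` call raises an exception when the bucket is missing or access is denied.

```python
def check_bucket_access(client, bucket_names):
    """Map each bucket name to None (reachable) or an error message.

    `client` is assumed to be a boto3 S3 client. Hypothetical helper.
    """
    results = {}
    for name in bucket_names:
        try:
            # HeadBucket succeeds only if the bucket exists and the caller
            # is allowed to access it.
            client.head_bucket(Bucket=name)
            results[name] = None
        except Exception as exc:
            # boto3 typically raises botocore.exceptions.ClientError here;
            # the broad catch keeps this sketch free of extra imports.
            results[name] = str(exc)
    return results


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   problems = check_bucket_access(boto3.client("s3"), ["my-source-bucket-1"])
```

Note that a successful `head_bucket` call does not prove that s3:GetObject or s3:PutObject is granted, so object-level failures can still surface during the copy itself.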
- Install dependencies: pip install -r requirements.txt
- Configure your bucket mappings in buckets.yaml
Edit buckets.yaml to define your source and destination bucket mappings:

    buckets:
      - source_bucket: my-source-bucket-1
        destination_buckets:
          - my-dest-bucket-1
          - my-dest-bucket-2
      - source_bucket: my-source-bucket-2
        destination_buckets:
          - my-dest-bucket-3

Run the sync operation:

    python s3_sync.py

The script will:
- Validate AWS credentials and bucket access
- List all objects in each source bucket
- Copy all objects to the specified destination buckets
- Display progress and generate detailed logs
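The listing and copying steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in s3_sync.py: `list_keys` and `sync_all` are hypothetical names, `client` is assumed to be a boto3 S3 client, and `config` is assumed to be the dictionary obtained by parsing buckets.yaml. Progress bars and logging are left out to keep the sketch short.

```python
def list_keys(client, bucket):
    """Yield every object key in `bucket`, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # Pages for an empty bucket carry no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def sync_all(client, config):
    """Copy every object of each source bucket to all of its destinations.

    Returns (copied, failed), counted once per object/destination pair.
    """
    copied = failed = 0
    for entry in config["buckets"]:
        source = entry["source_bucket"]
        for key in list_keys(client, source):
            for dest in entry["destination_buckets"]:
                try:
                    # Copies under the same key and overwrites any existing
                    # object, regardless of timestamps.
                    client.copy({"Bucket": source, "Key": key}, dest, key)
                    copied += 1
                except Exception:  # boto3 typically raises ClientError
                    failed += 1
    return copied, failed
```

Counting per object/destination pair means a source bucket with 150 objects and two destinations yields 300 copy operations.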
- Console output: Real-time progress and status
- Log file: s3_sync.log with detailed operation history
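Logging to both the console and a file can be set up with the standard `logging` module, roughly as below. This is a sketch rather than the utility's own setup: `setup_logging` and the logger name `"s3_sync"` are hypothetical, and the format string is only a guess chosen to resemble the timestamped lines in the sample output.

```python
import logging
import sys


def setup_logging(log_path="s3_sync.log"):
    """Return a logger that writes INFO and above to a file and to stdout."""
    logger = logging.getLogger("s3_sync")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False   # avoid duplicate lines via the root logger
    logger.handlers.clear()    # keep repeated calls from stacking handlers

    formatter = logging.Formatter(
        "%(asctime)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    for handler in (logging.FileHandler(log_path), logging.StreamHandler(sys.stdout)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `setup_logging().info("Successfully initialized S3 client")` would then emit the same line to the console and append it to the log file.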

Exit codes:
- 0: Success - all files copied successfully
- 1: Failure - one or more files failed to copy or other errors occurred
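One way to derive these exit codes from the run's results is sketched below. The names `exit_code`, `main`, and `run_sync` are hypothetical; `run_sync` is assumed to be a callable that performs the sync and returns a `(copied, failed)` tuple.

```python
import sys


def exit_code(failed_count, fatal_error=False):
    """Return 0 only when nothing failed and no other error occurred."""
    return 1 if (fatal_error or failed_count > 0) else 0


def main(run_sync):
    """Run the sync and translate its outcome into a process exit code."""
    try:
        _copied, failed = run_sync()
    except Exception as exc:
        # Covers "other errors", e.g. missing credentials or a bad config.
        print(f"Sync aborted: {exc}", file=sys.stderr)
        return exit_code(0, fatal_error=True)
    return exit_code(failed)


# Typical entry point (run_sync is supplied by the caller):
#   sys.exit(main(run_sync))
```

Returning the code from `main` and calling `sys.exit` only at the entry point keeps the logic easy to test and lets shell scripts or CI jobs branch on the result.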

    S3 Bucket Sync Utility
    ==============================
    2024-10-30 10:15:23 - INFO - Successfully initialized S3 client
    2024-10-30 10:15:23 - INFO - Loaded configuration for 5 source buckets
    2024-10-30 10:15:24 - INFO - Found 150 objects in bucket 'source-bucket-1'
    Copying from source-bucket-1: 100%|████████| 300/300 [01:23<00:00, 3.62it/s]
    ==============================
    SYNC SUMMARY
    ==============================
    Total files copied successfully: 300
    Total files failed: 0
    Total time: 83.45 seconds
    ==============================

- boto3 - AWS SDK for Python
- PyYAML - YAML configuration parsing
- tqdm - Progress bar functionality