Commit 6ee6555

Add Rclone client to 3p runner (#109)
1 parent cfd4737 commit 6ee6555

8 files changed

Lines changed: 555 additions & 27 deletions

cdk/s3_benchmarks/s3_benchmarks_stack.py

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ class S3ClientProps:
     'cli-crt': S3ClientProps(color=cloudwatch.Color.PURPLE),
     'boto3-crt': S3ClientProps(color=cloudwatch.Color.PINK),
     's5cmd': S3ClientProps(color='#00CED1'),  # cyan
+    'rclone': S3ClientProps(color='#20B2AA'),  # light sea green
 }


 # The "default" set of workloads to benchmark.

runners/s3-benchrunner-3p/README.md

Lines changed: 143 additions & 4 deletions
@@ -3,13 +3,13 @@
 Third-party S3 client benchmark runner. This runner supports various third-party S3 clients for benchmarking.

 ```
-usage: main.py [-h] [--verbose] EXECUTABLE_PATH {s5cmd} WORKLOAD BUCKET REGION TARGET_THROUGHPUT
+usage: main.py [-h] [--verbose] EXECUTABLE_PATH {s5cmd,rclone} WORKLOAD BUCKET REGION TARGET_THROUGHPUT

 Third-party S3 client benchmark runner. Supports various third-party S3 clients.

 positional arguments:
   EXECUTABLE_PATH  Path to the S3 client executable
-  {s5cmd}          S3 client to use
+  {s5cmd,rclone}   S3 client to use
   WORKLOAD
   BUCKET
   REGION
@@ -59,9 +59,46 @@ Here are examples showing how workloads are executed:

    * cmd: `<5GiB_random_data> | s5cmd cp - s3://my-bucket/upload/5GiB/1`

+### rclone
+
+[rclone](https://rclone.org/) is a powerful command-line program to manage files on cloud storage. rclone supports:
+* Multiple cloud storage providers (including AWS S3)
+* Parallel transfers
+* Streaming support
+* Advanced features like bandwidth limiting, checksums, and encryption
+
+See [installation instructions](#installation) before running.
+
+### How this works with rclone
+
+rclone is a versatile cloud storage tool that supports S3 operations through:
+- Configurable parallelism with `--transfers` flag
+- Native S3 API support
+- Efficient streaming for large files
+- Support for both single files and directory operations
+
+This runner skips workloads that cannot be efficiently executed with rclone's command structure, similar to how the CLI runner works.
+
+Here are examples showing how workloads are executed:
+
+1) Single file upload/download:
+   * workload: `upload-5GiB-1x`
+
+   * cmd: `rclone copy upload/5GiB/1 :s3:my-bucket/upload/5GiB/1`
+
+2) Multiple files in same directory:
+   * workload: `upload-5GiB-20x`
+
+   * cmd: `rclone copy upload/5GiB :s3:my-bucket/upload/5GiB`
+
+3) Streaming from/to memory (single file only):
+   * workload: `upload-5GiB-1x-ram`
+
+   * cmd: `<5GiB_random_data> | rclone copy - :s3:my-bucket/upload/5GiB/1`
+
 # Installation

-## Quick install
+## s5cmd Installation

 ### Install via Go

@@ -77,10 +114,112 @@ go install github.com/peak/s5cmd/v2@v2.3.0
 ~/go/bin/s5cmd version
 ```

-## Configuration
+### Configuration

 s5cmd uses standard AWS credentials and configuration. Make sure you have:
 - AWS credentials configured (via AWS CLI, environment variables, or IAM roles)
 - Appropriate S3 permissions for the bucket you're testing against

 **Note:** This benchmark configures concurrency dynamically based on target throughput using the formula: `concurrency = target_throughput_Gbps / 0.4` as CRT does. For example, for 100 Gbps target throughput, the concurrency is set to 250. This ensures an apples-to-apples comparison.
+
+## rclone Installation
+
+### Install from Official Source
+
+```sh
+# Install the latest version
+curl https://rclone.org/install.sh | sudo bash
+
+# Or download a specific version from https://rclone.org/downloads/
+```
+
+### Install via Package Manager
+
+```sh
+# macOS (via Homebrew)
+brew install rclone
+
+# Amazon Linux 2023
+sudo dnf install rclone
+
+# Ubuntu/Debian
+sudo apt install rclone
+```
+
+**Note:** After installation, the binary is typically in `/usr/bin/rclone` or `/usr/local/bin/rclone`.
+
+```sh
+# Verify installation
+rclone version
+```
+
+### Configuration
+
+rclone uses standard AWS credentials and configuration. Make sure you have:
+- AWS credentials configured (via AWS CLI, environment variables, or IAM roles)
+- Appropriate S3 permissions for the bucket you're testing against
+
+**rclone Config File:** The runner automatically creates a temporary rclone configuration file internally. No manual configuration is needed.
+
+#### Config File Options
+
+The runner creates a config file with the following settings (documented at https://rclone.org/s3/):
+
+```ini
+[remote]
+type = s3                    # S3 backend type
+provider = AWS               # Use AWS S3
+env_auth = true              # Get credentials from environment
+region = us-west-2           # AWS region (from REGION command-line argument)
+no_check_bucket = true       # Don't check if bucket exists or try to create it
+directory_bucket = true      # Enable S3 Express (automatically added for S3 Express buckets)
+```
+
+The region is set in the config file from the REGION command-line argument, ensuring rclone operates in the correct AWS region.
+
+#### Command-Line Options
+
+The runner automatically configures these rclone flags based on the workload:
+
+1. **Parallel File Transfers** ([docs](https://rclone.org/docs/#transfers-n)):
+   - `--transfers <n>`
+
+   - Number of file transfers to run in parallel (important for multiple small files)
+   - Formula: `concurrency = target_throughput_Gbps / 0.4`
+
+   - Example: 100 Gbps → 250 parallel transfers
+
+2. **Upload Concurrency** ([docs](https://rclone.org/s3/#s3-upload-concurrency)):
+   - `--s3-upload-concurrency <n>`
+
+   - Controls concurrent chunks for multipart uploads (for large files)
+   - Formula: `concurrency = target_throughput_Gbps / 0.4`
+
+   - Example: 100 Gbps → 250 concurrent operations
+
+3. **Download Parallelism** ([docs](https://rclone.org/docs/#multi-thread-streams-int)):
+   - `--multi-thread-streams <n>`
+
+   - Controls parallel streams for downloads (for large files)
+   - Formula: `concurrency = target_throughput_Gbps / 0.4`
+
+   - Example: 100 Gbps → 250 parallel streams
+
+4. **Always Transfer Files** ([docs](https://rclone.org/docs/#ignore-times)):
+   - `--ignore-times`
+
+   - Forces rclone to always transfer files rather than skipping them based on timestamps
+   - Essential for benchmarking to ensure consistent measurements across runs
+
+5. **Checksum Control** ([docs](https://rclone.org/s3/#s3-disable-checksum)):
+   - `--s3-disable-checksum`
+
+   - Automatically used when no checksum is specified in workload
+   - Workloads requiring specific checksums will skip (rclone only supports MD5)
+
+6. **S3 Express Support**:
+   - Automatically detects S3 Express buckets (ending with `--x-s3`)
+   - Adds `directory_bucket = true` to config file
+   - See [S3 Directory Bucket documentation](https://rclone.org/s3/#s3-directory-bucket)
+
+**Note:** This benchmark configures concurrency dynamically to ensure an apples-to-apples comparison with other clients.
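
The concurrency formula and flag list described in the README diff above can be sketched in a few lines of Python. This is a minimal illustration, not the code from `runner/rclone.py` (which is not shown in this diff): the names `concurrency_for` and `rclone_flags` are hypothetical, and the rounding behavior is an assumption, since the README only gives `target_throughput_Gbps / 0.4`.

```python
from typing import List

# 0.4 Gbps per connection, taken from the README formula.
GBPS_PER_CONNECTION = 0.4


def concurrency_for(target_throughput_gbps: float) -> int:
    """Return target_throughput_Gbps / 0.4 as a whole number, at least 1.

    Multiplying by 2.5 is equivalent to dividing by 0.4 and avoids a
    floating-point division. Rounding to the nearest integer is an
    assumption; the README does not say how fractions are handled.
    """
    return max(1, round(target_throughput_gbps * 2.5))


def rclone_flags(target_throughput_gbps: float,
                 disable_checksum: bool = True) -> List[str]:
    """Assemble the rclone flags listed in the README (hypothetical helper)."""
    n = str(concurrency_for(target_throughput_gbps))
    flags = [
        '--transfers', n,              # parallel file transfers
        '--s3-upload-concurrency', n,  # concurrent multipart upload chunks
        '--multi-thread-streams', n,   # parallel download streams
        '--ignore-times',              # always transfer, never skip
    ]
    if disable_checksum:
        # README: used when the workload specifies no checksum.
        flags.append('--s3-disable-checksum')
    return flags


if __name__ == '__main__':
    print(concurrency_for(100))
    print(' '.join(rclone_flags(100)))
```

Using the same value for all three concurrency flags mirrors the README's statement that each one follows the same formula; whether the real runner applies all three to every workload is not confirmed by this diff.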

runners/s3-benchrunner-3p/main.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,8 @@
1515
PARSER = argparse.ArgumentParser(
1616
description='Third-party S3 client benchmark runner. Supports various third-party S3 clients.')
1717
PARSER.add_argument('EXECUTABLE_PATH', help='Path to the S3 client executable')
18-
PARSER.add_argument('S3_CLIENT', choices=('s5cmd',), help='S3 client to use')
18+
PARSER.add_argument('S3_CLIENT', choices=(
19+
's5cmd', 'rclone'), help='S3 client to use')
1920
PARSER.add_argument('WORKLOAD')
2021
PARSER.add_argument('BUCKET')
2122
PARSER.add_argument('REGION')
@@ -28,6 +29,9 @@ def create_runner(config: BenchmarkConfig, s3_client: str, executable_path: str)
2829
if s3_client == 's5cmd':
2930
from runner.s5cmd import S5cmdBenchmarkRunner
3031
return S5cmdBenchmarkRunner(config, executable_path)
32+
elif s3_client == 'rclone':
33+
from runner.rclone import RcloneBenchmarkRunner
34+
return RcloneBenchmarkRunner(config, executable_path)
3135
else:
3236
raise ValueError(f'Unknown S3 client: {s3_client}')
3337
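
The `RcloneBenchmarkRunner` wired in above is described in the README as writing a temporary rclone config file, with `directory_bucket = true` added for S3 Express buckets. Since `runner/rclone.py` is not part of the visible diff, the following is only a sketch of how such a file might be generated; the function name `write_rclone_config` and the file name `rclone.conf` are hypothetical.

```python
import tempfile
from pathlib import Path


def write_rclone_config(region: str, bucket: str, directory: Path) -> Path:
    """Write an rclone config with the settings listed in the README.

    Hypothetical helper: the real runner may structure this differently.
    """
    lines = [
        '[remote]',
        'type = s3',
        'provider = AWS',
        'env_auth = true',
        f'region = {region}',
        'no_check_bucket = true',
    ]
    # README: S3 Express buckets are detected by the `--x-s3` suffix.
    if bucket.endswith('--x-s3'):
        lines.append('directory_bucket = true')

    path = directory / 'rclone.conf'
    path.write_text('\n'.join(lines) + '\n')
    return path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        conf = write_rclone_config('us-west-2', 'my-bucket', Path(tmp))
        print(conf.read_text())
```

A file written this way could be passed to rclone with its `--config` flag; whether the runner does so, or uses another mechanism, is an assumption here.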

runners/s3-benchrunner-3p/runner/__init__.py

Lines changed: 13 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44
import math
55
import os
66
from pathlib import Path
7+
import shutil
78
import sys
89

910

@@ -108,13 +109,21 @@ def prepare_run(self):
108109
"""Do preparation work between runs, before the timer starts."""
109110
self._verbose('preparing run...')
110111
for task in self.config.tasks:
112+
task_path = Path(task.key)
113+
111114
if task.action == 'download':
112-
task_path = Path(task.key)
113115
if task_path.exists():
114-
# Before downloading, clean up any pre-existing files.
116+
# Before downloading, clean up any pre-existing files or directories.
115117
# CLI and boto3 download to a tmp filename, then rename to the final filename.
116-
# The rename is way slower if it's replacing an existing file.
117-
task_path.unlink()
118+
# The rename is way faster if it's not replacing an existing file.
119+
# rclone will treat an existing directory as a target directory and place
120+
# the file inside it (creating path/to/file/file instead of path/to/file).
121+
if task_path.is_dir():
122+
self._verbose(
123+
f'removing existing directory: {task_path}')
124+
shutil.rmtree(task_path)
125+
else:
126+
task_path.unlink()
118127
elif not task_path.parent.exists():
119128
task_path.parent.mkdir(parents=True)
120129
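
The cleanup logic added to `prepare_run` can be exercised in isolation. The sketch below lifts it into a standalone function, under the assumption that only the path handling matters; `clear_download_target` is a hypothetical name, and the `_verbose` logging from the original is omitted.

```python
import shutil
import tempfile
from pathlib import Path


def clear_download_target(task_path: Path) -> None:
    """Remove a stale file or directory at task_path, or create its parent.

    Mirrors the branch structure in the diff above: an existing directory
    is removed with shutil.rmtree, an existing file with unlink, and a
    missing parent directory is created so the download has somewhere to go.
    """
    if task_path.exists():
        if task_path.is_dir():
            shutil.rmtree(task_path)
        else:
            task_path.unlink()
    elif not task_path.parent.exists():
        task_path.parent.mkdir(parents=True)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate the rclone leftover: a directory where a file is expected.
        target = Path(tmp) / 'download' / '5GiB' / '1'
        (target / '1').parent.mkdir(parents=True)
        (target / '1').write_bytes(b'stale')
        clear_download_target(target)
        print(target.exists())
```

Handling the directory case separately matters because `Path.unlink()` raises an error when called on a directory.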
