The Object Storage Maintenance Tool is a command-line utility that archives and compresses objects stored in object storage in a streaming manner. It helps manage storage efficiently by gathering objects older than a given cutoff date into a single TAR archive, optionally compressed with XZ.
In object storage environments, a large number of small objects (e.g., audit logs, event records) can lead to inefficient storage use. For example:
- Many objects are tiny (100 bytes or even 0 bytes) but still occupy 4KB due to storage overhead.
- Storing such objects as a TAR archive reduces storage overhead.
- Compressing the archive with XZ further optimizes storage usage.
Key features:

- Archive Objects: Consolidates multiple objects into a single TAR file.
- Streaming Compression: Uses XZ to reduce the storage footprint.
- Efficient Storage Management: Helps save costs by reducing wasted space.
- S3-Compatible: Works with AWS S3-compatible object storage services.
To build from source you need:

- Rust (latest stable version)
- `cargo` package manager
Build the project with:

```bash
make build
```

or

```bash
cargo build --release
```

A `make release` target is also available:

```bash
make release
```
The binary will be located at `target/release/object-storage-maintenance`.
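If you want the binary on your `PATH`, one option on a Unix-like system (the destination directory here is just an example) is to copy it there:

```bash
# Copy the release binary to a directory on your PATH
install -m 0755 target/release/object-storage-maintenance /usr/local/bin/object-storage-maintenance
```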
Set the environment variables for the S3 client:

```bash
AWS_REGION="eu-north-1"
AWS_ACCESS_KEY=
AWS_SECRET_KEY=
```

Note: `AWS_REGION` defaults to `us-east-1`.
Set the object storage endpoint if you are using a non-standard S3 storage location:

```bash
OBJECT_STORAGE_ENDPOINT="https://my-storage.company.com:9000"
```
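If you plan to run the tool in a container (see the `docker run` example further down, which uses `--env-file .env`), the same variables can be collected in a `.env` file. The values below are placeholders, not real credentials; note that `docker run --env-file` passes values verbatim, so leaving them unquoted is usually safest:

```
# .env — placeholder values only
AWS_REGION=eu-north-1
AWS_ACCESS_KEY=<your-access-key>
AWS_SECRET_KEY=<your-secret-key>
OBJECT_STORAGE_ENDPOINT=https://my-storage.company.com:9000
```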
Run the tool with the `archive` command to move and compress objects:
```bash
object-storage-maintenance archive \
    --src s3://project/audit/ \
    --dst s3://archive/audit/ \
    --cutoff 2025-01-01T00:00:00+00:00 \
    --buffer 104857600 \
    --compression best
```
| Argument | Description | Required |
|---|---|---|
| `--src` | Source bucket and prefix containing the objects to archive. | ☑ |
| `--dst` | Destination bucket and prefix where the archive will be stored. | ☑ |
| `--cutoff` | Cutoff timestamp in ISO format. | |
| `--buffer` | Buffer size in bytes (default: `104857600` = 100 MB). | |
| `--compression` | Compression level, `"fastest"` or `"best"` (default: `fastest`). | |
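A minimal invocation only needs the source and destination; the remaining flags fall back to the defaults listed above (omitting `--cutoff` archives every object, as noted below):

```bash
# Relies on the defaults: 100 MB buffer, "fastest" compression, no cutoff filter
object-storage-maintenance archive \
    --src s3://project/audit/ \
    --dst s3://archive/audit/
```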
- Keep in mind that AWS S3 multipart upload allows up to 10,000 parts and a maximum total object size of 5 TB, so make sure your part (buffer) size multiplied by 10,000 stays within that 5 TB limit (see the sizing sketch after this list). The buffer size defaults to 100 MB because it is an AWS best practice to use multipart upload for objects of 100 MB or larger instead of uploading them in a single operation.
- If no cutoff is passed, all objects are archived.
- The best compression level is memory-hungry (up to ~1 GB), but it compresses very well.
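A rough sizing sketch, assuming the S3 limits mentioned above (10,000 parts, 5 TB per object):

```bash
# Largest archive a single multipart upload can hold with the default 100 MB buffer:
echo $((104857600 * 10000))       # 1048576000000 bytes ≈ 1 TB

# Largest buffer size at which 10,000 parts still stay within the 5 TB object limit:
echo $((5 * 1000 ** 4 / 10000))   # 500000000 bytes = 500 MB
```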
Alternatively, run the same archive command via the container image:

```bash
docker run --rm --env-file .env ghcr.io/bixority/object-storage-maintenance:v0.0.2 \
    archive \
    --src s3://project/audit/ \
    --dst s3://archive/audit/ \
    --cutoff 2025-01-01T00:00:00+00:00 \
    --buffer 104857600 \
    --compression best
```
There is intentionally no `:latest` tag, so you are never surprised by a silent, seamless upgrade.
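To upgrade, pull the specific version tag you want; the tag below matches the example above, and newer tags may be available on the GitHub Container Registry:

```bash
docker pull ghcr.io/bixority/object-storage-maintenance:v0.0.2
```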
Imagine you have millions of tiny log files stored in `s3://project/audit/`:
- Each object is 100 bytes but takes 4KB.
- You can archive them into a single TAR file.
- Compress the archive with XZ to save additional space.
After running the tool, the `.tar.xz` archive is stored in `s3://archive/audit/`, significantly reducing storage costs.
GPL-3.0 License. See `LICENSE` for details.
Feel free to submit issues and pull requests!
Maintained by Olegs Korsaks / Bixority SIA.