
MinIO cleanup #1

@sthapa


Background

ServiceX currently doesn't clean up the results of a transform request. To work effectively in production, ServiceX needs to be able to clean up after itself. Initially this will focus on MinIO persistent storage, but the design should be general enough that we can easily extend it to other storage solutions in the future.

MinIO's built-in facilities don't help with this. Bucket lifecycle policies delete the contents of a bucket after a fixed time, and FIFO quota policies delete older objects to free space for newer ones once the quota is reached. Both methods delete objects without informing ServiceX, so ServiceX has no way of knowing which transform results still exist.

Proposed Solution

One solution is to create a microservice that tracks the objects stored in persistent storage and deletes them as needed to keep storage utilization under a specified quota.

Tracking Storage Utilization

MinIO doesn't provide an easy way to get the space used by a bucket, so even finding the total space used by all transformation results takes some effort. The current best practice is to iterate over the objects in a bucket and sum their sizes.
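A minimal sketch of the iterate-and-sum approach, assuming the listing yields objects with a `size` attribute in bytes (as the `Object` values returned by the MinIO Python SDK's `Minio.list_objects` do). `StoredObject` is a hypothetical stand-in so the function can be exercised without a MinIO server.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StoredObject:
    """Hypothetical stand-in for a listed object; the MinIO Python SDK's
    ``Object`` also exposes ``object_name`` and ``size`` attributes."""
    object_name: str
    size: Optional[int]


def bucket_size(objects: Iterable[StoredObject]) -> int:
    """Sum the sizes (in bytes) of every object in a bucket listing."""
    total = 0
    for obj in objects:
        # Prefix ("directory") entries can report no size; count them as 0.
        total += obj.size or 0
    return total
```

With the `minio` client this would be called as `bucket_size(client.list_objects(bucket_id, recursive=True))`; `recursive=True` ensures objects under nested prefixes are included rather than only the top-level entries.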

Given the ServiceX workflow, this service can scan MinIO storage once a day and record the sizes of any new buckets. Since the outputs of a transform are immutable, this information can be stored in the PostgreSQL database, and each bucket only needs to be scanned once to get its size.
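Because transform outputs are immutable, the daily scan only has to measure buckets that have no recorded size yet. The sketch below shows that caching step; `sqlite3` stands in for PostgreSQL so it runs anywhere (with `psycopg` the placeholders would be `%s` instead of `?`), and the `bucket_sizes` table and `measure` callback are hypothetical names, not part of the existing ServiceX schema.

```python
import sqlite3
from typing import Callable, Iterable


def record_new_bucket_sizes(
    conn: sqlite3.Connection,
    bucket_ids: Iterable[str],
    measure: Callable[[str], int],
) -> list[str]:
    """Measure and store sizes for buckets not yet in the (hypothetical)
    bucket_sizes table. Each bucket is scanned at most once, since transform
    outputs never change. Returns the IDs that were newly measured."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bucket_sizes ("
        " bucket_id TEXT PRIMARY KEY,"
        " size_bytes INTEGER NOT NULL,"
        " scanned_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    known = {row[0] for row in conn.execute("SELECT bucket_id FROM bucket_sizes")}
    new_ids = []
    for bucket_id in bucket_ids:
        if bucket_id in known:
            continue  # already measured on an earlier scan
        known.add(bucket_id)
        conn.execute(
            "INSERT INTO bucket_sizes (bucket_id, size_bytes) VALUES (?, ?)",
            (bucket_id, measure(bucket_id)),
        )
        new_ids.append(bucket_id)
    conn.commit()
    return new_ids
```

In the service, `measure` would be the expensive per-bucket listing (for example an iterate-and-sum over the bucket's objects), so skipping known buckets is what keeps the daily scan cheap.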

Deletion Policy

Initially, the service can delete the oldest buckets until the storage used falls below a configurable threshold, ensuring that transforms don't run out of space. The threshold should probably default to a high-water mark of 85%.
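A sketch of the oldest-first policy as a pure function, assuming the service already has each bucket's creation time and size (e.g. from the recorded sizes) and knows the total storage capacity. `BucketRecord` and `buckets_to_delete` are hypothetical names; the function only selects victims and leaves the actual deletion to the caller.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BucketRecord:
    """Hypothetical per-bucket record kept by the cleanup service."""
    bucket_id: str
    created: datetime
    size_bytes: int


def buckets_to_delete(
    buckets: list[BucketRecord],
    capacity_bytes: int,
    high_water_mark: float = 0.85,
) -> list[str]:
    """Return the IDs of the oldest buckets whose removal brings usage
    to at or below high_water_mark * capacity_bytes."""
    limit = capacity_bytes * high_water_mark
    used = sum(b.size_bytes for b in buckets)
    victims = []
    # Walk buckets from oldest to newest, removing until under the limit.
    for bucket in sorted(buckets, key=lambda b: b.created):
        if used <= limit:
            break
        victims.append(bucket.bucket_id)
        used -= bucket.size_bytes
    return victims
```

Keeping the policy separate from the storage calls makes it easy to swap in a different policy later without touching the MinIO-specific code.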

API Interface

The service will expose a small API that should suffice for ServiceX's needs:

  • GET /transform_data/size?id=ID&type=minio will scan a specified bucket and return the size of that bucket
  • DELETE /transform_data?id=ID&type=minio will attempt to delete a specified bucket
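The two calls above could be dispatched roughly as follows. This is a framework-agnostic sketch: `make_handler`, the `get_size` and `delete_bucket` hooks, the JSON-style response bodies, and the status codes (400 for bad parameters, 404 for unknown routes or a failed delete) are all assumptions for illustration; in practice the handler would sit behind whichever web framework the service uses.

```python
from typing import Callable
from urllib.parse import parse_qs, urlsplit

# Routes from the proposal: (HTTP method, path).
ROUTES = {("GET", "/transform_data/size"), ("DELETE", "/transform_data")}


def make_handler(
    get_size: Callable[[str], int],
    delete_bucket: Callable[[str], bool],
) -> Callable[[str, str], tuple[int, dict]]:
    """Build a request handler from hypothetical storage hooks.
    Only type=minio is accepted, mirroring the proposal."""

    def handle(method: str, url: str) -> tuple[int, dict]:
        parts = urlsplit(url)
        if (method, parts.path) not in ROUTES:
            return 404, {"error": "not found"}
        query = parse_qs(parts.query)
        bucket_id = query.get("id", [""])[0]
        storage_type = query.get("type", [""])[0]
        if not bucket_id:
            return 400, {"error": "missing id"}
        if storage_type != "minio":
            return 400, {"error": f"unsupported type: {storage_type}"}
        if method == "GET":
            return 200, {"id": bucket_id, "size_bytes": get_size(bucket_id)}
        # Assumption: a delete that did not succeed is reported as 404.
        deleted = delete_bucket(bucket_id)
        return (200 if deleted else 404), {"id": bucket_id, "deleted": deleted}

    return handle
```

Validating `type` up front leaves room for other storage backends later: each new type would map to its own pair of hooks.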

Labels: epic
