Description
Background
ServiceX currently doesn't clean up the results of a transform request. To work effectively in production, ServiceX needs the ability to clean up after itself. Initially this will focus on MinIO persistent storage, but the design should be general enough to extend easily to other storage solutions in the future.
MinIO's built-in facilities don't help with this. Bucket lifecycle policies delete the contents of a bucket after a fixed time. FIFO quota policies delete older objects to free space for newer ones once the quota is reached. Both methods delete objects without informing ServiceX.
Proposed Solution
A solution for this would be to create a microservice that handles tracking objects stored in persistent storage and deleting them as needed in order to keep storage utilization under a specified quota.
Tracking Storage Utilization
MinIO doesn't have any easy way to track the space used by a bucket. Consequently, it requires some effort to even get the space used by all the transformation results. The current best practice is to iterate over objects within a bucket and then sum up the space used by each object.
Given the ServiceX workflow, we can have this service scan the MinIO storage once a day and get the bucket sizes for any new buckets. Since the outputs of a transform are immutable, we can store this information in the PostgreSQL database and only need to scan a bucket once to get its size.
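As a rough illustration of the scan described above, the sketch below sums object sizes per bucket and skips buckets that already have a recorded size. It assumes a client shaped like the MinIO Python client (`list_buckets()` and `list_objects(bucket_name, recursive=True)`); the `known_sizes` dict is a hypothetical stand-in for the PostgreSQL table, and the function names are illustrative only.

```python
def bucket_size(client, bucket_name):
    """Return the total size in bytes of every object in one bucket."""
    # recursive=True walks nested prefixes instead of stopping at the top level.
    objects = client.list_objects(bucket_name, recursive=True)
    return sum(obj.size or 0 for obj in objects)


def scan_new_buckets(client, known_sizes):
    """Measure only buckets with no recorded size; return the names scanned.

    known_sizes is a hypothetical stand-in for the database table that maps
    bucket name to size in bytes. It is updated in place.
    """
    scanned = []
    for bucket in client.list_buckets():
        if bucket.name in known_sizes:
            # Transform outputs are immutable, so a stored size stays valid.
            continue
        known_sizes[bucket.name] = bucket_size(client, bucket.name)
        scanned.append(bucket.name)
    return scanned
```

Run once a day, `scan_new_buckets` would touch each bucket only on the first pass after its creation, which keeps the cost of the daily scan proportional to the number of new transforms rather than to total storage.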
Deletion Policy
Initially, the service can delete the oldest buckets until the storage used falls below a configurable threshold, so that transforms don't run out of space. The high-water mark should probably default to 85% of capacity, but this will be configurable.
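The oldest-first policy can be sketched as a pure function. This is only an illustration under assumed inputs: each bucket is represented as a hypothetical `(name, created_at, size_bytes)` tuple, and `capacity_bytes` is the total storage available, neither of which is specified in this proposal.

```python
def buckets_to_delete(buckets, capacity_bytes, high_water=0.85):
    """Pick the oldest buckets to delete until usage is at or below the mark.

    buckets: iterable of (name, created_at, size_bytes) tuples (assumed shape).
    Returns bucket names in the order they should be deleted.
    """
    buckets = list(buckets)
    limit = capacity_bytes * high_water
    used = sum(size for _, _, size in buckets)

    selected = []
    # Sort by creation time so the oldest bucket is considered first.
    for name, _, size in sorted(buckets, key=lambda b: b[1]):
        if used <= limit:
            break
        selected.append(name)
        used -= size
    return selected
```

For example, with a capacity of 100 bytes and buckets of 40, 30, and 30 bytes, usage is 100% and only the oldest bucket is selected, which brings usage down to 60%.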
API Interface
The service will have a single endpoint that exposes a simple API that should suffice for ServiceX activities:
GET /transform_data/size?id=ID&type=minio will scan a specified bucket and return the size of that bucket
DELETE /transform_data?id=ID&type=minio will attempt to delete a specified bucket
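One way the two calls might be routed is sketched below, without committing to a web framework. The response bodies, status codes, and the `backends` mapping (storage type to an object with hypothetical `size(bucket_id)` and `delete(bucket_id)` methods) are assumptions for illustration, not part of the proposal.

```python
from urllib.parse import parse_qs, urlsplit


def handle_request(method, url, backends):
    """Route one request and return a (status_code, body) pair.

    backends maps a storage type such as "minio" to an object exposing
    size(bucket_id) and delete(bucket_id); both methods are hypothetical.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    bucket_id = query.get("id", [None])[0]
    storage_type = query.get("type", [None])[0]

    if bucket_id is None or storage_type not in backends:
        return 400, {"error": "id and a supported type are required"}
    backend = backends[storage_type]

    if method == "GET" and parts.path == "/transform_data/size":
        return 200, {"id": bucket_id, "size": backend.size(bucket_id)}
    if method == "DELETE" and parts.path == "/transform_data":
        backend.delete(bucket_id)
        return 200, {"id": bucket_id, "deleted": True}
    return 404, {"error": "unknown route"}
```

Keying the backend on the `type` parameter is what would let other storage solutions be added later without changing the routes.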