Description
It has been suggested that we work on support for cold storage in the DRS specification. Ahead of submitting a pull request for this, it seems worth laying out some of the considerations involved. Not all of them need to be accommodated, but it helps to have an idea of the overall landscape.
'Cold' storage is shorthand for the situation where an object is not immediately available to a 'get a URL' DRS request (a request of the form /objects/<drs_id>/access/<access_id>). Here hot/cold is treated as binary. The storage tiers offered by many providers have subtler gradations of availability; however, for this discussion we will assume that hot/cold is sufficient, unless someone suggests it needs to be more complex.
- Indicate that a given access method points to cold (offline or less readily available) storage.
A given DRS Object may have access methods for both hot and cold storage, i.e. the data model is that hot/cold is an attribute of an Access Method, not of a DRS Object.
For example, something like the following pseudo-specification is likely necessary. The first and third access methods are hot, and the second is cold.
{
  "access_methods": [
    {
      "access_id": "e93724",
      "region": "ncbi",
      "type": "https"
    },
    {
      "access_id": "fbd466",
      "region": "gs.US",
      "type": "https",
      "storage": "cold"
    },
    {
      "access_id": "0da151",
      "region": "s3.us-east-1",
      "type": "https"
    }
  ],
  "checksums": [
    {
      "checksum": "044e759c2e430c3db049392b181f6f5a",
      "type": "md5"
    }
  ],
  "created_time": "2022-06-03T14:07:27Z",
  "id": "044e759c2e430c3db049392b181f6f5a",
  "name": "SRR000066.lite",
  "self_url": "drs://locate.ncbi.nlm.nih.gov/044e759c2e430c3db049392b181f6f5a",
  "size": 118588310
}
- DRS should provide the mechanism for a client to request that the copy of the object at a specified access method be 'thawed'.
- Having asked for an object to be thawed, the client should be notified, or be able to check, whether the object is now in immediately accessible (hot) storage.
- Once the object is available hot, the client would obtain its URL through the existing DRS access request (/objects/<drs_id>/access/<access_id>).
-
All the above should be possible in 'bulk mode' i.e. a client would want to make a single request for some large number of objects to be thawed. One likely use case is that a whole study or collection is in cold storage, and one would want to request all objects.
As with other bulk approaches DRS would concern itself only with lists of objects to be thawed - not what the external entity, like study, that those lists represent. -
Costs for thawing data and the cost for the hot storage in which the thawed objects are temporarily held will need to be discussed as part of the spec.
e.g. which accounts would be used, and how the necessary account details and authorization are passed. There is likely much in common on these questions as for the billing use case for egress charges.
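To make the first item (hot/cold as an attribute of an access method) concrete, here is a minimal Python sketch of how a client might separate hot from cold access methods in a DRS object shaped like the pseudo-specification example. It assumes the proposed, not yet standardized, `storage` field and treats a missing field as hot; `storage_tier` and `split_by_tier` are hypothetical helper names, not part of any DRS library.

```python
from typing import Any


def storage_tier(access_method: dict[str, Any]) -> str:
    """Return "hot" or "cold" for one access method.

    Assumes the hypothetical "storage" field from the pseudo-specification:
    a method is cold only if it says "storage": "cold"; absent means hot.
    """
    return "cold" if access_method.get("storage") == "cold" else "hot"


def split_by_tier(drs_object: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Partition a DRS object's access_ids into (hot, cold) lists, in order."""
    hot: list[str] = []
    cold: list[str] = []
    for method in drs_object.get("access_methods", []):
        target = cold if storage_tier(method) == "cold" else hot
        target.append(method["access_id"])
    return hot, cold


# The access methods from the example DRS object (other fields omitted).
example = {
    "id": "044e759c2e430c3db049392b181f6f5a",
    "access_methods": [
        {"access_id": "e93724", "region": "ncbi", "type": "https"},
        {"access_id": "fbd466", "region": "gs.US", "type": "https", "storage": "cold"},
        {"access_id": "0da151", "region": "s3.us-east-1", "type": "https"},
    ],
}

hot_ids, cold_ids = split_by_tier(example)
print(hot_ids, cold_ids)  # ['e93724', '0da151'] ['fbd466']
```

Treating an absent `storage` field as hot is only one possible design choice, but it would leave existing DRS responses, which carry no such field, meaning exactly what they mean today.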
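The thaw workflow described in the list (request a bulk thaw, check status, then fetch URLs through the existing access request) might look roughly like this from the client side. This is a sketch under loud assumptions: `ObjectRef`, `ThawService`, `request_thaw`, `thaw_status`, the "pending"/"hot" states and the default timings are all hypothetical, invented for illustration, and none of them exist in DRS. `get_access_url` stands in for the existing /objects/<drs_id>/access/<access_id> call. An in-memory fake replaces a real server so the example runs offline.

```python
import time
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class ObjectRef:
    """One copy of a DRS object: the object id plus a (cold) access method."""
    drs_id: str
    access_id: str


class ThawService(Protocol):
    """Hypothetical server operations; none of these exist in DRS today."""
    def request_thaw(self, refs: list[ObjectRef]) -> str: ...  # returns a request id
    def thaw_status(self, request_id: str) -> dict[ObjectRef, str]: ...  # "pending" | "hot"
    def get_access_url(self, ref: ObjectRef) -> str: ...  # the existing access request


def thaw_and_fetch_urls(
    service: ThawService,
    refs: list[ObjectRef],
    poll_interval: float = 60.0,   # placeholder value
    timeout: float = 24 * 3600.0,  # placeholder value
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[ObjectRef, str]:
    """Submit one bulk thaw request, poll until every copy is hot, return URLs."""
    wanted = list(dict.fromkeys(refs))  # de-duplicate, keep order
    wanted_set = set(wanted)
    request_id = service.request_thaw(wanted)
    deadline = clock() + timeout
    urls: dict[ObjectRef, str] = {}
    while True:
        for ref, state in service.thaw_status(request_id).items():
            # Fetch a URL through the ordinary access call as soon as a copy is hot.
            if state == "hot" and ref in wanted_set and ref not in urls:
                urls[ref] = service.get_access_url(ref)
        if len(urls) == len(wanted):
            return urls
        if clock() >= deadline:
            raise TimeoutError(f"{len(wanted) - len(urls)} object(s) still not hot")
        sleep(poll_interval)


class FakeThawService:
    """In-memory stand-in: every object turns hot on the Nth status check."""

    def __init__(self, polls_until_hot: int) -> None:
        self.polls_until_hot = polls_until_hot
        self.polls = 0
        self.requests: dict[str, list[ObjectRef]] = {}

    def request_thaw(self, refs: list[ObjectRef]) -> str:
        request_id = f"thaw-{len(self.requests) + 1}"
        self.requests[request_id] = list(refs)
        return request_id

    def thaw_status(self, request_id: str) -> dict[ObjectRef, str]:
        self.polls += 1
        state = "hot" if self.polls >= self.polls_until_hot else "pending"
        return {ref: state for ref in self.requests[request_id]}

    def get_access_url(self, ref: ObjectRef) -> str:
        return f"https://example.org/{ref.drs_id}/{ref.access_id}"  # placeholder host


refs = [ObjectRef("044e759c2e430c3db049392b181f6f5a", "fbd466")]
service = FakeThawService(polls_until_hot=3)
urls = thaw_and_fetch_urls(service, refs, poll_interval=0.0)
print(urls[refs[0]])  # https://example.org/044e759c2e430c3db049392b181f6f5a/fbd466
```

The sketch uses polling; the notification alternative mentioned in the list (for instance a callback when the thaw completes) would replace the `thaw_status` loop rather than the bulk request itself.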