Background
Feature Branch: https://github.com/ga4gh/data-repository-service-schemas/tree/feature/issue-394-drs-plus-connect-docs-v1
I'm opening this issue as a follow-up to the "DRS and Data Connect" session at the April 20th, 2023 GA4GH Connect meeting. The session explored how standards from the Cloud and Discovery work streams can be used together to address the two needs listed below:
- Address the need to obtain additional data about a DRS object
- Revisit how Data Connect handles the need for bundles
Some resources of interest:
- Original issue for extra metadata in DRS: adding/standardizing field providing reference to external metadata #336
- Pull request (the intent is to close it, but it is still useful for reviewing requirements): Feature/issue 343 metadata endpoint #390
- DRS and metadata exploration to revisit
Key Takeaways from GA4GH Connect
Metadata + DRS
We agreed that best practices for working with metadata were important, and largely agreed on two guiding principles:
- DRS doesn’t know about metadata, and shouldn’t. Instead, we should lean into the fact that systems that use DRS typically have some database-like component that does know about object metadata.
- No new APIs (or API changes for DRS) are needed. Instead, we should add an appendix to the DRS spec documenting best practices for building systems that use DRS and care about metadata.
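To make the first principle concrete, here is a minimal sketch of the pattern, not a proposed design. An in-memory SQLite table stands in for the "database-like component" (it could equally be a table exposed through Data Connect), and it stores metadata next to a DRS URI column. The table name `sample_files`, its columns, and the host `drs.example.org` are all hypothetical. The only DRS-specific part is the mapping from a hostname-based DRS URI to the `/ga4gh/drs/v1/objects/{object_id}` endpoint.

```python
import sqlite3
from typing import List
from urllib.parse import urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI (drs://host/id) to the DRS v1 object URL."""
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"


def build_metadata_store() -> sqlite3.Connection:
    """Create a hypothetical metadata table; DRS itself never sees these columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample_files ("
        "drs_uri TEXT PRIMARY KEY, sample_id TEXT, file_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO sample_files VALUES (?, ?, ?)",
        [
            ("drs://drs.example.org/obj-001", "sample-A", "bam"),
            ("drs://drs.example.org/obj-002", "sample-A", "vcf"),
            ("drs://drs.example.org/obj-003", "sample-B", "bam"),
        ],
    )
    return conn


def find_drs_urls(conn: sqlite3.Connection, file_type: str) -> List[str]:
    """Search on metadata first, then hand the matching objects to DRS."""
    rows = conn.execute(
        "SELECT drs_uri FROM sample_files WHERE file_type = ? ORDER BY drs_uri",
        (file_type,),
    )
    return [drs_uri_to_url(row[0]) for row in rows]
```

A client would call `find_drs_urls(build_metadata_store(), "bam")` and then issue ordinary DRS requests against the returned URLs. The metadata query and the object retrieval stay in separate systems, joined only by the DRS identifier.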
Compound Objects
We agreed with the way the DRS 1.3.0 develop branch frames the need for compound object support:
- Some content (e.g. DICOM images) is best represented as a compound object consisting of a structured collection of atomic DrsObjects.
- Each compound object should have a DRS ID, that clients can use to retrieve the object structure and its constituent atomic objects.
We discussed two possible ways to represent and retrieve compound object contents, but didn’t have time to discuss their tradeoffs:
- The approach documented in the develop branch (Best Practice: Manifests), where the compound object’s DRS ID provides access to a manifest file listing the object contents. Manifest format is datatype-specific and outside the scope of the DRS spec (but could for example be a JSON file).
- An alternate approach where the compound object’s DRS ID provides access to a Data Connect table describing the object contents. Table format is datatype-specific and outside the scope of the DRS spec.
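As a rough illustration of how the two options differ for a client, the sketch below lists the members of a compound object both ways. Everything about the data is an assumption made for illustration: the manifest layout (`contents` entries with a `drs_uri` field), the table's `drs_uri` column, and the sample values are hypothetical, since both formats are datatype-specific and out of scope for DRS. The table reader assumes a Data Connect style response with a `data` array and a `pagination.next_page_url` field, and takes a caller-supplied `fetch_page` function so that no HTTP client is implied.

```python
import json
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical manifest file, as returned when fetching the compound object's bytes.
EXAMPLE_MANIFEST = json.dumps({
    "contents": [
        {"name": "slice-001.dcm", "drs_uri": "drs://drs.example.org/slice-001"},
        {"name": "slice-002.dcm", "drs_uri": "drs://drs.example.org/slice-002"},
        {"name": "slice-003.dcm", "drs_uri": "drs://drs.example.org/slice-003"},
    ]
})

# Hypothetical table pages describing the same object, keyed by page URL.
EXAMPLE_PAGES: Dict[str, dict] = {
    "page-1": {
        "data": [
            {"name": "slice-001.dcm", "drs_uri": "drs://drs.example.org/slice-001"},
            {"name": "slice-002.dcm", "drs_uri": "drs://drs.example.org/slice-002"},
        ],
        "pagination": {"next_page_url": "page-2"},
    },
    "page-2": {
        "data": [
            {"name": "slice-003.dcm", "drs_uri": "drs://drs.example.org/slice-003"},
        ],
        "pagination": None,
    },
}


def members_from_manifest(manifest_text: str) -> List[str]:
    """Option 1: the whole structure arrives as one document and is parsed at once."""
    manifest = json.loads(manifest_text)
    return [entry["drs_uri"] for entry in manifest["contents"]]


def members_from_table(
    first_page_url: str, fetch_page: Callable[[str], dict]
) -> Iterator[str]:
    """Option 2: rows are streamed page by page until no next page is advertised."""
    url: Optional[str] = first_page_url
    while url:
        page = fetch_page(url)
        for row in page.get("data", []):
            yield row["drs_uri"]
        url = (page.get("pagination") or {}).get("next_page_url")
```

With the sample data, `members_from_manifest(EXAMPLE_MANIFEST)` and `list(members_from_table("page-1", EXAMPLE_PAGES.__getitem__))` produce the same member list. The practical difference is that the manifest must be downloaded and parsed whole, while the table can be paged through lazily, which may matter for objects with very many members.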
Goal for this Issue
This issue gives us a place to discuss using Data Connect and DRS together, and to link PRs to. The immediate goal is a corresponding PR that documents the best practice of using Data Connect together with DRS to provide 1) more metadata about DRS objects and 2) a scalable alternative to bundles. The intention is a documentation-only change: a best-practice appendix to the DRS spec.