Description
Context
Parent issue: #1191
The following four bulk-only samples are missing from the metadata-only downloads, including both the portal-wide metadata and each project's single_cell_metadata.tsv files:
- SCPCP000009;SCPCS000129;SCPCL000166
- SCPCP000006;SCPCS000210;SCPCL000271
- SCPCP000006;SCPCS000211;SCPCL000284
- SCPCP000017;SCPCS000606;SCPCL001182
Although these samples are included in dedicated bulk metadata TSV files, they should also be represented in the metadata-only download files.
Problem or idea
The root cause is that we currently handle bulk libraries for bulk-only samples differently during the metadata loading process. During this process, we create Library objects in the database using *_metadata.json files stored in the S3 input bucket. However, libraries for bulk-only samples do not have associated *_metadata.json files, which results in no Library objects being created for these samples.
When generating metadata-only TSV files, we rely on querying libraries from the database. Since libraries for bulk-only samples do not exist in the database, they are excluded from the output files.
We can resolve this by creating bulk Library objects during metadata loading to ensure that all libraries, regardless of modality, are represented in the metadata-only files.
Solution or next step
- Implement a new method Project::load_bulk_libraries that creates bulk library objects during metadata loading, using csv.DictReader to read the bulk_metadata.tsv file downloaded from the S3 input bucket
- Update Project.get_libraries to include bulk libraries when the metadata_only flag is set to True, or if the project includes bulk data for single_cell_metadata.tsv
- Add tests to verify the implementation works as expected
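As a rough illustration of the first step, here is a minimal sketch of reading bulk_metadata.tsv with csv.DictReader and building library records from it. The Library dataclass is a hypothetical stand-in for the real database model, and the column names (project_id, sample_id, library_id), the BULK_RNA_SEQ modality label, and the existing_ids parameter are all assumptions made for this example, not the actual schema or method signature.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Library:
    # Hypothetical stand-in for the database-backed Library model.
    scpca_id: str
    modality: str
    metadata: dict = field(default_factory=dict)


def load_bulk_libraries(tsv_file, existing_ids=()):
    """Build bulk Library records from an open bulk_metadata.tsv handle.

    Rows whose library ID is already known are skipped, so calling this
    again on the same file does not create duplicates.
    """
    seen = set(existing_ids)
    libraries = []
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        library_id = row["library_id"]  # assumed column name
        if library_id in seen:
            continue
        seen.add(library_id)
        libraries.append(
            Library(
                scpca_id=library_id,
                modality="BULK_RNA_SEQ",  # assumed modality label
                metadata=dict(row),
            )
        )
    return libraries


# In-memory stand-in for the file downloaded from the S3 input bucket.
SAMPLE_TSV = (
    "project_id\tsample_id\tlibrary_id\n"
    "SCPCP000006\tSCPCS000210\tSCPCL000271\n"
    "SCPCP000006\tSCPCS000211\tSCPCL000284\n"
)

bulk_libraries = load_bulk_libraries(
    io.StringIO(SAMPLE_TSV), existing_ids={"SCPCL000284"}
)
print([library.scpca_id for library in bulk_libraries])
```

Passing a file handle rather than a path keeps the parsing logic easy to test without touching S3, which may also help with the third step.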