Include missing bulk-only samples in metadata-only downloads #1235

@nozomione

Description

Context

Parent issue: #1191

The following four bulk-only samples are missing from the metadata-only downloads, including both the portal-wide metadata and each project's single_cell_metadata.tsv files:

  • SCPCP000009;SCPCS000129;SCPCL000166
  • SCPCP000006;SCPCS000210;SCPCL000271
  • SCPCP000006;SCPCS000211;SCPCL000284
  • SCPCP000017;SCPCS000606;SCPCL001182

Although these samples are included in dedicated bulk metadata TSV files, they should also be represented in the metadata-only download files.

Problem or idea

The root cause is that we currently handle bulk libraries for bulk-only samples differently during the metadata loading process. During this process, we create Library objects in the database using *_metadata.json files stored in the S3 input bucket. However, libraries for bulk-only samples do not have associated *_metadata.json files, which results in no Library objects being created for these samples.

When generating metadata-only TSV files, we rely on querying libraries from the database. Since libraries for bulk-only samples do not exist in the database, they are excluded from the output files.
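The gap described above can be reproduced with a small, self-contained sketch. It does not use the real models or S3 code: `LibraryRecord`, `load_libraries`, `metadata_only_rows`, the `scpca_sample_id` and `scpca_library_id` column names, and the `SCPCS000001`/`SCPCL000001` IDs are all hypothetical stand-ins. Only the bulk-only sample/library pair is taken from the list in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryRecord:
    """Hypothetical stand-in for a Library row in the database."""
    sample_id: str
    library_id: str


# (sample_id, library_id) -> whether a *_metadata.json file exists in the input.
INPUT_LIBRARIES = {
    ("SCPCS000001", "SCPCL000001"): True,   # hypothetical single-cell library
    ("SCPCS000129", "SCPCL000166"): False,  # bulk-only sample listed in this issue
}


def load_libraries(input_libraries):
    """Mimic loading: a record is created only when a metadata JSON file exists."""
    return [
        LibraryRecord(sample_id, library_id)
        for (sample_id, library_id), has_json in input_libraries.items()
        if has_json
    ]


def metadata_only_rows(db_libraries):
    """Mimic TSV generation: one output row per library found in the database."""
    return [
        {"scpca_sample_id": lib.sample_id, "scpca_library_id": lib.library_id}
        for lib in db_libraries
    ]


rows = metadata_only_rows(load_libraries(INPUT_LIBRARIES))
exported_samples = {row["scpca_sample_id"] for row in rows}

# The bulk-only sample never reaches the output because no record was created.
print("SCPCS000129" in exported_samples)
```

Because the output is derived purely from library records, any sample whose only library lacks a metadata JSON file silently drops out.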

We can resolve this by creating bulk Library objects during metadata loading to ensure that all libraries, regardless of modality, are represented in the metadata-only files.

Solution or next step

  1. Implement a new method Project::load_bulk_libraries that creates bulk library objects during metadata loading, using csv.DictReader to read the bulk_metadata.tsv file downloaded from the S3 input bucket

  2. Update Project.get_libraries to include bulk libraries when the metadata_only flag is set to True, or, for single_cell_metadata.tsv, when the project includes bulk data

  3. Add tests to verify the implementation works as expected
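As a rough illustration of step 1, the sketch below reads a tab-separated bulk metadata file with csv.DictReader and builds one bulk library record per row. It is not the actual implementation: the `Library` dataclass stands in for the database model, the function is a plain function rather than a `Project` method, and the column names (`scpca_project_id`, `scpca_sample_id`, `scpca_library_id`), the `"BULK"` modality value, and the `existing_ids` parameter are assumptions. The sample rows reuse two of the IDs listed in this issue.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Library:
    """Hypothetical stand-in for the database Library model."""
    project_id: str
    sample_id: str
    library_id: str
    modality: str
    metadata: dict = field(default_factory=dict)


def load_bulk_libraries(bulk_metadata_file, existing_ids=()):
    """Build bulk Library records from a tab-separated bulk metadata file.

    Rows whose library ID is already known (or repeated) are skipped so that
    re-running the load does not create duplicates.
    """
    seen = set(existing_ids)
    libraries = []
    reader = csv.DictReader(bulk_metadata_file, delimiter="\t")
    for row in reader:
        library_id = row["scpca_library_id"]  # assumed column name
        if library_id in seen:
            continue
        seen.add(library_id)
        libraries.append(
            Library(
                project_id=row["scpca_project_id"],  # assumed column name
                sample_id=row["scpca_sample_id"],    # assumed column name
                library_id=library_id,
                modality="BULK",                     # assumed modality label
                metadata=dict(row),
            )
        )
    return libraries


# In-memory stand-in for a downloaded bulk_metadata.tsv file.
SAMPLE_TSV = (
    "scpca_project_id\tscpca_sample_id\tscpca_library_id\n"
    "SCPCP000006\tSCPCS000210\tSCPCL000271\n"
    "SCPCP000006\tSCPCS000211\tSCPCL000284\n"
)

bulk_libraries = load_bulk_libraries(io.StringIO(SAMPLE_TSV))
already_loaded = load_bulk_libraries(
    io.StringIO(SAMPLE_TSV), existing_ids={"SCPCL000271"}
)
```

Skipping already-known library IDs is one possible way to keep the load idempotent; the real method may rely on database constraints or a bulk create call instead.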
