Metadata feedback from a single-cell integration perspective: batch information, sample pooling and more #1

@LuckyMD

Hi!

I wanted to share some feedback on the metadata suggestions. I have several questions about unclear aspects and some suggestions for extension.

Notes/questions:

  • Instrument metadata: I may be naive here, but is this needed? What information does it provide that would affect the data beyond what you can identify from the data itself (e.g., reads per sample)?
  • Sequencing protocol and library description seem very similar. It would be helpful to clarify the differences.
  • Does study-id include metadata on which service the ID is registered with?
  • Library ID or experiment ID: an accession in which database? Can authors come up with their own ID?
  • Library-extract-id: a library can consist of many pooled samples, and the same samples can appear in several different pools. Could a library-extract-id then refer to the part of a library pool that comes from one specific sample? If a sample was pooled multiple times in different combinations, each such ID would cover only a subset of that sample.
  • Is sc vs. sn (single-cell vs. single-nucleus) part of the assay_type or the library description?
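
To make the pooling question concrete, here is a minimal sketch of one possible reading, assuming a library-extract-id identifies exactly one (library pool, sample) pair. The class `LibraryExtract`, the helper functions, and all IDs are hypothetical and not taken from the proposed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryExtract:
    """Hypothetical record: one sample's contribution to one library pool."""
    library_extract_id: str
    library_id: str
    sample_id: str


# Assumed example: sample S1 is pooled twice, each time with a different partner.
EXTRACTS = [
    LibraryExtract("LE1", "LIB_A", "S1"),
    LibraryExtract("LE2", "LIB_A", "S2"),
    LibraryExtract("LE3", "LIB_B", "S1"),
    LibraryExtract("LE4", "LIB_B", "S3"),
]


def extracts_for_sample(extracts, sample_id):
    """Return the IDs of all library extracts derived from one sample."""
    return [e.library_extract_id for e in extracts if e.sample_id == sample_id]


def samples_in_library(extracts, library_id):
    """Return the IDs of all samples pooled into one library."""
    return [e.sample_id for e in extracts if e.library_id == library_id]


print(extracts_for_sample(EXTRACTS, "S1"))    # ['LE1', 'LE3']
print(samples_in_library(EXTRACTS, "LIB_A"))  # ['S1', 'S2']
```

Under this reading, the relationship between samples and libraries is many-to-many, and the library-extract-id is what makes each pairing addressable. Whether the schema intends this is exactly the open question.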

Missing:

  • IMO the most important unknown when you download existing data is the batch structure. This could be captured by a dedicated open-text field, or by a list of fields that often confer batch effects plus an additional open-text field.
  • What about alignment information? Are all datasets that this refers to unaligned? Otherwise genome version and gene annotations as well as alignment software settings would be quite useful.
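
As a rough illustration of the two missing pieces, the sketch below combines a list of known batch fields, a free-text batch note, and an optional alignment record. Every name here (`BATCH_COVARIATE_FIELDS`, `AlignmentInfo`, `DatasetMetadata`, `batch_key`) and every example value is an assumption made for illustration, not part of the proposed metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical list of fields that often confer batch effects.
BATCH_COVARIATE_FIELDS = (
    "donor_id",
    "sequencing_run",
    "library_prep_date",
    "processing_site",
)


@dataclass
class AlignmentInfo:
    """Hypothetical alignment record; only present for aligned datasets."""
    genome_version: str
    gene_annotation: str
    aligner: str
    aligner_settings: str = ""


@dataclass
class DatasetMetadata:
    """Hypothetical dataset-level metadata with batch and alignment fields."""
    batch_covariates: Dict[str, str] = field(default_factory=dict)
    batch_notes: str = ""  # open text for anything the fixed fields miss
    alignment: Optional[AlignmentInfo] = None  # None means unaligned data


def batch_key(meta):
    """Join the known batch covariates into one label, in a fixed order."""
    return "|".join(
        meta.batch_covariates.get(name, "NA") for name in BATCH_COVARIATE_FIELDS
    )


# Illustrative values only.
meta = DatasetMetadata(
    batch_covariates={"donor_id": "D1", "sequencing_run": "R2"},
    batch_notes="Samples from run R2 were dissociated with a different protocol.",
    alignment=AlignmentInfo("GRCh38", "GENCODE v32", "STARsolo"),
)
print(batch_key(meta))  # D1|R2|NA|NA
```

A combined key like this is one way a downstream user could pass the batch structure to an integration method, while the open-text note keeps whatever the fixed fields cannot express.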

Labels: enhancement