h5ad to BAM matching #42

@davek44

Hi, thanks for an incredible resource! I'm trying to relate the Smart-seq BAMs to the cell annotations, but I'm running into problems. Namely, the gene expression vectors from the fully processed TabulaSapiens.h5ad frequently do not seem to correspond to the BAMs with the matching names.

For example, cell row 448105 in the h5ad has index B107919_H10_S31.homo.gencode.v30.ERCC.chrM.
That appears to match s3://czb-tabula-sapiens/Pilot1/alignment-gencode/SS2/B107919_H10_S31.homo.gencode.v30.ERCC.chrM.Aligned.out.sorted.bam.
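
The name matching assumed here can be sketched as below. This is only an illustration of the assumption being made: the prefix, suffix, and reference tag are copied from the single example in this issue, the helper names are hypothetical, and the pattern may not hold for every cell or pilot.

```python
# Assumption: these constants come from the one example cell in this issue
# and are not confirmed to apply to the rest of the bucket.
BAM_PREFIX = "s3://czb-tabula-sapiens/Pilot1/alignment-gencode/SS2/"
BAM_SUFFIX = ".Aligned.out.sorted.bam"
REF_TAG = ".gencode.v30.ERCC.chrM"


def bam_uri_from_obs_name(obs_name: str) -> str:
    """Build the BAM location that an h5ad cell index appears to match."""
    return f"{BAM_PREFIX}{obs_name}{BAM_SUFFIX}"


def csv_cell_id_from_obs_name(obs_name: str) -> str:
    """Drop the reference tag to get the shorter cell name used in the count table."""
    return obs_name.removesuffix(REF_TAG)


obs_name = "B107919_H10_S31.homo.gencode.v30.ERCC.chrM"
print(bam_uri_from_obs_name(obs_name))
print(csv_cell_id_from_obs_name(obs_name))
```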

In the h5ad, the top expressed genes for this cell according to 'raw_counts' are FTL, GPX1, TSP, PFN1, etc. However, none of those genes have aligned reads in the BAM file.

In the AWS bucket, there is a Pilot1 count table s3://czb-tabula-sapiens/Pilot1/smartseq2_gene_count_tables/pilot/190627_A00111_0335_BHLMG5DSXX.csv.
The top expressed genes in this table for cell 'B107919_H10_S31.homo' have many aligned reads in the BAM, as expected.

For this cell, the expression vectors from the bucket CSV and the h5ad have a Spearman correlation of 0.08, and a scatter plot indicates they do not match.
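
For anyone wanting to reproduce this kind of check, here is a minimal sketch of the comparison. It assumes each source has already been reduced to a gene-to-count mapping for the one cell, and it compares only genes present in both. The function names are hypothetical and the counts in the usage lines are made up for illustration; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_on_shared_genes(counts_a, counts_b):
    """Spearman correlation over genes present in both mappings.

    Returns (rho, number_of_shared_genes).
    """
    genes = sorted(set(counts_a) & set(counts_b))
    ranks_a = average_ranks([counts_a[g] for g in genes])
    ranks_b = average_ranks([counts_b[g] for g in genes])
    return pearson(ranks_a, ranks_b), len(genes)


# Made-up counts, for illustration only (not values from either file).
h5ad_counts = {"FTL": 10, "GPX1": 5, "PFN1": 1, "GENE_A": 0}
csv_counts = {"FTL": 100, "GPX1": 50, "PFN1": 10, "GENE_B": 3}
rho, n_shared = spearman_on_shared_genes(h5ad_counts, csv_counts)
print(rho, n_shared)
```

Reporting the number of shared genes alongside the correlation helps show how much of each vector actually entered the comparison.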

Could you help me understand where I'm going wrong here? Thanks!
