tempo-mpgen column names inconsistent with schema #1068

@anoronh4

I have looked into a couple of column names and found that they change a bit from IGO to SMILE to tempo-mpgen. I just wanted to clarify this and get it on your radar.

| IGO | SMILE | tempo-mpgen python | Voyager | sample_tracker.txt | Example value |
| --- | --- | --- | --- | --- | --- |
| cmoSampleName | sampleType | sample_class | sampleType | not shown | Adjacent Tissue |
| specimenType | sampleClass | specimen_type | sampleClass | Sample_Class_(T/N) and sampleClass | RapidAutopsy |
| tumorOrNormal | tumorOrNormal | tumorOrNormal | tumorOrNormal | tumorOrNormal | Tumor |

One issue with the first row is that this column is used in pairing to define whether a sample is normal or tumor, but it is not available in table form for downstream inspection or for display in the tracker. tumorOrNormal is included in sample_tracker.txt, but it is not used at all for pairing. This is causing some confusion among PMs when they try to debug.

I found the names used in tempo-mpgen code here:
https://github.com/mskcc/beagle/blob/master/runner/operator/tempo_mpgen_operator/bin/tempo_sample.py#L30-L31
Although Voyager's names match Schema v2.0, I found that the different names used in the beagle code made it more difficult to trace how samples are being organized with tempo-mpgen.
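
To make the mismatch concrete, here is a minimal sketch of an explicit rename from the tempo-mpgen names to the SMILE/Voyager names, using only the pairs listed in the table above. `TEMPO_TO_SCHEMA` and `to_schema_names` are hypothetical names for illustration and do not exist in beagle; the sample record simply combines the example values from the table and is not a real sample.

```python
# Hypothetical mapping (not part of beagle): tempo-mpgen name -> SMILE/Voyager name,
# copied from the table in this issue.
TEMPO_TO_SCHEMA = {
    "sample_class": "sampleType",
    "specimen_type": "sampleClass",
    "tumorOrNormal": "tumorOrNormal",
}


def to_schema_names(record):
    """Return a copy of record with tempo-mpgen keys renamed to schema names.

    Keys that are not in the mapping are passed through unchanged.
    """
    return {TEMPO_TO_SCHEMA.get(key, key): value for key, value in record.items()}


# Illustrative record built from the example values in the table.
sample = {
    "sample_class": "Adjacent Tissue",
    "specimen_type": "RapidAutopsy",
    "tumorOrNormal": "Tumor",
}
print(to_schema_names(sample))
```

Something along these lines might also let the tracker display the column that pairing actually uses under its schema name, though that would depend on how the tracker is generated.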
