Description
Catherine Kaczorowski is going to be uploading mass spectrometry proteomics data at the following levels:
Level 0 ... Raw data
Level 1 ... Skyline documents
Level 2 ... CSV of raw peptide intensities from Skyline
Level 3A ... CSV of normalized and batch-corrected peptide intensities
Level 3B ... CSV of normalized and batch-corrected protein group intensities
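If these levels end up as a controlled vocabulary in the data model, one way to encode them is a small enum plus a file-type check. This is a minimal sketch; the class name, the function name and the per-level file extensions (Thermo `.raw` for Level 0, Skyline `.sky`/`.sky.zip` for Level 1) are assumptions, not something agreed in this issue.

```python
from enum import Enum


class ProteomicsDataLevel(Enum):
    """Hypothetical encoding of the data levels listed in this issue."""
    LEVEL_0 = "Raw data"
    LEVEL_1 = "Skyline documents"
    LEVEL_2 = "CSV of raw peptide intensities from Skyline"
    LEVEL_3A = "CSV of normalized and batch-corrected peptide intensities"
    LEVEL_3B = "CSV of normalized and batch-corrected protein group intensities"


# Assumed file extensions per level; adjust to whatever the lab actually uploads.
EXPECTED_EXTENSIONS = {
    ProteomicsDataLevel.LEVEL_0: (".raw",),
    ProteomicsDataLevel.LEVEL_1: (".sky", ".sky.zip"),
    ProteomicsDataLevel.LEVEL_2: (".csv",),
    ProteomicsDataLevel.LEVEL_3A: (".csv",),
    ProteomicsDataLevel.LEVEL_3B: (".csv",),
}


def matches_level(filename: str, level: ProteomicsDataLevel) -> bool:
    """Return True if the file name ends with an extension expected for that level."""
    # str.endswith accepts a tuple of suffixes.
    return filename.lower().endswith(EXPECTED_EXTENSIONS[level])
```

A check like this only catches obviously misfiled uploads (e.g. a CSV tagged as a Skyline document); it says nothing about the file contents.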
Per Michael MacCoss, the proteomics lab PI:
This is data-independent acquisition (DIA). TMT has 10-16 samples in each run, but they then have ~10 runs per "plex", which makes linking the metadata challenging. Each TMT "plex" is a batch.
We have 1 sample = 1 run. The samples are prepared in batches of 16 (14 actual samples and 2 controls), but each run still contains only one sample.
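Because each run holds exactly one sample, run-level metadata can be checked against the batch layout described above. The sketch below assumes hypothetical column names (`run_id`, `batch_id`, `specimen_id`, `is_control`) and uses the 16-run / 2-control layout as defaults; they are parameters because the issue also mentions batches with 1-2 controls.

```python
from collections import defaultdict

# Layout described above: batches of 16 runs (14 actual samples + 2 controls),
# with exactly one sample per run. Column names are hypothetical.
RUNS_PER_BATCH = 16
CONTROLS_PER_BATCH = 2


def check_batches(runs, runs_per_batch=RUNS_PER_BATCH, controls_per_batch=CONTROLS_PER_BATCH):
    """runs: iterable of dicts with 'run_id', 'batch_id', 'specimen_id', 'is_control'.
    Returns a list of problems; an empty list means the layout matches."""
    problems = []
    seen_runs = set()
    by_batch = defaultdict(list)
    for run in runs:
        # 1 sample = 1 run, so a run ID should never appear twice.
        if run["run_id"] in seen_runs:
            problems.append(f"run {run['run_id']} appears more than once")
        seen_runs.add(run["run_id"])
        by_batch[run["batch_id"]].append(run)
    for batch_id in sorted(by_batch):
        batch_runs = by_batch[batch_id]
        n_controls = sum(1 for r in batch_runs if r["is_control"])
        if len(batch_runs) != runs_per_batch:
            problems.append(f"batch {batch_id}: {len(batch_runs)} runs, expected {runs_per_batch}")
        if n_controls != controls_per_batch:
            problems.append(f"batch {batch_id}: {n_controls} controls, expected {controls_per_batch}")
    return problems
```

Returning a list of messages rather than raising on the first problem makes it easier to send a single, complete list of fixes back to the data contributor.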
Here are some details of how our data is collected: https://pubmed.ncbi.nlm.nih.gov/32312845/ It is probably too much detail, but just in case you are interested.
A major promise of DIA data is that someone could go back to the RAW data and find something novel after the fact. Here is an example of someone finding something very novel in AD, in a human dataset of ours, that we had never considered: https://pubmed.ncbi.nlm.nih.gov/34818016/
A strength of TMT is that many samples are run together. A major challenge of TMT is that the same peptides are rarely sampled between batches. https://www.mcponline.org/article/S1535-9476(20)31525-5/fulltext
So it becomes a major challenge to report peptide-level data using TMT.
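The TMT problem described above can be quantified by asking how many peptides are detected in every batch. This is a minimal sketch on hypothetical toy data; the function name and the input shape (batch ID mapped to a set of peptide sequences) are assumptions.

```python
def shared_peptides(peptides_by_batch):
    """peptides_by_batch: {batch_id: set of detected peptide sequences}.
    Returns (peptides detected in every batch, their fraction of all detected peptides)."""
    batches = list(peptides_by_batch.values())
    if not batches:
        return set(), 0.0
    in_all = set.intersection(*batches)
    in_any = set.union(*batches)
    return in_all, (len(in_all) / len(in_any) if in_any else 0.0)
```

A small shared fraction means most peptides are missing from at least one batch, which is exactly what makes peptide-level reporting across TMT plexes hard.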
So we do want to try to capture some of the batch information, as it would be useful for anyone who wants to perform their own batch correction.
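To show why a batch ID column matters downstream, here is one of the simplest batch corrections someone could run from the Level 2 data: per-batch median centering of log2 intensities. It is a stand-in for illustration only, not the lab's normalization or batch-correction method; the input shapes and function name are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def median_center_by_batch(intensities, batch_of_run):
    """intensities: {run_id: {peptide: raw intensity}}; batch_of_run: {run_id: batch_id}.
    Returns log2 intensities in which each batch's median is shifted onto the global median.
    Non-positive intensities are treated as missing and dropped."""
    logged = {run: {pep: math.log2(v) for pep, v in peps.items() if v > 0}
              for run, peps in intensities.items()}
    values_by_batch = defaultdict(list)
    for run, peps in logged.items():
        values_by_batch[batch_of_run[run]].extend(peps.values())
    all_values = [v for vals in values_by_batch.values() for v in vals]
    global_median = median(all_values)
    # Per-batch offset; a batch with no values gets no shift.
    offsets = {b: (median(vals) - global_median if vals else 0.0)
               for b, vals in values_by_batch.items()}
    return {run: {pep: v - offsets[batch_of_run[run]] for pep, v in peps.items()}
            for run, peps in logged.items()}
```

Without a run-to-batch mapping, even this simple correction is impossible, which is the practical argument for recording batch IDs.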
Some info about Keys:
The platform will be the Orbitrap Fusion Lumos.
For the Control Type, we have some internal and external controls. We add several peptides and a protein to each sample, which we use as part of the QC process, so those will be part of the data matrix.
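Since the spiked-in QC peptides and protein will sit in the same data matrix as the endogenous features, a consumer needs a way to pick them out. This sketch assumes a hypothetical set of QC feature IDs and reports the coefficient of variation (CV) of each QC feature across runs, which is one common QC summary; it is not necessarily the metric the lab uses.

```python
from statistics import mean, stdev


def spike_in_cv(matrix, qc_ids):
    """matrix: {feature_id: list of intensities across runs}.
    qc_ids: set of feature IDs for the spiked-in QC peptides/protein (hypothetical).
    Returns {feature_id: coefficient of variation in percent} for QC features only."""
    cvs = {}
    for feature, values in matrix.items():
        # stdev needs at least two values; a zero mean would make the CV undefined.
        if feature in qc_ids and len(values) >= 2 and mean(values) > 0:
            cvs[feature] = 100 * stdev(values) / mean(values)
    return cvs
```

Keeping the QC feature IDs in the metadata (rather than only in the lab's internal notes) is what lets someone outside the lab run a check like this.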
We also have 1-2 control samples in each batch, which means the control samples are prepared and run many times. I think for sheet 1 we should add a column for the batch ID.
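With a batch ID column in sheet 1, the repeated control runs become easy to trace. A minimal sketch, assuming hypothetical `specimen_id`, `batch_id` and `is_control` columns, that lists every batch each control specimen was run in:

```python
from collections import defaultdict


def control_batches(sheet1_rows):
    """sheet1_rows: dicts with hypothetical 'specimen_id', 'batch_id', 'is_control' columns.
    Returns {control specimen_id: sorted list of batch IDs it was run in}."""
    batches = defaultdict(set)
    for row in sheet1_rows:
        if row["is_control"]:
            batches[row["specimen_id"]].add(row["batch_id"])
    return {specimen: sorted(ids) for specimen, ids in batches.items()}
```

The same control specimen ID appearing in many batches is expected here, so a uniqueness constraint on specimen IDs in sheet 1 would need to exempt control rows.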
I'm also a bit confused by the FDR. We probably need a more specific definition of what the FDR threshold is testing. We use many different thresholds (generally applied to q-values) for peptide detection, and we can also apply them for significance testing. We don't compute a protein-group-level q-value; happy to explain why.
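To make the two uses of q-value thresholds concrete: detection q-values come out of the peptide-identification software, while for significance testing a common approach is Benjamini-Hochberg adjustment of per-feature p-values. The sketch below implements Benjamini-Hochberg plus a simple threshold filter; the 0.01 default is an assumed example value, not the lab's threshold. Either way, the metadata field would need to state which kind of q-value the threshold applies to.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (often reported as q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def passes_threshold(qvalues, threshold=0.01):
    """Flag features whose q-value is at or below the (assumed) threshold."""
    return [qv <= threshold for qv in qvalues]
```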
Other info
We just need to make sure that the identifiers for the proteomics data and for the animals/biospecimens are well linked. Any data we might have is something we got from Catherine, so getting it from her directly is the best way to minimize errors.
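A linkage check between the proteomics run metadata and the biospecimen metadata could look like the sketch below. The column names (`run_id`, `specimen_id`, `animal_id`) and the function name are hypothetical; the point is to surface runs whose specimen is unknown, specimens with no run, and specimens mapped to more than one animal.

```python
def link_runs_to_animals(run_rows, biospecimen_rows):
    """run_rows: proteomics rows with 'run_id' and 'specimen_id'.
    biospecimen_rows: rows with 'specimen_id' and 'animal_id' (hypothetical columns).
    Returns (linked, unlinked_runs, unused_specimens)."""
    specimen_to_animal = {}
    for row in biospecimen_rows:
        sid, animal = row["specimen_id"], row["animal_id"]
        # A specimen mapped to two different animals is an upstream error; stop early.
        if specimen_to_animal.get(sid, animal) != animal:
            raise ValueError(f"specimen {sid} maps to more than one animal")
        specimen_to_animal[sid] = animal
    linked, unlinked_runs = {}, []
    for row in run_rows:
        animal = specimen_to_animal.get(row["specimen_id"])
        if animal is None:
            unlinked_runs.append(row["run_id"])
        else:
            linked[row["run_id"]] = (row["specimen_id"], animal)
    used = {row["specimen_id"] for row in run_rows}
    unused_specimens = sorted(set(specimen_to_animal) - used)
    return linked, unlinked_runs, unused_specimens
```

Any unlinked runs or conflicting specimens would go back to Catherine to resolve, rather than being patched on our side.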