Commit cee0130

Merge pull request #6078 from Swathi266/patch-11
Update tutorial.md
2 parents e1ed728 + 9ff3f1a commit cee0130

topics/proteomics/tutorials/DIA_Analysis_OSW/tutorial.md

Lines changed: 8 additions & 8 deletions
@@ -42,7 +42,7 @@ Over the last decade another acquisition method has been developed addressing th
 
 Therefore, all peptides which are present in the same m/z window at the same time are fragmented simultaneously and a MS2 spectra containing fragments from multiple peptides is acquired. Using the same m/z windows for all measurements, results in more reproducible fragmentation and potential identification across multiple measurements.
 However, the resulting MS2 spectra contain fragments from multiple peptides and are often more complex and do not allow to directly link a specific (m/z) mass from the MS1 to a single MS2 fragment spectra.
-![DIA_vs_DDA](../../images/DIA_analysis_MS2.png "The MS2 scans in the DIA approach contain fragment ions from multiple precursers and are therefore more complex than the precursor-specific MS2 scans in DDA.")
+![DIA_vs_DDA](../../images/DIA_analysis_MS2.png "The MS2 scans in the DIA approach contain fragment ions from multiple precursors and are therefore more complex than the precursor-specific MS2 scans in DDA.")
 
 To allow for the identification of peptides in those ambiguous MS2 spectra, a spectral library can be used. The spectral library contains experimentally measured MS2 spectra, which are specific for one precursor (from previous DDA measurements). In more recent approaches the MS2 spectra can be predicted based on theoretical peptide sequences (e.g. from a protein database).
 ![DIA_basics](../../images/DIA_analysis_basic.png "Spectral libraries are necesseary for the identification of peptides in DIA MS2 scans. In this example the spectral library is generated based on DDA data from the same samples.")
@@ -69,7 +69,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 >
 > {% snippet faqs/galaxy/histories_create_new.md %}
 >
-> 2. Import the fasta and raw files as well as the sample annotation and the iRT Transition file from [Zenodo](https://zenodo.org/record/4307762)
+> 2. Import the fasta and raw files as well as the sample annotation and the iRT Transition file from [Zenodo](https://zenodo.org/record/4307762) iRT Transition file contains information about the transitions of the Indexed Retention Time (iRT) standard peptides. These peptides are a set of synthetic peptides with well-defined and stable retention times across different liquid chromatography-mass spectrometry (LC-MS) systems.
 > ```
 > https://zenodo.org/record/4307762/files/HEK_Ecoli_lib.pqp
 > https://zenodo.org/record/4307762/files/iRTassays.tsv
@@ -150,7 +150,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 > - *"Optional outputs"*: `out_osw`
 >
 > > <comment-title>Mass tolerances and "Minimal number of bins required to be covered"</comment-title>
-> >Here we analyze data acquired on a QExactive Plus MS instrument which uses an Orbitrap and generates high resolution data. Therefore, we allow for 10 ppm mass tolerance for both the MS1 and the MS2 level. If larger mass deviation are expected the mass tolerances can be adjusted. Other instrumentation (such as TOF devices) might require larger mass tolerances for improved peptide identification. Furthermore, here we require at least 7 of the iRT peptides to be found in each of the DIA measurements. This number can be set to lower values if for some reasons fewer iRT peptides were found in some of the measurements. In case only a few iRT peptides are identified in the DIA measurements, the mass tolerance for the iRT extraction can be increased to 20 ppm. We than recommend to increase the extraction window for the MS2 level to 20 ppm. For more information see also [OpenSwathWorkflow](http://openswath.org/en/latest/docs/openswath.html).
+> >Here we analyze data acquired on a QExactive Plus MS instrument which uses an Orbitrap and generates high resolution data. Therefore, we allow for 10 ppm mass tolerance for both the MS1 and the MS2 level. If larger mass deviation are expected the mass tolerances can be adjusted. Other instrumentation (such as TOF devices) might require larger mass tolerances for improved peptide identification. Furthermore, here we require at least 7 of the iRT peptides to be found in each of the DIA measurements. This number can be set to lower values if for some reasons fewer iRT peptides were found in some of the measurements. In case only a few iRT peptides are identified in the DIA measurements, the mass tolerance for the iRT extraction can be increased to 20 ppm. We then recommend to increase the extraction window for the MS2 level to 20 ppm. For more information see also [OpenSwathWorkflow](http://openswath.org/en/latest/docs/openswath.html).
 > {: .comment}
 >
 {: .hands_on}
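
Reviewer note on the hunk above: the comment box gives tolerances in ppm, which are relative to the m/z being extracted. The following is a minimal sketch of that conversion, not part of the tutorial or of OpenSwathWorkflow. The function names are hypothetical, and the sketch assumes the tolerance is applied symmetrically as plus/minus the given ppm value; whether a tool interprets its setting as a half-width or a full window width should be checked in that tool's documentation.

```python
from typing import Tuple


def ppm_to_mz_delta(mz: float, tol_ppm: float) -> float:
    """Absolute m/z deviation that corresponds to tol_ppm at the given m/z."""
    return mz * tol_ppm / 1e6


def symmetric_window(mz: float, tol_ppm: float) -> Tuple[float, float]:
    """Lower and upper m/z bounds, ASSUMING a symmetric +/- tol_ppm tolerance."""
    delta = ppm_to_mz_delta(mz, tol_ppm)
    return mz - delta, mz + delta


# Illustrative values only: a fragment at m/z 500 with the 10 ppm and 20 ppm
# settings mentioned in the comment box.
low_10, high_10 = symmetric_window(500.0, 10.0)
low_20, high_20 = symmetric_window(500.0, 20.0)
print(f"10 ppm at m/z 500: {low_10:.4f} to {high_10:.4f}")
print(f"20 ppm at m/z 500: {low_20:.4f} to {high_20:.4f}")
```

Because the deviation scales with m/z, the same ppm setting allows a wider absolute window for heavier ions than for lighter ones.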
@@ -176,7 +176,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 > - *"Either a 'LDA' or 'XGBoost' classifier is used for semi-supervised learning"*: `XGBoost`
 >
 > > <comment-title>FDR scoring using pyprophet score</comment-title>
-> >During this step q-values corresponding to the FDR of peak identification is estimated with pyprophet. Typically this is the most time consuming step due to the involved maschine learning processes. To decrease the input size one can use **PyProphet subsample** to randomly select subsets of the identifications from each run in the merged.osw (**PyProphet merge** output). In this case, the FDR estimation needs to be applied on the full merged.osw afterwards using the scored subsample.osw in the *"Apply PyProphet score weights file (osw format) instead of semi-supervised learning."* section of **PyProphet score**. The generated report.pdf is helpful to identify potential errors as well as get first insights on the quality of the identifications.
+> >During this step q-values corresponding to the FDR of peak identification is estimated with pyprophet. Typically this is the most time consuming step due to the involved machine learning processes. To decrease the input size one can use **PyProphet subsample** to randomly select subsets of the identifications from each run in the merged.osw (**PyProphet merge** output). In this case, the FDR estimation needs to be applied on the full merged.osw afterwards using the scored subsample.osw in the *"Apply PyProphet score weights file (osw format) instead of semi-supervised learning."* section of **PyProphet score**. The generated report.pdf is helpful to identify potential errors as well as get first insights on the quality of the identifications.
 > {: .comment}
 >
 {: .hands_on}
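
Reviewer note on the hunk above: the comment box refers to q-values derived from target and decoy identifications. The sketch below illustrates the general target-decoy idea only. It is a deliberately simplified estimate and not PyProphet's actual procedure, which involves semi-supervised learning and its own statistical model. The function name and the toy scores are hypothetical.

```python
from typing import List, Sequence


def target_decoy_qvalues(scores: Sequence[float],
                         is_decoy: Sequence[bool]) -> List[float]:
    """Simplified q-values: FDR at each score cutoff is estimated as
    (#decoys above cutoff) / (#targets above cutoff), then made monotone."""
    # Rank identifications from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this cutoff or any more permissive cutoff.
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues


# Toy data (made up for illustration): higher score means a better match.
toy_scores = [5.0, 4.0, 3.0, 2.0, 1.0]
toy_is_decoy = [False, False, True, False, True]
toy_q = target_decoy_qvalues(toy_scores, toy_is_decoy)
print(toy_q)
```

In this toy example the two best-scoring targets outrank every decoy and therefore get a q-value of 0, while lower-ranked identifications get larger q-values as decoys accumulate.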
@@ -198,7 +198,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 >
 > > <solution-title></solution-title>
 > >
-> > 1. Yes, we can see a clearly different distribution of the target identification and the decoys. Both, target and decoy distribution were highest around 0. However, the target distribution shows a second peak at positiv d-score values.
+> > 1. Yes, we can see a clearly different distribution of the target identification and the decoys. Both, target and decoy distribution were highest around 0. However, the target distribution shows a second peak at positive d-score values.
 > > 2. The decoy identifications show a Gaussian distribution around 0 which could be explained by the fact that the decoy sequences were randomly generated alterations from the target sequences in the spectral library (see [DIA library generation tutorial]({{site.baseurl}}/topics/proteomics/tutorials/DIA_lib_OSW/tutorial.html)). Most target identifications show also d-scores around 0, thus reflect potential false positive identifications. Only the distribution of target identifications shows a second increase in higher d-score values, representing more confident identifications.
 > >
 > {: .solution}
@@ -269,12 +269,12 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 
 > <question-title></question-title>
 >
-> 1. How many different peptides and proteins were identified and quatified?
+> 1. How many different peptides and proteins were identified and quantified?
 > 2. Could you already tell from the summary which Spike-in contained higher amounts of Ecoli peptides?
 >
 > > <solution-title></solution-title>
 > >
-> > 1. In total, over 27,300 peptides and over 5,100 proteins were identified and quantified in the DIA measurements.
+> > 1. In total, over 28,041 peptides and over 5,056 proteins were identified and quantified in the DIA measurements.
 > > 2. No, the summary mainly provides an overview of the identifications in each individual DIA measurement as well as some descriptive statistics such as CVs and correlations.
 > >
 > {: .solution}
@@ -295,7 +295,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 > 2. Can you guess which Spike-in contains higher amounts of Ecoli peptides?
 >
 > > <solution-title></solution-title>
-> > 1. Over 800 Ecoli proteins were identified and quantified in the six DIA measurements.
+> > 1. Over 817 Ecoli proteins were identified and quantified in the six DIA measurements.
 > > 2. It seems that the samples in Spike_in_2 contained higher amounts of Ecoli peptides than the samples in Spike_in_1.
 > {: .solution }
 {: .question}