topics/proteomics/tutorials/DIA_Analysis_OSW/tutorial.md
Lines changed: 8 additions & 8 deletions
@@ -42,7 +42,7 @@ Over the last decade another acquisition method has been developed addressing th
 Therefore, all peptides which are present in the same m/z window at the same time are fragmented simultaneously and a MS2 spectra containing fragments from multiple peptides is acquired. Using the same m/z windows for all measurements, results in more reproducible fragmentation and potential identification across multiple measurements.
 However, the resulting MS2 spectra contain fragments from multiple peptides and are often more complex and do not allow to directly link a specific (m/z) mass from the MS1 to a single MS2 fragment spectra.
-
+
 To allow for the identification of peptides in those ambiguous MS2 spectra, a spectral library can be used. The spectral library contains experimentally measured MS2 spectra, which are specific for one precursor (from previous DDA measurements). In more recent approaches the MS2 spectra can be predicted based on theoretical peptide sequences (e.g. from a protein database).
@@ -69,7 +69,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
-> 2. Import the fasta and raw files as well as the sample annotation and the iRT Transition file from [Zenodo](https://zenodo.org/record/4307762)
+> 2. Import the fasta and raw files as well as the sample annotation and the iRT Transition file from [Zenodo](https://zenodo.org/record/4307762) iRT Transition file contains information about the transitions of the Indexed Retention Time (iRT) standard peptides. These peptides are a set of synthetic peptides with well-defined and stable retention times across different liquid chromatography-mass spectrometry (LC-MS) systems.
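The iRT description added in this hunk is what makes these peptides useful for calibration: their iRT values are known in advance, so the retention times observed in each run can be mapped onto one shared iRT scale. The sketch below assumes a simple straight-line relation fitted by ordinary least squares; OpenSwathWorkflow may use a different alignment model. The helpers `fit_line` and `calibrate`, the constant `MIN_IRT_PEPTIDES` and all retention-time values are hypothetical illustrations; the threshold of 7 only mirrors the "at least 7 of the iRT peptides" setting described in this tutorial's mass-tolerance comment.

```python
# Minimal sketch (assumption: a linear RT -> iRT relation fitted by ordinary
# least squares). Names and values are hypothetical, not OpenMS/OpenSWATH API.

MIN_IRT_PEPTIDES = 7  # hypothetical; mirrors the "at least 7 iRT peptides" setting


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(observed_rt, library_irt):
    """Fit the RT -> iRT mapping, refusing to fit when too few iRT peptides were found."""
    if len(observed_rt) < MIN_IRT_PEPTIDES:
        raise ValueError(
            f"only {len(observed_rt)} iRT peptides found, need {MIN_IRT_PEPTIDES}"
        )
    return fit_line(observed_rt, library_irt)


if __name__ == "__main__":
    observed = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]  # minutes in one run (made up)
    irt = [10.0, 30.0, 50.0, 70.0, 90.0, 110.0, 130.0]     # library iRT values (made up)
    slope, intercept = calibrate(observed, irt)
    print(f"iRT = {slope:.2f} * RT {intercept:+.2f}")
    # Convert the retention time of any other peptide in this run to iRT units.
    print("RT 45.0 min ->", slope * 45.0 + intercept)
```

Raising an error below the minimum count reflects the idea in the tutorial that a run with too few detected iRT peptides gives an unreliable calibration.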
@@ -150,7 +150,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 > - *"Optional outputs"*: `out_osw`
 >
 > > <comment-title>Mass tolerances and "Minimal number of bins required to be covered"</comment-title>
-> >Here we analyze data acquired on a QExactive Plus MS instrument which uses an Orbitrap and generates high resolution data. Therefore, we allow for 10 ppm mass tolerance for both the MS1 and the MS2 level. If larger mass deviation are expected the mass tolerances can be adjusted. Other instrumentation (such as TOF devices) might require larger mass tolerances for improved peptide identification. Furthermore, here we require at least 7 of the iRT peptides to be found in each of the DIA measurements. This number can be set to lower values if for some reasons fewer iRT peptides were found in some of the measurements. In case only a few iRT peptides are identified in the DIA measurements, the mass tolerance for the iRT extraction can be increased to 20 ppm. We than recommend to increase the extraction window for the MS2 level to 20 ppm. For more information see also [OpenSwathWorkflow](http://openswath.org/en/latest/docs/openswath.html).
+> >Here we analyze data acquired on a QExactive Plus MS instrument which uses an Orbitrap and generates high resolution data. Therefore, we allow for 10 ppm mass tolerance for both the MS1 and the MS2 level. If larger mass deviation are expected the mass tolerances can be adjusted. Other instrumentation (such as TOF devices) might require larger mass tolerances for improved peptide identification. Furthermore, here we require at least 7 of the iRT peptides to be found in each of the DIA measurements. This number can be set to lower values if for some reasons fewer iRT peptides were found in some of the measurements. In case only a few iRT peptides are identified in the DIA measurements, the mass tolerance for the iRT extraction can be increased to 20 ppm. We then recommend to increase the extraction window for the MS2 level to 20 ppm. For more information see also [OpenSwathWorkflow](http://openswath.org/en/latest/docs/openswath.html).
 > {: .comment}
 >
 {: .hands_on}
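The tolerances in this comment are relative (ppm), so the absolute m/z window they correspond to grows with the m/z being extracted. A minimal sketch, assuming a symmetric ± window around the target m/z; whether a given tool treats the configured value as the half-width or the full width of the extraction window should be checked in its own documentation. `ppm_window` is a hypothetical helper, not part of OpenMS, and the m/z value is made up.

```python
# Minimal sketch: convert a relative (ppm) tolerance into absolute m/z bounds.
# Assumption: the tolerance is applied symmetrically (+/-) around the target m/z.


def ppm_window(mz: float, tol_ppm: float) -> tuple:
    """Return (lower, upper) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm * 1e-6  # absolute tolerance in m/z units (Th)
    return mz - delta, mz + delta


if __name__ == "__main__":
    mz = 500.0  # hypothetical fragment m/z
    for tol in (10.0, 20.0):
        lo, hi = ppm_window(mz, tol)
        print(f"{tol:>4} ppm at m/z {mz}: {lo:.4f} - {hi:.4f} (width {hi - lo:.4f})")
```

Doubling the tolerance from 10 to 20 ppm doubles the absolute window at every m/z, which is why a wider setting can recover iRT peptides with larger mass deviations at the cost of admitting more interfering signal.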
@@ -176,7 +176,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 > - *"Either a 'LDA' or 'XGBoost' classifier is used for semi-supervised learning"*: `XGBoost`
 >
 > > <comment-title>FDR scoring using pyprophet score</comment-title>
-> >During this step q-values corresponding to the FDR of peak identification is estimated with pyprophet. Typically this is the most time consuming step due to the involved maschine learning processes. To decrease the input size one can use **PyProphet subsample** to randomly select subsets of the identifications from each run in the merged.osw (**PyProphet merge** output). In this case, the FDR estimation needs to be applied on the full merged.osw afterwards using the scored subsample.osw in the *"Apply PyProphet score weights file (osw format) instead of semi-supervised learning."* section of **PyProphet score**. The generated report.pdf is helpful to identify potential errors as well as get first insights on the quality of the identifications.
+> >During this step q-values corresponding to the FDR of peak identification is estimated with pyprophet. Typically this is the most time consuming step due to the involved machine learning processes. To decrease the input size one can use **PyProphet subsample** to randomly select subsets of the identifications from each run in the merged.osw (**PyProphet merge** output). In this case, the FDR estimation needs to be applied on the full merged.osw afterwards using the scored subsample.osw in the *"Apply PyProphet score weights file (osw format) instead of semi-supervised learning."* section of **PyProphet score**. The generated report.pdf is helpful to identify potential errors as well as get first insights on the quality of the identifications.
 > {: .comment}
 >
 {: .hands_on}
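The comment above refers to q-values estimated from target and decoy identifications. As a simplified sketch of the target-decoy idea: rank all identifications by d-score, estimate the FDR at each cutoff as the number of decoys divided by the number of targets scoring at or above it, then take a running minimum so that q-values never decrease as the score drops. This is not PyProphet's implementation, which is more elaborate (for example, it can estimate the fraction of false targets, pi0); the function name and the example scores are hypothetical, and tied scores are not treated specially here.

```python
# Simplified target-decoy q-value sketch (not PyProphet's algorithm).
# Assumption: FDR at a cutoff ~= (#decoys >= cutoff) / (#targets >= cutoff).


def target_decoy_qvalues(scores, is_decoy):
    """Return one q-value per entry, in input order; higher score = more confident."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    targets = decoys = 0
    for i in order:  # walk from the best score downwards
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = min(1.0, decoys / max(targets, 1))
    # q-value: the lowest FDR at which this entry would still be accepted,
    # i.e. a running minimum taken from the worst score upwards.
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        qvals[i] = running_min
    return qvals


if __name__ == "__main__":
    d_scores = [3.1, 2.7, 2.2, 1.5, 0.4, 0.1, -0.2, -0.5]  # made-up d-scores
    decoy = [False, False, False, True, False, True, False, True]
    for s, dec, q in zip(d_scores, decoy, target_decoy_qvalues(d_scores, decoy)):
        print(f"d-score {s:5.1f}  {'decoy ' if dec else 'target'}  q = {q:.3f}")
```

With these made-up scores, the three best targets get q = 0, while targets interleaved with decoys near 0 receive much higher q-values, matching the intuition that identifications overlapping the decoy distribution are likely false positives.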
@@ -198,7 +198,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 >
 > > <solution-title></solution-title>
 > >
-> > 1. Yes, we can see a clearly different distribution of the target identification and the decoys. Both, target and decoy distribution were highest around 0. However, the target distribution shows a second peak at positiv d-score values.
+> > 1. Yes, we can see a clearly different distribution of the target identification and the decoys. Both, target and decoy distribution were highest around 0. However, the target distribution shows a second peak at positive d-score values.
 > > 2. The decoy identifications show a Gaussian distribution around 0 which could be explained by the fact that the decoy sequences were randomly generated alterations from the target sequences in the spectral library (see [DIA library generation tutorial]({{site.baseurl}}/topics/proteomics/tutorials/DIA_lib_OSW/tutorial.html)). Most target identifications show also d-scores around 0, thus reflect potential false positive identifications. Only the distribution of target identifications shows a second increase in higher d-score values, representing more confident identifications.
 > >
 > {: .solution}
@@ -269,12 +269,12 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 > <question-title></question-title>
 >
-> 1. How many different peptides and proteins were identified and quatified?
+> 1. How many different peptides and proteins were identified and quantified?
 > 2. Could you already tell from the summary which Spike-in contained higher amounts of Ecoli peptides?
 >
 > > <solution-title></solution-title>
 > >
-> > 1. In total, over 27,300 peptides and over 5,100 proteins were identified and quantified in the DIA measurements.
+> > 1. In total, over 28,041 peptides and over 5,056 proteins were identified and quantified in the DIA measurements.
 > > 2. No, the summary mainly provides an overview of the identifications in each individual DIA measurement as well as some descriptive statistics such as CVs and correlations.
 > >
 > {: .solution}
@@ -295,7 +295,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
 > 2. Can you guess which Spike-in contains higher amounts of Ecoli peptides?
 >
 > > <solution-title></solution-title>
-> > 1. Over 800 Ecoli proteins were identified and quantified in the six DIA measurements.
+> > 1. Over 817 Ecoli proteins were identified and quantified in the six DIA measurements.
 > > 2. It seems that the samples in Spike_in_2 contained higher amounts of Ecoli peptides than the samples in Spike_in_1.