Skip to content

Update tutorial.md #6078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 15, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 8 additions & 8 deletions topics/proteomics/tutorials/DIA_Analysis_OSW/tutorial.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ Over the last decade another acquisition method has been developed addressing th

Therefore, all peptides which are present in the same m/z window at the same time are fragmented simultaneously and a MS2 spectra containing fragments from multiple peptides is acquired. Using the same m/z windows for all measurements, results in more reproducible fragmentation and potential identification across multiple measurements.
However, the resulting MS2 spectra contain fragments from multiple peptides and are often more complex and do not allow to directly link a specific (m/z) mass from the MS1 to a single MS2 fragment spectra.
![DIA_vs_DDA](../../images/DIA_analysis_MS2.png "The MS2 scans in the DIA approach contain fragment ions from multiple precursers and are therefore more complex than the precursor-specific MS2 scans in DDA.")
![DIA_vs_DDA](../../images/DIA_analysis_MS2.png "The MS2 scans in the DIA approach contain fragment ions from multiple precursors and are therefore more complex than the precursor-specific MS2 scans in DDA.")

To allow for the identification of peptides in those ambiguous MS2 spectra, a spectral library can be used. The spectral library contains experimentally measured MS2 spectra, which are specific for one precursor (from previous DDA measurements). In more recent approaches the MS2 spectra can be predicted based on theoretical peptide sequences (e.g. from a protein database).
![DIA_basics](../../images/DIA_analysis_basic.png "Spectral libraries are necesseary for the identification of peptides in DIA MS2 scans. In this example the spectral library is generated based on DDA data from the same samples.")
Expand All @@ -69,7 +69,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
>
> {% snippet faqs/galaxy/histories_create_new.md %}
>
> 2. Import the fasta and raw files as well as the sample annotation and the iRT Transition file from [Zenodo](https://zenodo.org/record/4307762)
> 2. Import the fasta and raw files as well as the sample annotation and the iRT Transition file from [Zenodo](https://zenodo.org/record/4307762) iRT Transition file contains information about the transitions of the Indexed Retention Time (iRT) standard peptides. These peptides are a set of synthetic peptides with well-defined and stable retention times across different liquid chromatography-mass spectrometry (LC-MS) systems.
> ```
> https://zenodo.org/record/4307762/files/HEK_Ecoli_lib.pqp
> https://zenodo.org/record/4307762/files/iRTassays.tsv
Expand Down Expand Up @@ -150,7 +150,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
> - *"Optional outputs"*: `out_osw`
>
> > <comment-title>Mass tolerances and "Minimal number of bins required to be covered"</comment-title>
> >Here we analyze data acquired on a QExactive Plus MS instrument which uses an Orbitrap and generates high resolution data. Therefore, we allow for 10 ppm mass tolerance for both the MS1 and the MS2 level. If larger mass deviation are expected the mass tolerances can be adjusted. Other instrumentation (such as TOF devices) might require larger mass tolerances for improved peptide identification. Furthermore, here we require at least 7 of the iRT peptides to be found in each of the DIA measurements. This number can be set to lower values if for some reasons fewer iRT peptides were found in some of the measurements. In case only a few iRT peptides are identified in the DIA measurements, the mass tolerance for the iRT extraction can be increased to 20 ppm. We than recommend to increase the extraction window for the MS2 level to 20 ppm. For more information see also [OpenSwathWorkflow](http://openswath.org/en/latest/docs/openswath.html).
> >Here we analyze data acquired on a QExactive Plus MS instrument which uses an Orbitrap and generates high resolution data. Therefore, we allow for 10 ppm mass tolerance for both the MS1 and the MS2 level. If larger mass deviation are expected the mass tolerances can be adjusted. Other instrumentation (such as TOF devices) might require larger mass tolerances for improved peptide identification. Furthermore, here we require at least 7 of the iRT peptides to be found in each of the DIA measurements. This number can be set to lower values if for some reasons fewer iRT peptides were found in some of the measurements. In case only a few iRT peptides are identified in the DIA measurements, the mass tolerance for the iRT extraction can be increased to 20 ppm. We then recommend to increase the extraction window for the MS2 level to 20 ppm. For more information see also [OpenSwathWorkflow](http://openswath.org/en/latest/docs/openswath.html).
> {: .comment}
>
{: .hands_on}
Expand All @@ -176,7 +176,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
> - *"Either a 'LDA' or 'XGBoost' classifier is used for semi-supervised learning"*: `XGBoost`
>
> > <comment-title>FDR scoring using pyprophet score</comment-title>
> >During this step q-values corresponding to the FDR of peak identification is estimated with pyprophet. Typically this is the most time consuming step due to the involved maschine learning processes. To decrease the input size one can use **PyProphet subsample** to randomly select subsets of the identifications from each run in the merged.osw (**PyProphet merge** output). In this case, the FDR estimation needs to be applied on the full merged.osw afterwards using the scored subsample.osw in the *"Apply PyProphet score weights file (osw format) instead of semi-supervised learning."* section of **PyProphet score**. The generated report.pdf is helpful to identify potential errors as well as get first insights on the quality of the identifications.
> >During this step q-values corresponding to the FDR of peak identification is estimated with pyprophet. Typically this is the most time consuming step due to the involved machine learning processes. To decrease the input size one can use **PyProphet subsample** to randomly select subsets of the identifications from each run in the merged.osw (**PyProphet merge** output). In this case, the FDR estimation needs to be applied on the full merged.osw afterwards using the scored subsample.osw in the *"Apply PyProphet score weights file (osw format) instead of semi-supervised learning."* section of **PyProphet score**. The generated report.pdf is helpful to identify potential errors as well as get first insights on the quality of the identifications.
> {: .comment}
>
{: .hands_on}
Expand All @@ -198,7 +198,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
>
> > <solution-title></solution-title>
> >
> > 1. Yes, we can see a clearly different distribution of the target identification and the decoys. Both, target and decoy distribution were highest around 0. However, the target distribution shows a second peak at positiv d-score values.
> > 1. Yes, we can see a clearly different distribution of the target identification and the decoys. Both, target and decoy distribution were highest around 0. However, the target distribution shows a second peak at positive d-score values.
> > 2. The decoy identifications show a Gaussian distribution around 0 which could be explained by the fact that the decoy sequences were randomly generated alterations from the target sequences in the spectral library (see [DIA library generation tutorial]({{site.baseurl}}/topics/proteomics/tutorials/DIA_lib_OSW/tutorial.html)). Most target identifications show also d-scores around 0, thus reflect potential false positive identifications. Only the distribution of target identifications shows a second increase in higher d-score values, representing more confident identifications.
> >
> {: .solution}
Expand Down Expand Up @@ -269,12 +269,12 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma

> <question-title></question-title>
>
> 1. How many different peptides and proteins were identified and quatified?
> 1. How many different peptides and proteins were identified and quantified?
> 2. Could you already tell from the summary which Spike-in contained higher amounts of Ecoli peptides?
>
> > <solution-title></solution-title>
> >
> > 1. In total, over 27,300 peptides and over 5,100 proteins were identified and quantified in the DIA measurements.
> > 1. In total, over 28,041 peptides and over 5,056 proteins were identified and quantified in the DIA measurements.
> > 2. No, the summary mainly provides an overview of the identifications in each individual DIA measurement as well as some descriptive statistics such as CVs and correlations.
> >
> {: .solution}
Expand All @@ -295,7 +295,7 @@ The dataset in this tutorial consists of two different Spike-in mixtures of huma
> 2. Can you guess which Spike-in contains higher amounts of Ecoli peptides?
>
> > <solution-title></solution-title>
> > 1. Over 800 Ecoli proteins were identified and quantified in the six DIA measurements.
> > 1. Over 817 Ecoli proteins were identified and quantified in the six DIA measurements.
> > 2. It seems that the samples in Spike_in_2 contained higher amounts of Ecoli peptides than the samples in Spike_in_1.
> {: .solution }
{: .question}
Expand Down