another small change

ypriverol · ypriverol · commit 285a80842fb6 · 2025-10-11T10:53:07.000+08:00
diff --git a/ms2rescore-quantms-mcp/main.tex b/ms2rescore-quantms-mcp/main.tex
@@ -57,12 +57,11 @@
 % Main Content
 \section{Introduction}
 
-In recent years, the field of proteomics has experienced rapid growth in the availability of publicly accessible datasets, accompanied by a shift toward studies analyzing larger sample cohorts. As of June 2025, over 40,000 datasets have been submitted to ProteomeXchange (PX) repositories, including a substantial increase in large-scale submissions comprising more than 100 instrument files \cite{perez-riverol_pride_2025}. Recently, we introduced quantms, an open-source, cloud-based pipeline designed for massively parallel reanalysis of quantitative proteomics datasets \cite{dai_quantms_2024} to address computational bottlenecks for large-scale quantitative re-analyses. By 2025, quantms supports three major search engines, SAGE \cite{lazear2023sage}, MSGF+ \cite{Kim2014-sr}, and Comet \cite{Eng2013-ds}, and the combination of them; peptide identifications can be boosted by using Percolator \cite{kall2007semi}. Different to other proteomics analysis platforms like MaxQuant \cite{cox_maxquant_2008}, PeptideShaker \cite{vaudel2015peptideshaker}, or pFind \cite{wang_pfind_2007}, quantms is highly modular and flexible, accommodating a wide range of quantitative proteomics approaches. quantms automatically distributes computations using the Nextflow workflow engine across one or more computing nodes, depending on the number of instrument files and samples \cite{di_tommaso_nextflow_2017}. To ensure traceability and reproducibility, the pipeline is built entirely on standardized open file formats \cite{dai_proteomics_2021, martens_mzmlcommunity_2011} and reproducible execution environments such as Docker and Singularity, adhering strictly to the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles \cite{wilkinson_fair_2016}. 
+In recent years, the field of proteomics has experienced rapid growth in the availability of publicly accessible datasets, accompanied by a shift toward studies analyzing larger sample cohorts. As of June 2025, over 40,000 datasets have been submitted to ProteomeXchange (PX) repositories, including a substantial increase in large-scale submissions comprising more than 100 instrument files \cite{perez-riverol_pride_2025}. To address the computational bottlenecks of large-scale quantitative reanalyses, we recently introduced quantms, an open-source, cloud-based pipeline designed for massively parallel reanalysis of quantitative proteomics datasets \cite{dai_quantms_2024}. As of 2025, quantms supports three major search engines, SAGE \cite{lazear2023sage}, MSGF+ \cite{Kim2014-sr}, and Comet \cite{Eng2013-ds}, as well as combinations of them; peptide identifications can be boosted by using Percolator \cite{kall2007semi}. Unlike other proteomics analysis platforms such as MaxQuant \cite{cox_maxquant_2008}, PeptideShaker \cite{vaudel2015peptideshaker}, or pFind \cite{wang_pfind_2007}, quantms is highly modular and flexible, accommodating a wide range of quantitative proteomics approaches. quantms automatically distributes computations using the Nextflow workflow engine across one or more computing nodes, depending on the number of instrument files and samples \cite{di_tommaso_nextflow_2017}. To ensure traceability and reproducibility, the pipeline is built entirely on standardized open file formats \cite{dai_proteomics_2021, martens_mzmlcommunity_2011, griss2014mztab} and on reproducible execution environments provided as BioContainers Docker and Singularity containers \cite{da2017biocontainers}, adhering strictly to the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles \cite{wilkinson_fair_2016}.
 
 Deep learning algorithms that predict peptide features such as retention time, ion mobility, and fragment ion intensities have transformed the peptide identification landscape, boosting the identification power of existing peptide search engines \cite{buur_ms2_2024, yang2023msbooster}. Various models have been developed to accurately predict peptide behavior in LC-MS/MS, such as MS²PIP \cite{declercq_updated_2023} and AlphaPeptDeep \cite{zeng_alphapeptdeep_2022} for fragment ion intensity prediction and DeepLC \cite{bouwmeester_deeplc_2021} for retention time prediction. These highly accurate predictions enable superior matching of experimental data to theoretical expectations and have reinvigorated rescoring strategies in proteomics. In addition to the deep learning algorithms, a new set of tools and packages has been developed to coordinate and facilitate the integration of these algorithms with existing search engines and frameworks. MS²Rescore, for example, is a modular Python package that leverages MS²PIP and DeepLC to generate multiple features assessing the similarity between observed and predicted peptide behavior \cite{declercq_tims2_2025}.
 
-Here, we introduce quantms-rescoring; an open-source package than enables simmisly integration of quantms supported search engines such as SAGE, MSGF+; and Comet and multiple popular deep learning algorithms such as MS²PIP, DeepLC, and AlphaPeptDeep. Different from MS²Rescore, which incorporates the rescoring engine within the package using Percolator or mokapot \cite{fondrie2021mokapot}, quantms-rescoring leave that responsability to quantms workflow and focus the efforts on the feature prediction and model fitting to the specific dataset. In addition, quantms-rescoring package provides a module to compute multiple signal-to-noise spectra features that can be added as features to the rescoring tool in quantms (Percolator). We benchmarked the new implentation of quantms v1.6.0 using quantms-rescoring for multiple combinations of search engines and deep learning algorithms with different datasets and experimental designs (e.g different instruments, enzymes, etc) including enhanced pipeline supports in-depth analysis of large-scale public proteomics datasets across diverse experimental designs, including label-free quantification (PXD001819, PXD019643, PXD026824), tandem mass tag-based quantification (PDC000127), immunopeptidomics (PXD019643), and phosphoproteomics (PXD026824) studies. 
-@Yasset Comment: @TODO Dai, I think here we need to write some concluding sentences in terms of % of PSMs increased and % of proteins quantified. 
+Here, we introduce quantms-rescoring, an open-source package that enables seamless integration of quantms-supported search engines such as SAGE, MSGF+, and Comet with multiple popular deep learning algorithms such as MS²PIP, DeepLC, and AlphaPeptDeep. Unlike MS²Rescore, which incorporates the rescoring engine within the package using Percolator or mokapot \cite{fondrie2021mokapot}, quantms-rescoring leaves that responsibility to the quantms workflow and focuses on feature prediction and on fitting the models to the specific dataset. In addition, the quantms-rescoring package provides a module to compute multiple signal-to-noise spectrum features that can be passed as additional features to the rescoring tool in quantms (Percolator). It also includes a novel approach to select the algorithm, model, and parameters that best fit the data under study. We benchmarked the new implementation of quantms v1.6.0 using quantms-rescoring for multiple combinations of search engines and deep learning algorithms on datasets with different experimental designs (e.g., different instruments and enzymes). The enhanced pipeline supports in-depth analysis of large-scale public proteomics datasets across diverse experimental designs, including label-free quantification (PXD001819, PXD019643, PXD026824), tandem mass tag-based quantification (PDC000127), immunopeptidomics (PXD019643), and phosphoproteomics (PXD026824) studies. On average, quantms-rescoring achieved a 16–22.8\% increase in identified spectra, along with the quantification of X\% additional phosphorylated peptides and X\% phosphosites for the phosphoproteomics datasets.
 
 \section{Methods}