A repository of scripts that convert MaxQuant output tables to simplify downstream analysis.
Sample and Data Relationship Format (SDRF-Proteomics) is a text format developed to describe relationships between samples and data files produced in proteomics experiments. It provides metadata information essential to interpret and reanalyse the deposited dataset. You can find more information by following this link.
This repository contains two scripts, one written in Python and one in R, with the same functionality: each takes the SDRF file, peptides.txt, and proteinGroups.txt and creates expression matrices and a sample annotation table. Each script takes five positional arguments, in this order:
- Path to SDRF file
- Path to peptides.txt file
- Path to proteinGroups.txt file
- Output folder path
- 'Protein IDs' or 'Gene names': the column to keep in the proteinGroups expression matrix (you can also pass 1 or 2, respectively)
python ./create_tables.py <path/to/sdrf.tsv> <path/to/peptides.txt> <path/to/proteinGroups.txt> <path/to/output/folder> 1
Rscript ./create_tables.R <path/to/sdrf.tsv> <path/to/peptides.txt> <path/to/proteinGroups.txt> <path/to/output/folder> 1
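The argument handling described above can be sketched as follows. This is a minimal illustration and not the code from create_tables.py: the function name `parse_args`, the `ID_COLUMN_CHOICES` mapping, and the dictionary keys are hypothetical names chosen for this example.

```python
# Hypothetical mapping from the fifth argument to a proteinGroups column.
# The README only states that 1 and 2 stand for 'Protein IDs' and 'Gene names'.
ID_COLUMN_CHOICES = {
    "1": "Protein IDs",
    "2": "Gene names",
    "Protein IDs": "Protein IDs",
    "Gene names": "Gene names",
}


def parse_args(argv):
    """Validate five positional arguments and return them as a dict."""
    if len(argv) != 5:
        raise ValueError(
            "expected 5 arguments: <sdrf> <peptides> <proteinGroups> "
            "<output folder> <id column>"
        )
    sdrf, peptides, protein_groups, out_dir, id_choice = argv
    if id_choice not in ID_COLUMN_CHOICES:
        raise ValueError("last argument must be 'Protein IDs', 'Gene names', 1 or 2")
    return {
        "sdrf": sdrf,
        "peptides": peptides,
        "protein_groups": protein_groups,
        "output_dir": out_dir,
        "id_column": ID_COLUMN_CHOICES[id_choice],
    }


# In a real script the list would come from sys.argv[1:].
args = parse_args(["sdrf.tsv", "peptides.txt", "proteinGroups.txt", "out", "1"])
print(args["id_column"])
```

Because the arguments are positional, swapping two paths is not detected by a check like this; it only verifies the argument count and the column choice.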
By default, the scripts keep in the sample annotation table only those SDRF columns whose values differ between rows; columns that have the same value in every row are dropped. This behaviour is controlled by the SKIP_REDUNDANT_COLUMNS constant.
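The effect of dropping redundant columns can be illustrated with a short sketch. Only the constant name SKIP_REDUNDANT_COLUMNS comes from this repository; the function `drop_redundant_columns` and the toy SDRF content are made up for illustration, and the actual scripts may implement the filter differently.

```python
import csv
import io

# Constant name taken from the README; the logic below is an assumption.
SKIP_REDUNDANT_COLUMNS = True


def drop_redundant_columns(rows, skip=SKIP_REDUNDANT_COLUMNS):
    """Drop columns whose value is identical in every row."""
    # With fewer than two rows every column is trivially constant, so keep all.
    if not skip or len(rows) < 2:
        return rows
    keep = [col for col in rows[0] if len({row[col] for row in rows}) > 1]
    return [{col: row[col] for col in keep} for row in rows]


# Toy SDRF-like table (tab-separated), invented for this example.
sdrf_text = (
    "source name\tcharacteristics[organism]\tcharacteristics[disease]\n"
    "sample 1\tHomo sapiens\tcontrol\n"
    "sample 2\tHomo sapiens\ttumor\n"
)
rows = list(csv.DictReader(io.StringIO(sdrf_text), delimiter="\t"))
annotation = drop_redundant_columns(rows)
print(list(annotation[0]))  # ['source name', 'characteristics[disease]']
```

Here the organism column is removed because it holds the same value for both samples. Note that `csv.DictReader` keeps only the last of any repeated column names, so a real SDRF file with duplicated headers would need a different reader.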
Learn more about how to create SDRF files in MaxQuant and how to use them in Perseus for output annotation in the manual or our video tutorial.
Viegener, W., Urazbakhtin, S., Ferretti, D. et al. Facilitating analysis and dissemination of proteomics data through metadata integration in MaxQuant. Nat Commun 16, 8421 (2025). https://doi.org/10.1038/s41467-025-64089-4
This code is licensed under the terms of the CC BY-NC-ND 4.0 license.