Search- and quantification-engine agnostic downstream processing of proteomics data
Functionality is intended to stay as close to pure Python as possible and to avoid closed end-to-end implementations, which is reflected in several design choices:
- AnnData is used in favor of a custom data class to enable interoperability with any other tool from the Scverse.
- matplotlib Axes and Figure instances are used for visualization, giving the user full autonomy to layer on custom visualizations with seaborn, matplotlib, or any other compatible visualization package.
- Statistical and preprocessing functions are standalone and ship with strong defaults, meaning that any function can be used outside of the alphapepttools context.
- Data handling: AnnData was chosen as a data container for two main reasons:
  - It offers a lightweight, powerful solution to a fundamental challenge with dataframes, which is keeping numerical data and metadata aligned at all times. With plain dataframes, the options are to either include non-numeric metadata columns in the dataframe (complicating data operations) or to add cumbersome multi-level indices.
  - It is compatible with the Scverse, Scanpy and all associated tools, essentially removing the barrier between proteomics and transcriptomics data analysis and enabling multi-omics analyses.
- Plotting: Inspired by the stylia package, alphapepttools aims to provide a consistent and aesthetically pleasing visual experience for all plots. A core component of this implementation is that create_figure returns subplots as an iterable data structure, meaning that once the basic layout of a plot is decided, users simply jump from one plot window to the next and populate each one with figure elements.
- Standardization: A key consideration of this package is the loading of proteomics data, the biggest pain point of which is the nonstandard output of the various proteomics search engines. By building on alphabase, we handle this complexity early and provide the user with AnnData objects containing either proteins or precursors, where the familiar Pandas DataFrame is always just a 'df = adata.to_df().join(adata.obs)' away.
Please refer to the documentation, in particular, the API documentation.
You need to have Python 3.10 or newer installed on your system. If you don't have Python installed, we recommend installing Mambaforge.
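To confirm that the active interpreter meets this requirement before installing, a small standard-library check such as the following can be used. The helper name python_is_supported is hypothetical and not part of alphapepttools.

```python
import sys

# Minimum version stated in the installation requirements above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is supported.")
else:
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer is required.")
```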
There are several alternative options to install alphapepttools:
- Install the latest release of alphapepttools from PyPI:
    pip install alphapepttools
As the package is still under development, consider installing the latest development version:
- Development version:
    git clone --branch main https://github.com/MannLabs/alphapepttools.git && cd alphapepttools
    pip install -e .
  or with more dependencies:
    pip install -e ".[test, dev]"
See the GitHub Release page.
This document gathers information on how to develop and contribute to the alphapepttools project.
In order to have release notes automatically generated, changes need to be tagged with labels.
The following labels are used (they should be self-explanatory):
breaking-change, bug, enhancement.
This package uses a shared release process defined in the alphashared repository. Please see the instructions there.
For questions and help requests, you can reach out in the scverse discourse. If you found a bug, please use the issue tracker.
t.b.a