Commit 9b7c78a

committed Dec 18, 2023
Merge branch 'priorityassignment' into 'master'
Resolve "PriorityAssignment contains broken code"

Closes #116

See merge request mass-spectrometry/corems!66
2 parents c8aa027 + 67417fd commit 9b7c78a

60 files changed, +9943 -3049 lines changed
 

‎.gitignore

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ test_env/
 venv310/
 venvwsl/
 tmp_data/
+tmp_docs/
 .pyc
 .pyo
 *.hdf5

‎README.md

Lines changed: 58 additions & 86 deletions
@@ -1,6 +1,6 @@
-![CoreMS Logo](docs/CoreMS.COLOR.png)
+![CoreMS Logo](docs/CoreMS.COLOR_small.png)

-<div align="center">
+<div align="left">

 <br>
 <br>
@@ -25,9 +25,9 @@
 - [Installation](#corems-installation)
 - [Thermo Raw File on Mac and Linux](#thermo-raw-file-access)
 - Execution:
-- [Jupyter Notebook and Docker containers](#molecular-database-and-jupyter-notebook-containers)
+- [Jupyter Notebook and Docker containers](#docker-stack)
 - [Simple Example](#simple-script-example)
-- [Python Examples](examples/examples)
+- [Python Examples](examples/scripts)
 - [Jupyter Notebook Examples](examples/notebooks)

@@ -41,21 +41,21 @@

 Data handling and software development for modern mass spectrometry (MS) is an interdisciplinary endeavor requiring skills in computational science and a deep understanding of MS. To enable scientific software development to keep pace with fast improvements in MS technology, we have developed a Python software framework named CoreMS. The goal of the framework is to provide a fundamental, high-level basis for working with all mass spectrometry data types, allowing custom workflows for data signal processing, annotation, and curation. The data structures were designed with an intuitive, mass spectrometric hierarchical structure, thus allowing organized and easy access to the data and calculations. Moreover, CoreMS supports direct access for almost all vendors’ data formats, allowing for the centralization and automation of all data processing workflows from the raw signal to data annotation and curation.

-- reproducible pipeline
+CoreMS aims to provide
 - logical mass spectrometric data structure
 - self-containing data and metadata storage
 - modern molecular formulae assignment algorithms
 - dynamic molecular search space database search and generator

 ## Current Version

-### `2.5.3.beta`
+`2.5.3.beta`

 ## Main Developers/Contact
 - [Yuri. E. Corilo](mailto:corilo@pnnl.gov)
 - [William Kew](mailto:william.kew@pnnl.gov)

-
+## Data formats
 ### Data input formats

 - Bruker Solarix (CompassXtract)
@@ -68,10 +68,7 @@ Data handling and software development for modern mass spectrometry (MS) is an i
 - CoreMS exported processed mass list files(excel, .csv, .txt, pandas dataframe as .pkl)
 - CoreMS self-containing Hierarchical Data Format (.hdf5)
 - Pandas Dataframe
-
-- Support for Could Storage using s3path.S3path
-see examples of usage here:
-- [S3 Support](tests/s3_test.py)
+- Support for cloud Storage using s3path.S3path(see examples of usage here: [S3 Support](tests/s3_test.py))

 ### Data output formats

@@ -85,54 +82,33 @@ Data handling and software development for modern mass spectrometry (MS) is an i

 - LC-MS
 - GC-MS
-- IMS-MS (`TODO`)
-- LC-IMS-MS (`TODO`)
-- Collections (`TODO`)
 - Transient
 - Mass Spectra
 - Mass Spectrum
 - Mass Spectral Peak
 - Molecular Formula
-- Molecular Structure (`TODO`)
+
+### In progress data structures
+- IMS-MS
+- LC-IMS-MS
+- Collections
+- Molecular Structure

 ---
 ## Available features

-### FT-MS Signal Processing
+### FT-MS Signal Processing, Calibration, and Molecular Formula Search and Assignment

 - Apodization, Zerofilling, and Magnitude mode FT
 - Manual and automatic noise threshold calculation
 - Peak picking using apex quadratic fitting
 - Experimental resolving power calculation
-
-### GC-MS Signal Processing
-
-- Baseline detection, subtraction, smoothing
-- m/z based Chromatogram Peak Deconvolution,
-- Manual and automatic noise threshold calculation
-- First and second derivatives peak picking methods
-- Peak Area Calculation
-
-### GC-MS Calibration
-
-- Retention Index Calibration
-
-### GC-MS Compound Identification
-
-- Automatic local (SQLite) or external (MongoDB or PostgreSQL) database check, generation, and search
-- Automatic molecular match algorithm with all spectral similarity methods
-
-### FT-MS Calibration
-
 - Frequency and m/z domain calibration functions:
 - LedFord equation [ref]
 - Linear equation
 - Quadratic equation
 - Automatic search most abundant **Ox** homologue series
 - Step fit ('walking calibration") based on the LedFord equation [ref]
-
-### FT-MS Molecular formulae search and assignment
-
 - Automatic local (SQLite) or external (PostgreSQL) database check, generation, and search
 - Automatic molecular formulae assignments algorithm for ESI(-) MS for natural organic matter analysis
 - Automatic fine isotopic structure calculation and search for all isotopes
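
Note on the hunk above: the feature list mentions peak picking using apex quadratic fitting. As a rough illustration of the general idea, and not CoreMS's actual implementation, one can fit a parabola through the three points around a local maximum and take its vertex as the peak apex. The function name `quadratic_apex` and the sample values are hypothetical.

```python
def quadratic_apex(x, y):
    """Fit y = a*x^2 + b*x + c through three points and return the vertex.

    x, y: sequences of three values each (the local maximum and its two
    neighbours). Returns (x_apex, y_apex).
    """
    (x0, x1, x2), (y0, y1, y2) = x, y
    # Closed-form coefficients of the parabola through three points
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    c = (
        x1 * x2 * (x1 - x2) * y0
        + x2 * x0 * (x2 - x0) * y1
        + x0 * x1 * (x0 - x1) * y2
    ) / denom
    # The vertex of the parabola is where its derivative 2*a*x + b is zero
    x_apex = -b / (2 * a)
    y_apex = a * x_apex**2 + b * x_apex + c
    return x_apex, y_apex


# Hypothetical sampled points taken from y = -(x - 2)^2 + 5
apex_x, apex_y = quadratic_apex([1.0, 2.5, 3.0], [4.0, 4.75, 4.0])
```

The three sampled points need distinct x values and must not be collinear, otherwise `denom` or `a` is zero and the vertex is undefined.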
@@ -141,7 +117,18 @@ Data handling and software development for modern mass spectrometry (MS) is an i
 - Kendrick classification
 - Heteroatoms classification and visualization

-### High Resolution Mass spectrum simulations
+### GC-MS Signal Processing, Calibration, and Compound Identification
+
+- Baseline detection, subtraction, smoothing
+- m/z based Chromatogram Peak Deconvolution,
+- Manual and automatic noise threshold calculation
+- First and second derivatives peak picking methods
+- Peak Area Calculation
+- Retention Index Calibration
+- Automatic local (SQLite) or external (MongoDB or PostgreSQL) database check, generation, and search
+- Automatic molecular match algorithm with all spectral similarity methods
+
+### High Resolution Mass Spectrum Simulations

 - Peak shape (Lorentz, Gaussian, Voigt, and pseudo-Voigt)
 - Peak fitting for peak shape definition
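
Note on the hunk above: the context lines mention Kendrick classification. A minimal sketch of the usual CH2-based Kendrick mass and mass defect follows; it assumes the convention "defect = nominal Kendrick mass minus Kendrick mass" with the nominal value obtained by rounding. The function names are hypothetical, and whether CoreMS uses this exact convention is not shown in this diff.

```python
CH2_EXACT_MASS = 14.01565  # exact mass of CH2 (12C + 2 x 1H), in Da
CH2_NOMINAL_MASS = 14      # nominal mass of CH2


def kendrick_mass(mz):
    """Rescale an m/z value so that CH2 weighs exactly 14."""
    return mz * CH2_NOMINAL_MASS / CH2_EXACT_MASS


def kendrick_mass_defect(mz):
    """Nominal Kendrick mass minus Kendrick mass (assumed convention)."""
    km = kendrick_mass(mz)
    return round(km) - km


# Two hypothetical peaks that differ by one CH2 unit share the same defect,
# which is what makes homologous series line up in a Kendrick plot.
kmd_a = kendrick_mass_defect(200.0)
kmd_b = kendrick_mass_defect(200.0 + CH2_EXACT_MASS)
```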
@@ -150,7 +137,7 @@ Data handling and software development for modern mass spectrometry (MS) is an i
 - Calculated ICR Resolving Power based on magnetic field (B), and transient time(T)

 ---
-## CoreMS Installation
+## Installation

 ```bash
 pip install corems
@@ -164,16 +151,10 @@ To use Postgresql the easiest way is to build a docker container:
 docker-compose up -d
 ```

-- Change the url_database on MSParameters.molecular_search.url_database to:
+- Change the url_database on MSParameters.molecular_search.url_database to: "postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp"
+- Set the url_database env variable COREMS_DATABASE_URL to: "postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp"

-"postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp"
-
-- Set the url_database env variable COREMS_DATABASE_URL to:
-
-"postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp"
-
----
-## Thermo Raw File Access:
+### Thermo Raw File Access:

 To be able to open thermo file a installation of pythonnet is needed:
 - Windows:
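
Note on the hunk above: the two PostgreSQL bullets (setting `MSParameters.molecular_search.url_database` or exporting `COREMS_DATABASE_URL`) both come down to choosing where the connection URL is read from. The sketch below shows one way such a lookup could work. The helper `resolve_database_url`, the precedence order (explicit value first, then the environment variable), and the SQLite fallback are illustrative assumptions, not CoreMS code.

```python
import os

# URL quoted in the README hunk above
POSTGRES_URL = "postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp"


def resolve_database_url(explicit_url=None, env=None, default="sqlite://"):
    """Pick a database URL (hypothetical helper).

    Assumed precedence: explicit setting, then the COREMS_DATABASE_URL
    environment variable, then a local SQLite fallback.
    """
    env = os.environ if env is None else env
    return explicit_url or env.get("COREMS_DATABASE_URL") or default


# With the variable exported, the PostgreSQL container is selected
url = resolve_database_url(env={"COREMS_DATABASE_URL": POSTGRES_URL})
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.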
@@ -188,11 +169,11 @@ To be able to open thermo file a installation of pythonnet is needed:
 ```

 ---
-### Another option is to run the docker stack that will start the CoreMS containers:
+## Docker stack

----
+Another option to use CoreMS is to run the docker stack that will start the CoreMS containers

-## Molecular Database and Jupyter Notebook Containers
+### Molecular Database and Jupyter Notebook Docker Containers

 A docker container containing:
 - A custom python distribution will all dependencies installed
@@ -245,7 +226,7 @@ If you don't have docker installed, the easiest way is to [install docker for de
 ___
 ## Simple Script Example

-More examples can be found under the directory docs/example, docs/notebooks
+More examples can be found under the directory examples/scripts, examples/notebooks

 - Basic functionality example

@@ -257,75 +238,66 @@ from matplotlib import pyplot

 file_path= 'tests/tests_data/ftms/ESI_NEG_SRFA.d'

-#Bruker Solarix class reader
+# Instatiate the Bruker Solarix reader with the filepath
 bruker_reader = ReadBrukerSolarix(file_path)

-#access the transient object
+# Use the reader to instatiate a transient object
 bruker_transient_obj = bruker_reader.get_transient()

-#calculates the transient duration time
+# Calculate the transient duration time
 T = bruker_transient_obj.transient_time

-#access the mass spectrum object
+# Use the transient object to instatitate a mass spectrum object
 mass_spectrum_obj = bruker_transient_obj.get_mass_spectrum(plot_result=False, auto_process=True)

-# - search monoisotopic molecular formulas for all mass spectral peaks
-# - calculate fine isotopic structure based on monoisotopic molecular formulas found and current dynamic range
-# - search molecular formulas of correspondent calculated isotopologues,
+# The following SearchMolecularFormulas function does the following
+# - searches monoisotopic molecular formulas for all mass spectral peaks
+# - calculates fine isotopic structure based on monoisotopic molecular formulas found and current dynamic range
+# - searches molecular formulas of correspondent calculated isotopologues
 # - settings are stored at SearchConfig.json and can be changed directly on the file or inside the framework class

 SearchMolecularFormulas(mass_spectrum_obj, first_hit=False).run_worker_mass_spectrum()

-# iterate over mass spectral peaks objs
+# Iterate over mass spectral peaks objs within the mass_spectrum_obj
 for mspeak in mass_spectrum_obj.sort_by_abundance():

-    # returns true if there is at least one molecular formula associated
-    # with the mass spectral peak
-    # same as mspeak.is_assigned -- > bool
+    # If there is at least one molecular formula associated, mspeak returns True
     if mspeak:

-        # get the molecular formula with the highest mass accuracy
+        # Get the molecular formula with the highest mass accuracy
         molecular_formula = mspeak.molecular_formula_lowest_error

-        # plot mz and peak height, use mass_spectrum_obj.mz_exp to access all mz
-        # and mass_spectrum_obj.mz_exp_profile to access mz with all available datapoints
+        # Plot mz and peak height
         pyplot.plot(mspeak.mz_exp, mspeak.abundance, 'o', c='g')

-        #or use
-        mspeak.plot(color="black", derivative=True, deriv_color='red')
-        # iterate over all molecular formulae associated with the ms peaks obj
+        # Iterate over all molecular formulas associated with the ms peaks obj
         for molecular_formula in mspeak:

-            #check if the molecular formula is a isotopologue
+            # Check if the molecular formula is a isotopologue
             if molecular_formula.is_isotopologue:

-                #access the molecular formula text representation
+                # Access the molecular formula text representation and print
                 print (molecular_formula.string)

-                #get 13C atoms count
+                # Get 13C atoms count
                 print (molecular_formula['13C'])
     else:
-        #get mz and peak height
+        # Get mz and peak height
         print(mspeak.mz_exp,mspeak.abundance)

-
-#exporting data
+# Save data
+## to a csv file
 mass_spectrum_obj.to_csv("filename")
-
 mass_spectrum_obj.to_hdf("filename")
-# save pandas Datarame to pickle
+# to pandas Datarame pickle
 mass_spectrum_obj.to_pandas("filename")
-# get pandas Dataframe
+
+# Extract data as a pandas Dataframe
 df = mass_spectrum_obj.to_dataframe()
 ```
 ## UML Diagrams

-- Direct Infusion FT-MS:
-
-![FT-MS UML Diagram](docs/uml/Direct_Infusion_FTMS_Data_Model.png)
-- GC-MS:
-
-![GC-MS UML Diagram](docs/uml/GC_MS_Data_Model.png)
+UML (unified modeling language) diagrams for Direct Infusion FT-MS and GC-MS classes can be found [here](docs/uml).

 ## Citing CoreMS
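
Note on the example script above: it reads the transient duration `T` from the transient object, and the README's feature list mentions a calculated ICR resolving power based on magnetic field (B) and transient time (T). As a hedged illustration of how those two quantities relate, the sketch below uses the commonly quoted low-pressure, magnitude-mode approximation m/Δm50% ≈ 1.274e7 · B · T / (m/z). The function name and the sample values are hypothetical, and this diff does not confirm that CoreMS uses this exact expression.

```python
def icr_resolving_power(mz, magnetic_field_tesla, transient_time_s):
    """Approximate FT-ICR resolving power (FWHM, magnitude mode).

    Low-pressure-limit estimate: 1.274e7 * B * T / (m/z), with B in tesla,
    T in seconds and m/z in Da per elementary charge.
    """
    return 1.274e7 * magnetic_field_tesla * transient_time_s / mz


# Hypothetical values: a 12 T magnet, a 1 s transient and a peak at m/z 400
rp = icr_resolving_power(mz=400.0, magnetic_field_tesla=12.0, transient_time_s=1.0)
```

In this approximation the resolving power doubles when either the field strength or the transient length doubles, and falls off inversely with m/z.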
