SpectoPrep


Spectroscopy preprocessing using Bayesian Optimization

Overview

SpectoPrep provides a toolkit for optimizing spectroscopic data preprocessing pipelines using Bayesian optimization. It searches over combinations of preprocessing techniques and their parameters. The goal is to find the pipeline that gives the best cross-validated model performance (lowest RMSE) on spectroscopic data.
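Bayesian optimization treats pipeline quality as a black-box score to be minimized. The NumPy sketch below is a rough, hypothetical illustration of such a score and is not SpectoPrep's code. It applies a chain of row-wise preprocessing functions and returns the k-fold cross-validated RMSE of a closed-form ridge regression. The names `ridge_fit` and `cv_rmse` are assumptions, and so is the use of ridge regression as the model.

```python
import numpy as np

# Hypothetical objective: k-fold RMSE of ridge regression after a chain of
# row-wise preprocessing functions. Ridge is a stand-in model only.

def ridge_fit(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def cv_rmse(X, y, steps=(), n_splits=3, alpha=1e-3, seed=0):
    """Apply each row-wise step in `steps`, then return the mean RMSE over folds."""
    # Row-wise steps (e.g. SNV) use no information from other samples,
    # so applying them before splitting does not leak test data.
    for step in steps:
        X = step(X)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        w, b = ridge_fit(X[train_idx], y[train_idx], alpha)
        resid = y[test_idx] - (X[test_idx] @ w + b)
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(scores))

# Example on synthetic, noise-free linear data: the RMSE should be close to 0.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 10))
y_demo = X_demo @ rng.normal(size=10) + 2.0
print(cv_rmse(X_demo, y_demo))
```

An optimizer would call a function like this for each proposed configuration and keep the lowest score. Steps that learn from the data, such as mean centering or scaling, would have to be fit inside each training fold to avoid leakage. The sketch avoids that problem by accepting only row-wise steps.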

Features

Pipeline Optimization: Automate the discovery of optimal preprocessing pipelines using Bayesian optimization
Flexible Preprocessing: Choose from multiple preprocessing techniques (MSC, SNV, Savitzky-Golay, etc.)
Cross-Validation Support: Group-based cross-validation methods for robust evaluation
Configurable Pipeline Length: Control maximum preprocessing steps and allowed combinations
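The sketch below uses only the standard library to show how the pipeline-length settings bound the search. It rests on several hypothetical assumptions:

- `allowed_preprocess_combinations` lists the pipeline lengths that may be used.
- `max_pipeline_length` caps that length.
- Step order matters.
- Each step appears at most once.

SpectoPrep's actual rules may differ.

```python
from itertools import permutations

def candidate_pipelines(steps, max_pipeline_length, allowed_lengths):
    """Enumerate ordered, repeat-free step sequences of the permitted lengths.

    Hypothetical semantics: `allowed_lengths` plays the role of
    `allowed_preprocess_combinations`; lengths above `max_pipeline_length`
    are skipped.
    """
    pipelines = []
    for length in sorted(set(allowed_lengths)):
        if 1 <= length <= max_pipeline_length:
            pipelines.extend(permutations(steps, length))
    return pipelines

steps = ['msc', 'savgol', 'detrend', 'scaler', 'snv',
         'robust_scaler', 'emsc', 'meancn']
cands = candidate_pipelines(steps, max_pipeline_length=2, allowed_lengths=[1, 2])
print(len(cands))  # 8 single steps + 8 * 7 ordered pairs = 64
```

Under these assumptions, even this small configuration yields 64 orderings before any step parameters (such as a Savitzky-Golay window) are tuned. That size is one reason a sample-efficient search such as Bayesian optimization is attractive here.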

Installation

pip install spectoprep

Quick Start

from spectoprep.pipeline.optimizer import PipelineOptimizer
import numpy as np

# Prepare your data
X_train = np.array(...)  # Spectral data matrix, shape (n_samples, n_wavelengths)
y_train = np.array(...)  # Target values, shape (n_samples,)
groups = np.array(...)   # Optional group labels (one per sample) for group-based cross-validation

# Initialize the optimizer
optimizer = PipelineOptimizer(
    X_train=X_train,
    y_train=y_train,
    X_test=None,
    y_test=None,
    preprocessing_steps=['msc', 'savgol', 'detrend', 'scaler', 'snv',
                          'robust_scaler', 'emsc', 'meancn'],
    cv_method="group_shuffle_split",
    n_splits=3,
    random_state=21,
    groups=groups,
    max_pipeline_length=2,
    allowed_preprocess_combinations=[1, 2]
)

# Run Bayesian optimization to find the best pipeline
best_params, best_pipeline = optimizer.bayesian_optimize(
    init_points=50,
    n_iter=1000
)

# Extract the preprocessing steps (every step except the final model)
custom_preprocessing = list(best_pipeline.steps[:-1])

# Print optimization summary
summary = optimizer.summarize_optimization()
print(f"Best pipeline configuration: {summary['best_pipeline']}")
print(f"Best RMSE: {summary['best_rmse']:.4f}")

# Make predictions with the optimized pipeline
predictions, rmse, r2 = optimizer.get_best_pipeline_predictions(best_pipeline)
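With `cv_method="group_shuffle_split"`, samples that share a group label should never be split between training and validation. Replicate scans of the same physical sample are a typical example. The NumPy sketch below illustrates the idea in the spirit of scikit-learn's `GroupShuffleSplit`. The function name `group_shuffle_split` and its `test_size` argument (a fraction of groups) are assumptions for illustration, not SpectoPrep's implementation.

```python
import numpy as np

def group_shuffle_split(groups, n_splits=3, test_size=0.2, random_state=None):
    """Yield (train_idx, test_idx) index arrays; each group lands entirely on one side.

    Every split draws a fresh random subset of unique groups for the test side,
    so test sets from different splits may overlap (shuffle-split, not k-fold).
    """
    groups = np.asarray(groups)
    unique = np.unique(groups)
    if len(unique) < 2:
        raise ValueError("need at least two distinct groups")
    # Hold out at least one group, and always leave at least one for training
    n_test = min(max(1, int(round(test_size * len(unique)))), len(unique) - 1)
    rng = np.random.default_rng(random_state)
    for _ in range(n_splits):
        test_groups = rng.choice(unique, size=n_test, replace=False)
        test_mask = np.isin(groups, test_groups)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Example: 4 samples, each scanned 3 times (replicates share a group label)
groups = np.repeat([0, 1, 2, 3], 3)
for train_idx, test_idx in group_shuffle_split(groups, n_splits=3,
                                               test_size=0.25, random_state=21):
    # Replicates of one sample never appear on both sides of a split
    print(sorted(set(groups[test_idx].tolist())), len(train_idx), len(test_idx))
```

Splitting by sample rather than by spectrum keeps near-identical replicates out of the validation set. This gives a less optimistic RMSE estimate.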

Available Preprocessing Methods

msc: Multiplicative Scatter Correction
savgol: Savitzky-Golay filtering
detrend: Linear detrending
scaler: Standard scaling
snv: Standard Normal Variate
robust_scaler: Robust scaling
emsc: Extended Multiplicative Signal Correction
meancn: Mean centering
pca: Principal Component Analysis
select_k_best: Feature selection
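Several of the listed methods are short array operations. The NumPy sketch below shows textbook forms of SNV, MSC, linear detrending, and mean centering for a matrix `X` with one spectrum per row. It is illustrative only. In particular, using the mean spectrum as the MSC reference is an assumption here, and SpectoPrep's own implementations, defaults, and fitting behaviour may differ.

```python
import numpy as np

def snv(X):
    """Standard Normal Variate: center and scale each spectrum by its own mean and std."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def msc(X, reference=None):
    """Multiplicative Scatter Correction.

    Fit x_i ~ a_i + b_i * ref by least squares (ref defaults to the mean
    spectrum), then return (x_i - a_i) / b_i for every spectrum.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # slope, intercept
        corrected[i] = (x - a) / b
    return corrected

def detrend(X):
    """Subtract a least-squares straight line (over channel index) from each spectrum."""
    X = np.asarray(X, dtype=float)
    t = np.arange(X.shape[1])
    coeffs = np.polyfit(t, X.T, deg=1)  # shape (2, n_samples): slopes, intercepts
    return X - (np.outer(coeffs[0], t) + coeffs[1][:, None])

def mean_center(X_train, X_new=None):
    """Mean centering with column means learned on the training set only."""
    X_train = np.asarray(X_train, dtype=float)
    mu = X_train.mean(axis=0)
    if X_new is None:
        return X_train - mu
    return X_train - mu, np.asarray(X_new, dtype=float) - mu

# Example: three spectra that differ only by an additive offset and a scale factor
base = np.sin(np.linspace(0, 3, 50))
X = np.array([a + b * base for a, b in [(0.1, 1.0), (0.5, 1.5), (-0.2, 0.8)]])
print(np.allclose(msc(X), X.mean(axis=0)))  # True: MSC maps every row onto the reference
print(np.allclose(snv(X), snv(X)[0]))       # True: SNV removes the offset and scale too
```

Savitzky-Golay filtering is available in SciPy as `scipy.signal.savgol_filter`. EMSC extends the MSC regression with extra terms such as a polynomial baseline, so it is not shown here.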

Documentation

For detailed documentation, visit spectoprep.readthedocs.io.

Contributing

We welcome contributions! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Citation

If you use this package in your research, please cite our paper:

@article{babatunde2025automated,
  title={Automated Spectral Preprocessing via Bayesian Optimization for Chemometric Analysis of Milk Constituents},
  author={Babatunde, Habeeb Abolaji and McDougal, Owen M and Andersen, Timothy},
  journal={Foods},
  volume={14},
  number={17},
  pages={2996},
  year={2025},
  publisher={MDPI}
}

Warning

This package is still under heavy development.
