Releases: smortezah/napr
v0.2.0
napr v0.2.0 — Major Release
This is a major release representing a comprehensive redesign and expansion of the napr library. The package has been restructured, new domains added, tooling modernized, and the CI/CD pipeline significantly improved.
Breaking Changes
- Package restructured to the src layout: all source code now lives under src/napr, and import paths have changed accordingly.
- The old napr.apps, napr.data, napr.plotting, and napr.utils modules have been removed. Their functionality has been reorganized into new submodules (see below).
- Migrated from Poetry to uv for dependency and build management. The pyproject.toml build backend is now uv_build.
New Features
napr.coconut — COCONUT Database Integration
A new top-level subpackage dedicated to the COCONUT natural products database.
- load_terpene(download, path, version): Load or download the COCONUT terpene dataset (v21.3). Handles compressed .bz2 files and gracefully resolves missing Content-Disposition headers during download.
- Terpene class: High-level interface for working with terpene data, providing:
  - .preprocess(**kwargs): Full preprocessing pipeline (imputation, encoding, scaling, feature splitting).
  - .dim_reduce(inplace, **kwargs): PCA-based dimensionality reduction.
  - .plot: Integrated plotting submodule (Plot) for exploratory visualization.
- eval_classification(…): Evaluate multiple scikit-learn-compatible classifiers on accuracy, precision, recall, and F1, using train/test splits and sample weighting.
- find_best_models(X, y, hypermodel, …): Keras Tuner-powered hyperparameter search with stratified k-fold cross-validation.
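The release notes do not describe how eval_classification computes its scores. As a rough illustration of what "accuracy, precision, recall, and F1 with sample weighting" involves, the sketch below computes weighted, macro-averaged metrics in plain Python. weighted_metrics is a hypothetical helper, not part of napr; it assumes each sample adds its weight to the per-class counts and that a class with an undefined precision or recall scores 0.

```python
from collections import defaultdict


def weighted_metrics(y_true, y_pred, sample_weight=None):
    """Hypothetical helper: accuracy plus macro-averaged precision, recall, F1.

    Each sample adds its weight (default 1.0) to the per-class counts.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    tp = defaultdict(float)  # weighted true positives per class
    fp = defaultdict(float)  # weighted false positives per predicted class
    fn = defaultdict(float)  # weighted false negatives per true class
    correct = 0.0
    for t, p, w in zip(y_true, y_pred, sample_weight):
        if t == p:
            tp[t] += w
            correct += w
        else:
            fp[p] += w
            fn[t] += w
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Undefined ratios (zero denominator) are scored as 0.0.
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": correct / sum(sample_weight),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

In practice you would use scikit-learn's metric functions, which accept a sample_weight argument directly. The hand-rolled version only makes the counting explicit.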
napr.coconut.plotting
- Custom matplotlib styles: ggplot_classic and ggplot_bw.
- set_plt_style() and label_subplot() utilities for consistent figure styling.
napr.coconut.terpene.explore
- Plot.dist_subclass_mw_logp_nplscore() and related methods for exploratory data analysis (EDA) of terpene chemical subclasses: distributions of molecular weight (MW), logP, and natural-product-likeness (NPL) score.
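The Plot methods produce figures. The grouping step that sits underneath a per-subclass distribution summary can be sketched in plain Python. summarize_by_subclass and the "subclass"/"mw" keys are hypothetical and do not reflect napr's data model, and the input rows in the usage line are toy values, not COCONUT data.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_subclass(records, column):
    """Hypothetical helper: group dict records by 'subclass' and summarize one column.

    Missing values (None) are skipped rather than imputed.
    """
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(column)
        if value is not None:
            groups[rec["subclass"]].append(value)
    # One summary per subclass: count, mean, and median of the column.
    return {
        name: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for name, vals in sorted(groups.items())
    }


# Toy rows for illustration only.
rows = [
    {"subclass": "A", "mw": 100.0},
    {"subclass": "A", "mw": 200.0},
    {"subclass": "B", "mw": 300.0},
    {"subclass": "B", "mw": None},
]
print(summarize_by_subclass(rows, "mw"))
```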
Core utilities
- napr.decorators.info: Decorator that prints start/finish messages and wall-clock runtime for long-running methods.
- napr.stats.percent_within: Compute the percentage of values falling within a numeric interval.
- napr.random.rand_list_string: Generate lists of random strings from a configurable alphabet.
- napr.utils.split_train_test: Stratified train/test splitting with optional class filtering and label encoding.
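The core utilities are described only in one line each. The sketch below shows one plausible way to implement each idea with the standard library. The names log_runtime, percent_in_interval, random_strings, and stratified_split are deliberately different from napr's, because their signatures are assumptions rather than napr's actual API. The interval is assumed to be closed, and the split omits napr's class filtering and label encoding.

```python
import functools
import random
import string
import time
from collections import defaultdict


def log_runtime(func):
    """Hypothetical decorator: print start/finish messages and wall-clock runtime."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Starting {func.__name__}...")
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"Finished {func.__name__} in {time.perf_counter() - start:.3f} s")
        return result
    return wrapper


def percent_in_interval(values, low, high):
    """Percentage of values inside the closed interval [low, high] (assumed closed)."""
    values = list(values)
    if not values:
        return 0.0
    return 100.0 * sum(low <= v <= high for v in values) / len(values)


def random_strings(n, length, alphabet=string.ascii_letters, seed=None):
    """Return n random strings of the given length drawn from alphabet."""
    rng = random.Random(seed)
    return ["".join(rng.choices(alphabet, k=length)) for _ in range(n)]


def stratified_split(labels, test_size=0.25, seed=None):
    """Return (train_idx, test_idx), sampling test_size of each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # per-class test count
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Splitting each class separately is what keeps class proportions roughly equal between the train and test sets. For real work, scikit-learn's train_test_split(..., stratify=y) does the same job.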
Improvements
- Python version support narrowed to >=3.10, <3.14 for better compatibility guarantees.
- Minimum dependency versions raised to modern releases: NumPy ≥ 2.2, pandas ≥ 2.3, scikit-learn ≥ 1.7, XGBoost ≥ 3.2, TensorFlow ≥ 2.21.
- CI rewritten to use uv for faster installs and reproducible environments.
- PyPI publish workflow added (release.yml): releases are now published automatically when a tag is pushed.
- Codecov integration with token-based upload.
- Tutorials updated: two Jupyter notebooks cover terpene EDA and classification with kNN, random forest, and XGBoost.
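The Python range and dependency floors listed under Improvements can be checked at runtime. The following is a simplified sketch, not a napr feature. meets_minimum compares only the leading dotted integers, so a pre-release such as 2.3.0rc1 counts as 2.3.0. packaging.version.Version is the more rigorous choice. The distribution names in MINIMUMS are assumed to be the usual PyPI names.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Floors as listed in these release notes (PyPI distribution names assumed).
MINIMUMS = {"numpy": "2.2", "pandas": "2.3", "scikit-learn": "1.7",
            "xgboost": "3.2", "tensorflow": "2.21"}


def numeric_version(text):
    """'1.7.0rc1' -> (1, 7, 0): keep only the leading dotted-integer part."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unparseable version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if installed >= minimum, comparing numeric components only."""
    a, b = numeric_version(installed), numeric_version(minimum)
    width = max(len(a), len(b))
    # Zero-pad so that (2, 2) and (2, 2, 0) compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def python_supported(version_info=sys.version_info):
    """Check the '>=3.10, <3.14' range from the release notes."""
    return (3, 10) <= tuple(version_info[:2]) < (3, 14)


def report():
    """Print each dependency's installed version against its floor."""
    for dist, minimum in MINIMUMS.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            print(f"{dist}: not installed")
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
        print(f"{dist} {installed}: {status}")
```

Note that versions must be compared numerically rather than as strings. For example, "2.9" sorts after "2.21" as text but is the older release.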
Bug Fixes
- Added the missing filename fallback used when the Content-Disposition header is absent during file download.
- Streamlined the file existence check in load_terpene to avoid redundant I/O.
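Both fixes follow a common download pattern: take the filename from the Content-Disposition header when it is present, otherwise fall back to the last segment of the URL path, and check for an existing file only once before downloading. The sketch below is a standard-library illustration of that pattern under those assumptions. resolve_filename, needs_download, and the "download" default name are hypothetical and do not come from napr's code.

```python
import posixpath
from email.message import Message
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_filename(url, content_disposition=None):
    """Hypothetical helper: choose a local filename for a download.

    Prefer the header's filename parameter; otherwise use the URL's last path segment.
    """
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()  # None if the header has no filename parameter
        if name:
            return posixpath.basename(name)  # strip any directory components
    # Fallback: last segment of the URL path, percent-decoded.
    name = posixpath.basename(unquote(urlparse(url).path))
    return name or "download"


def needs_download(directory, filename):
    """Single existence check: download only if the file is not already present."""
    return not (Path(directory) / filename).is_file()
```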
Developer Experience
- pytest-cov added to dev dependencies for coverage reporting.
- Test suite reorganized under tests to mirror the napr package structure.
- ruff configured with line-length = 100.