Releases: smortezah/napr

v0.2.0

03 Jun 10:53
5b00ac3

napr v0.2.0 — Major Release

This is a major release representing a comprehensive redesign and expansion of the napr library. The package has been restructured, new domains added, tooling modernized, and the CI/CD pipeline significantly improved.


Breaking Changes

  • Package restructured to a src layout. All source code now lives under src/napr. Import paths have changed accordingly.
  • Old napr.apps, napr.data, napr.plotting, and napr.utils modules removed. Functionality has been reorganized into new submodules (see below).
  • Migrated from Poetry to uv for dependency and build management. The pyproject.toml build backend is now uv_build.

New Features

napr.coconut — COCONUT Database Integration

A new top-level subpackage dedicated to the COCONUT natural products database.

  • load_terpene(download, path, version) — Load or download the COCONUT terpene dataset (v21.3). Handles compressed .bz2 files and gracefully resolves missing Content-Disposition headers during download.
  • Terpene class — High-level interface for working with terpene data, providing:
    • .preprocess(**kwargs) — Full preprocessing pipeline (imputation, encoding, scaling, feature splitting).
    • .dim_reduce(inplace, **kwargs) — PCA-based dimensionality reduction.
    • .plot — Integrated plotting submodule (Plot) for exploratory visualization.
  • eval_classification(…) — Evaluate multiple sklearn-compatible classifiers with accuracy, precision, recall, and F1 metrics using train/test splits and sample weighting.
  • find_best_models(X, y, hypermodel, …) — Keras Tuner–powered hyperparameter search with stratified k-fold cross-validation.
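
The .bz2 handling mentioned for load_terpene can be illustrated with the standard library alone. The sketch below is not the napr implementation: the helper name read_bz2_csv, the CSV format, and the column names are all assumptions made for illustration.

```python
import bz2
import csv
import io


def read_bz2_csv(raw: bytes) -> list[dict[str, str]]:
    """Decompress bz2 bytes and parse them as CSV rows (hypothetical helper)."""
    text = bz2.decompress(raw).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Tiny in-memory stand-in for a downloaded archive; the columns are made up.
sample = bz2.compress(b"name,mw\nlimonene,136.23\n")
rows = read_bz2_csv(sample)
print(rows[0]["name"], rows[0]["mw"])
```

For files on disk, bz2.open(path, "rt") gives the same result without loading the compressed bytes into memory first.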

napr.coconut.plotting

  • Custom matplotlib styles: ggplot_classic and ggplot_bw.
  • set_plt_style() and label_subplot() utilities for consistent figure styling.

napr.coconut.terpene.explore

  • Plot.dist_subclass_mw_logp_nplscore() and related methods for EDA of terpene chemical subclasses (MW, logP, NPL score distributions).

Core utilities

  • napr.decorators.info — Decorator that prints start/finish messages and wall-clock runtime for long-running methods.
  • napr.stats.percent_within — Compute the percentage of values falling within a numeric interval.
  • napr.random.rand_list_string — Generate lists of random strings from a configurable alphabet.
  • napr.utils.split_train_test — Stratified train/test splitting with optional class filtering and label encoding.
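
To give a feel for what the first three utilities above do, here is a minimal standard-library sketch. These are illustrative re-implementations, not the napr source: the signatures, the inclusive interval bounds, the default alphabet, and the seed parameter are all assumptions.

```python
import functools
import random
import string
import time


def info(func):
    """Print start/finish messages and wall-clock runtime around a call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Starting {func.__name__}...")
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"Finished {func.__name__} in {elapsed:.3f} s")
        return result
    return wrapper


def percent_within(values, low, high):
    """Percentage of values in [low, high]; inclusive bounds are an assumption."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    inside = sum(1 for v in values if low <= v <= high)
    return 100.0 * inside / len(values)


def rand_list_string(n, length, alphabet=string.ascii_lowercase, seed=None):
    """Return n random strings of the given length drawn from alphabet."""
    rng = random.Random(seed)  # seed is a hypothetical parameter for reproducibility
    return ["".join(rng.choices(alphabet, k=length)) for _ in range(n)]


@info
def summarize(values):
    # 3 of the 5 sample values below lie in [2, 4]
    return percent_within(values, 2, 4)


pct = summarize([1, 2, 3, 4, 5])
names = rand_list_string(3, 5, seed=0)
```

Using functools.wraps keeps the decorated function's name and docstring intact, which matters when the decorator is applied to methods that show up in documentation or tracebacks.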

Improvements

  • Python version support narrowed to >=3.10, <3.14 for better compatibility guarantees.
  • Minimum dependency versions raised to modern releases: NumPy ≥ 2.2, Pandas ≥ 2.3, scikit-learn ≥ 1.7, XGBoost ≥ 3.2, TensorFlow ≥ 2.21.
  • CI rewritten to use uv for faster installs and reproducible environments.
  • PyPI publish workflow added (release.yml) — releases are now automatically published on tag push.
  • Codecov integration with token-based upload.
  • Tutorials updated — Two Jupyter notebooks cover terpene EDA and classification with kNN, random forest, and XGBoost.

Bug Fixes

  • Added a filename fallback for downloads whose response lacks a Content-Disposition header.
  • Streamlined file existence check in load_terpene to avoid redundant I/O.
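
One common way to handle a missing Content-Disposition header is to fall back to the last segment of the URL path. The sketch below shows that approach under stated assumptions: resolve_filename, its default value, and the example.org URL are hypothetical, and the actual fallback used by napr may differ.

```python
from email.message import Message
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def resolve_filename(content_disposition, url, default="download"):
    """Pick a filename from the header if possible, else from the URL path."""
    if content_disposition:
        # Let the stdlib parse the header parameters (handles quoting).
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Keep only the final path component to avoid directory traversal.
            return PurePosixPath(name).name
    # Header absent or without a filename: use the last URL path segment.
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return name or default


# Hypothetical URL; the query string is ignored when deriving the name.
print(resolve_filename(None, "https://example.org/files/terpene.csv.bz2?dl=1"))
```

The default argument covers the remaining case where neither the header nor the URL path yields a usable name.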

Developer Experience

  • pytest-cov added to dev dependencies for coverage reporting.
  • Test suite reorganized under tests/ to mirror the napr package structure.
  • ruff configured with line-length = 100.