Releases: instadeepai/InstaNovo

InstaNovo v1.2.2

05 Dec 14:20

What's Changed

  • Use correct version for gh-action-pypi-publish action by @BioGeek in #127

Full Changelog: 1.2.1...1.2.2

InstaNovo v1.2.1

05 Dec 14:11
789a945

What's Changed

  • Update to v1.2.1 and fix CI by @BioGeek in #125
  • Docs job in CI should ignore upstream skipped jobs by @BioGeek in #126

Full Changelog: 1.2.0...1.2.1

InstaNovo v1.2.0

05 Dec 10:47
66f837d

  • Added multi-GPU support for training and prediction
  • Switched from Lightning to Accelerate
  • Significant speed improvements (over 10x on GPU) to beam search and knapsack beam search
  • Minor improvements with the v1.2.0 transformer model: instanovo-v1.2.0.ckpt
  • Added support for negative masses in knapsack
  • Various other fixes and improvements

InstaNovo v1.1.4

13 Jun 07:49
665339f

What's Changed

  • build(deps): bump astral-sh/setup-uv from 5 to 6 by @dependabot in #100
  • fix: diffusion sampling and checkpoint download by @rcatzel in #111
  • chore: bump version number to v.1.1.4 by @rcatzel in #112

Full Changelog: 1.1.3...1.1.4

InstaNovo v1.1.3

10 Jun 13:18

What's Changed

Notes

  • Update codebase to use new diffusion checkpoint, instanovoplus-v1.1.0
  • Update diffusion predict script with refinement configuration options, and multiple prediction sampling for improved performance
    • refine_all: If True, all predictions will be refined
    • refine_threshold: Only predictions with a confidence score less than this will be refined
    • n_preds: Number of diffusion predictions to sample per spectrum
  • Include updated model performance benchmarking in README.md
  • Added option to specify valid_path as a DictConfig, where keys represent the validation group and values represent the validation path.
    • E.g. "acpt": "/path/to/acpt/*-valid-*.parquet"
    • This will add an "acpt" group in the validation metrics
  • Added add_source_file_column option to SpectrumDataFrame, which adds the path to the original input file as a column "source_file"
  • Added add_spectrum_id option to SpectrumDataFrame, which adds unique index values to the input file as a column "spectrum_id"
  • Updated s3 utils to use a class instead of individual functions
    • Added functionality to wrap write methods and optionally specify an s3 path which will get automatically uploaded
    • Added functionality to automatically download files to a temporary directory when converting s3 paths
  • Allow --output-path to remain unspecified in evaluation mode for:
    • Transformer model
    • Diffusion model when not in refinement mode
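
The interaction between refine_all and refine_threshold described above can be sketched as follows. This is a minimal illustration and not InstaNovo's implementation: the Prediction class, the select_for_refinement helper, and the threshold value of 0.9 are hypothetical, and n_preds (sampling several diffusion predictions per spectrum) is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical container for one prediction; not an InstaNovo class."""

    spectrum_id: str
    sequence: str
    confidence: float  # assumed to lie in [0, 1]


def select_for_refinement(predictions, refine_all=False, refine_threshold=0.9):
    """Return the predictions that would be passed on for refinement.

    If refine_all is True, every prediction is selected. Otherwise only
    predictions whose confidence is strictly below refine_threshold are kept.
    The default threshold here is an arbitrary value chosen for illustration.
    """
    if refine_all:
        return list(predictions)
    return [p for p in predictions if p.confidence < refine_threshold]


preds = [
    Prediction("s1", "PEPTIDEK", 0.98),
    Prediction("s2", "LESLIEK", 0.42),
    Prediction("s3", "ACDEFGHK", 0.90),
]

to_refine = select_for_refinement(preds, refine_threshold=0.9)
print([p.spectrum_id for p in to_refine])
```

With these example values only the second prediction falls below the threshold; setting refine_all=True would select all three regardless of confidence.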

New Checkpoints

  • Trained a new InstaNovo+ v1.1.0 checkpoint on the Extended Massive-KB dataset - instanovoplus-v1.1.0.ckpt. To be used standalone, or in conjunction with instanovo-v1.1.0.ckpt for best results.

Full Changelog: 1.1.2...1.1.3

InstaNovo v1.1.2: InstaNovo-P v1.0.0 checkpoint

14 May 13:11
987b049

What's Changed

This release includes the checkpoint for our de novo sequencing model for phosphoproteomics InstaNovo-P v1.0.0!

Full Changelog: 1.1.1...1.1.2

InstaNovo v1.1.1

31 Mar 15:35
90e0adf

What's Changed

Code updates for publication release

Full Changelog: 1.1.0...1.1.1

InstaNovo v1.1.0

28 Mar 12:09
8333ea3

What's Changed

  • Updated the command-line interface to use Typer (@BioGeek)
    • The CLI includes options to predict with InstaNovo and refine with InstaNovo+ in one command
  • Updated dependency management to use UV (@BioGeek)
  • Re-added updated diffusion code (@rcatzel)
    • A preliminary checkpoint has been added
  • Added support for UNIMOD ProForma format in the residue class (backwards compatible with original format)
  • Added automatic UNIMOD conversion of old model checkpoints during loading by default
  • Added support for specifying a list of globs for SpectrumDataFrame
  • Removed (+25.98) N-terminal modification from training, and added to suppressed list by default
  • Fixed bug where tempdir would not be initialised if all input data is in .parquet format, causing a crash when performing a preshuffle.
  • Updated predict script to always calculate delta_mass_ppm even in de novo mode.
  • Updated README.md
    • Added usage for UV and Typer
    • Added link to Acknowledgements
    • Added Natively Supported Modifications table with unimod ID
    • Added Output description table
    • Updated Using your own datasets table to remove the modified_sequence column
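
For readers unfamiliar with the delta_mass_ppm value mentioned above, the following is a minimal sketch of how a precursor mass error in parts per million is commonly computed. It is not taken from the InstaNovo code base: the function names, the choice to compare m/z values rather than neutral masses, and the sign convention (observed minus theoretical) are assumptions made for illustration.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def theoretical_mz(neutral_mass, charge):
    """m/z of a peptide with the given neutral mass carrying `charge` protons."""
    return neutral_mass / charge + PROTON_MASS


def delta_mass_ppm(observed_mz, neutral_mass, charge):
    """Relative error between observed and theoretical m/z, in ppm.

    Assumed convention: positive when the observed m/z is larger than the
    theoretical one.
    """
    expected = theoretical_mz(neutral_mass, charge)
    return (observed_mz - expected) / expected * 1e6


# A doubly charged peptide with a neutral mass of 1000 Da, observed 0.005 m/z
# above its theoretical value, has an error of roughly 10 ppm.
expected_mz = theoretical_mz(1000.0, 2)
ppm = delta_mass_ppm(expected_mz + 0.005, 1000.0, 2)
print(f"{ppm:.2f} ppm")
```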

New checkpoints

  • Trained a new checkpoint with corrected cysteine modification and added MassiveKB data: instanovo-v1.1.0
  • Preliminary InstaNovo+ checkpoint trained on the same data: instanovoplus-v1.1.0-alpha

Notes:

  • A bug with the implicit cysteine modification in the AC-PT data has been corrected. The bug affects the previous instanovo_extended.ckpt checkpoint from the 1.0.0 release, where it caused significant performance issues on downstream datasets.
  • The diffusion checkpoint is an alpha release with an updated checkpoint coming soon.

InstaNovo v1.0.1

21 Jan 14:23
9b3ef20

What's Changed

  • build(deps): bump pypa/gh-action-pypi-publish from 1.4.2 to 1.10.3 by @dependabot in #55
  • build(deps): bump actions/setup-python from 4 to 5 by @dependabot in #42
  • Update starter notebook, add charge check, fix sdpa by @KevinEloff in #61
  • build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #64
  • fix: ZeroDivisionError when predicting on small sample file by @BioGeek in #68
  • V1.0.1 release by @BioGeek in #69
  • fix: Resolve "AttributeError: 'SpectrumDataFrame' object has no attribute 'df'"
  • feat: update notebooks to v1.0.0
  • feat: Automatic model download and improve residues
    Co-Authored-By: Kevin Eloff k.eloff@instadeep.com
  • feat: update tests for v1.0.1 release
    Co-Authored-By: Rachel Catzel r.catzel@instadeep.com
  • feat: update packages

Full Changelog: 1.0.0...1.0.1

InstaNovo 1.0.0

09 Oct 16:42
971c367

Improved code utility and data validation

  • Check labels match precursor
  • Check for data leakage
  • Verify residue vocabulary
  • Added better residue support
    • Fine-tuning trainer automatically updates model weights with new sizes
  • Added Flash attention, torch.compile(), AMP (fp16)
  • Added improved fast greedy search
  • Improved test coverage

Added Spectrum Data Handler

  • Supports lazy loading with asynchronous prefetching
  • Filtering and sampling performed non-destructively (by updating the row filter)
  • Two-fold shuffling strategy for training ensures optimal load times
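
The idea of non-destructive filtering through a row filter can be sketched as below. This is only an illustration of the general technique, assuming a boolean keep-mask held alongside untouched data; the FilteredTable class and its methods are hypothetical and do not reflect the actual Spectrum Data Handler API.

```python
class FilteredTable:
    """Hypothetical table that filters rows via a mask instead of deleting them."""

    def __init__(self, rows):
        self._rows = list(rows)                # underlying data is never modified
        self._keep = [True] * len(self._rows)  # row filter: True means visible

    def filter(self, predicate):
        """Hide rows failing `predicate`; successive filters combine with AND."""
        self._keep = [
            keep and predicate(row) for keep, row in zip(self._keep, self._rows)
        ]

    def reset(self):
        """Undo all filters; possible because no row was ever removed."""
        self._keep = [True] * len(self._rows)

    def __len__(self):
        return sum(self._keep)

    def __iter__(self):
        return (row for keep, row in zip(self._keep, self._rows) if keep)


table = FilteredTable(
    [
        {"sequence": "PEPTIDEK", "charge": 2},
        {"sequence": "LESLIEK", "charge": 3},
        {"sequence": "ACDEFGHK", "charge": 5},
    ]
)

table.filter(lambda row: row["charge"] <= 3)  # hide high-charge spectra
visible = [row["sequence"] for row in table]
print(visible, len(table))

table.reset()  # all rows are visible again
print(len(table))
```

Because only the mask changes, a filter can be reverted or replaced without reloading the data from disk.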

Extended model checkpoint released. Trained on 32M spectra with additional PTMs:

  • AC-PT
  • Additional PRIDE dataset
  • Additional phosphorylation dataset