Releases: instadeepai/InstaNovo
InstaNovo v1.2.2
What's Changed
Full Changelog: 1.2.1...1.2.2
InstaNovo v1.2.1
What's Changed
- Update to v1.2.1 and fix CI by @BioGeek in #125
- Docs job in CI should ignore upstream skipped jobs by @BioGeek in #126
Full Changelog: 1.2.0...1.2.1
InstaNovo v1.2.0
- Added multi-gpu support for training and prediction
- Switched from Lightning to Accelerate
- Significant improvements (over 10x on GPU) to beam search and knapsack beam search
- Minor improvements with the v1.2.0 transformer model: instanovo-v1.2.0.ckpt
- Added support for negative masses in knapsack
- Various other fixes and improvements
InstaNovo v1.1.4
What's Changed
- build(deps): bump astral-sh/setup-uv from 5 to 6 by @dependabot in #100
- fix: diffusion sampling and checkpoint download by @rcatzel in #111
- chore: bump version number to v.1.1.4 by @rcatzel in #112
Full Changelog: 1.1.3...1.1.4
InstaNovo v1.1.3
What's Changed
- chore: bump version number to v1.1.2 by @BioGeek in #104
- docs: add InstaNovo-P notebook by @BioGeek in #105
- InstaNovo v1.1.3 by @rcatzel in #109
Notes
- Update codebase to use new diffusion checkpoint, instanovoplus-v1.1.0
- Update diffusion predict script with refinement configuration options, and multiple prediction sampling for improved performance
  - refine_all: If True, all predictions will be refined
  - refine_threshold: Only predictions with a confidence score less than this will be refined
  - n_preds: Number of diffusion predictions to sample per spectrum
- Include updated model performance benchmarking in README.md
- Added option to specify valid_path as a DictConfig, where keys represent the validation group and values represent the validation path.
  - Eg. "acpt": "/path/to/acpt/*-valid-*.parquet"
  - This will add an "acpt" group in the validation metrics
- Added add_source_file_column option to SpectrumDataFrame, which adds the path to the original input file as a column "source_file"
- Added add_spectrum_id option to SpectrumDataFrame, which adds unique index values to the input file as a column "spectrum_id"
- Updated s3 utils to use a class instead of individual functions
- Added functionality to wrap write methods and optionally specify an s3 path which will get automatically uploaded
- Added functionality to automatically download files to a temporary directory when converting s3 paths
- Allow --output-path to remain unspecified in evaluation mode for:
  - Transformer model
  - Diffusion model when not in refinement mode
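The refinement options in the notes above (refine_all and refine_threshold) describe a simple selection rule. The sketch below illustrates that rule only and is not the InstaNovo implementation: the Prediction class, the select_for_refinement function, the 0.9 default, and the assumption that confidence lies in [0, 1] are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical container for one transformer prediction."""

    spectrum_id: str
    sequence: str
    confidence: float  # assumed to lie in [0, 1]


def select_for_refinement(predictions, refine_all=False, refine_threshold=0.9):
    """Return the predictions that would be handed to diffusion refinement.

    Follows the option descriptions only: everything is refined when
    refine_all is True, otherwise only predictions whose confidence is
    strictly below refine_threshold. The 0.9 default is an assumption.
    """
    if refine_all:
        return list(predictions)
    return [p for p in predictions if p.confidence < refine_threshold]


preds = [
    Prediction("s1", "PEPTIDE", 0.98),
    Prediction("s2", "PEPTLDE", 0.42),
]
to_refine = select_for_refinement(preds, refine_threshold=0.9)
print([p.spectrum_id for p in to_refine])  # ['s2']
```

With refine_all=True both predictions would be returned regardless of their confidence.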
New Checkpoints
- Trained a new InstaNovo+ v1.1.0 checkpoint on the Extended Massive-KB dataset - instanovoplus-v1.1.0.ckpt. To be used standalone, or in conjunction with instanovo-v1.1.0.ckpt for best results.
Full Changelog: 1.1.2...1.1.3
InstaNovo v1.1.2: InstaNovo-P v1.0.0 checkpoint
What's Changed
This release includes the checkpoint for our de novo sequencing model for phosphoproteomics InstaNovo-P v1.0.0!
- docs: re-enable building docs by @BioGeek in #85
- Fix docs by @BioGeek in #88
- Auto detect device by @rcatzel in #91
- Notebook comparing InstaNovo v0.1 versus InstaNovo v1.1 by @BioGeek in #95
- docs: add coverage badge by @BioGeek in #89
- instanovo-phospho-v1.0.0 by @jesperdlau in #103
New Contributors
- @jesperdlau made their first contribution in #103
Full Changelog: 1.1.1...1.1.2
InstaNovo v1.1.1
What's Changed
Code updates for publication release
- fix: add sample data by @BioGeek in #81
- docs: update readme for publication by @KevinEloff @BioGeek @rcatzel in #82
- fix: add diffusion device option, set automatically on load by @KevinEloff in #83
- fix: use separate diffusion sdf instance, minor release bump by @rcatzel in #84
Full Changelog: 1.1.0...1.1.1
InstaNovo v1.1.0
What's Changed
- Updated to use Typer @BioGeek
- Typer includes options to predict with InstaNovo and refine with InstaNovo+ in one command
- Updated dependency management to use UV @BioGeek
- Re-added updated diffusion code @rcatzel
- A preliminary checkpoint has been added
- Adds support for UNIMOD ProForma format in the residue class (backwards compatible with original format)
- Adds automatic UNIMOD conversion of old model checkpoints during loading by default
- Adds support for specifying a list of globs for SpectrumDataFrame
- Removed (+25.98) N-terminal modification from training, and added to suppressed list by default
- Fixed bug where tempdir would not be initialised if all input data is in .parquet format, causing a crash when performing a preshuffle.
- Updated predict script to always calculate delta_mass_ppm even in de novo mode.
- Updated README.md
- Added usage for UV and Typer
- Added link to Acknowledgements
- Added Natively Supported Modifications table with unimod ID
- Added Output description table
- Updated Using your own datasets table to remove the modified_sequence column
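For readers unfamiliar with the delta_mass_ppm value mentioned in the list above, a precursor mass error in parts per million is conventionally computed as sketched here. This is a hedged illustration: the function signature, the sign convention (observed minus theoretical), and the choice to compare neutral masses instead of m/z values are assumptions, not taken from the InstaNovo predict script.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def delta_mass_ppm(observed_mz, charge, theoretical_mass):
    """Signed precursor mass error in parts per million.

    Assumption: the error is taken on neutral masses as
    (observed - theoretical) / theoretical * 1e6.
    """
    # Convert the observed m/z of a positively charged ion to a neutral mass.
    observed_mass = (observed_mz - PROTON_MASS) * charge
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


# A 1000.00 Da peptide observed 0.01 Da too heavy at charge 2.
example_mz = (1000.0 + 0.01) / 2 + PROTON_MASS
print(round(delta_mass_ppm(example_mz, charge=2, theoretical_mass=1000.0), 3))
```

A 0.01 Da deviation on a 1000 Da peptide corresponds to roughly 10 ppm.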
New checkpoints
- Trained a new checkpoint with corrected cysteine modification and added MassiveKB data: instanovo-v1.1.0
- Preliminary InstaNovo+ checkpoint trained on the same data: instanovoplus-v1.1.0-alpha
Notes:
- A bug with Implicit Cysteine modification has been corrected in the AC-PT. This bug affects the previous instanovo_extended.ckpt checkpoint from the 1.0.0 release. This bug caused significant performance issues on downstream datasets.
- The diffusion checkpoint is an alpha release with an updated checkpoint coming soon.
InstaNovo v1.0.1
What's Changed
- build(deps): bump pypa/gh-action-pypi-publish from 1.4.2 to 1.10.3 by @dependabot in #55
- build(deps): bump actions/setup-python from 4 to 5 by @dependabot in #42
- Update starter notebook, add charge check, fix sdpa by @KevinEloff in #61
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #64
- fix: ZeroDivisionError when predicting on small sample file by @BioGeek in #68
- fix: Resolve "AttributeError: 'SpectrumDataFrame' object has no attribute 'df'"
- feat: update notebooks to v1.0.0
- feat: Automatic model download and improve residues
  Co-Authored-By: Kevin Eloff k.eloff@instadeep.com
- feat: update tests for v1.0.1 release
  Co-Authored-By: Rachel Catzel r.catzel@instadeep.com
- feat: update packages
New Contributors
- @dependabot made their first contribution in #55
Full Changelog: 1.0.0...1.0.1
InstaNovo 1.0.0
Improved code utility and data validation
- Check labels match precursor
- Check for data leakage
- Verify residue vocabulary
- Added better residue support
- Fine-tuning trainer automatically updates model weights with new sizes
- Added Flash attention, torch.compile(), AMP (fp16)
- Added improved fast greedy search
- Improved test coverage
Added Spectrum Data Handler
- Supports lazy loading with asynchronous prefetching
- Filtering and sampling performed non-destructively (by updating the row filter)
- Two-fold shuffling strategy for training ensures optimal load times
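The non-destructive filtering described above can be pictured as keeping a boolean row filter alongside the data, so that filtering hides rows without deleting them. The following is a minimal sketch of that idea in plain Python; the FilteredTable class and its method names are hypothetical and do not reflect the SpectrumDataFrame API.

```python
class FilteredTable:
    """Hypothetical table wrapper that filters via a mask instead of deleting rows."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._keep = [True] * len(self._rows)  # the row filter

    def filter_rows(self, predicate):
        """Narrow the row filter; rows failing the predicate are hidden, not removed."""
        self._keep = [
            keep and bool(predicate(row))
            for keep, row in zip(self._keep, self._rows)
        ]

    def reset_filter(self):
        """Make every row visible again; possible because no data was dropped."""
        self._keep = [True] * len(self._rows)

    def __len__(self):
        return sum(self._keep)

    def __iter__(self):
        return (row for keep, row in zip(self._keep, self._rows) if keep)


table = FilteredTable([{"charge": 2}, {"charge": 7}, {"charge": 3}])
table.filter_rows(lambda row: row["charge"] <= 4)
print(len(table))  # 2
table.reset_filter()
print(len(table))  # 3
```

Because the underlying rows are never modified, a filter can be undone or replaced without reloading the data.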
Extended model checkpoint released. Trained on 32M spectra with additional PTMs:
- AC-PT
- Additional PRIDE dataset
- Additional phosphorylation dataset