Releases: kermitt2/delft
v0.4.3
v0.4.2
What's Changed
- fix: ensure learning rate schedule handles tensor steps correctly by @lfoppiano in #204
- Make the documentation working again! by @lfoppiano in #205
- Handle empty or missing directory for embeddings by @lfoppiano in #206
Full Changelog: v0.4.1...v0.4.2
v0.4.1
Breaking changes
- TensorFlow 2.17 / tf_keras 2.17: DeLFT now requires TensorFlow 2.17.1 and the standalone tf_keras 2.17.0 package. All Keras imports updated from `tensorflow.keras` to `tf_keras`. Pre-trained model weights from 0.3.x are not compatible and must be retrained. (#180, #181)
- Python 3.10+ required: Python 3.8 and 3.9 are no longer supported. (#180)
- CUDA 12.1 required for GPU: on Linux, torch is no longer in the base install to avoid CUDA conflicts. Use `pip install "delft[gpu]"` with the PyTorch cu121 index. (#195)
- LMDB embedding format changed: embeddings are stored as raw float32 bytes instead of pickle, enabling Java interoperability (GROBID). Convert with `python -m delft.utilities.convert_lmdb_embeddings --input <old> --output <new>`. (#180)
- ELMo support removed: use transformer-based or static embeddings instead. (#190, #192)
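The practical consequence of the new LMDB format is that a stored vector is just a flat array of float32 values, readable from Java without unpickling. As a minimal sketch of that round trip, assuming plain little-endian float32 with no header (the exact layout is defined by the library; the `convert_lmdb_embeddings` utility above handles actual migration):

```python
import struct

def encode_vector(vec):
    """Serialize a list of floats as raw little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)

def decode_vector(raw):
    """Deserialize raw little-endian float32 bytes back into floats."""
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Each vector occupies exactly 4 bytes per dimension, with no pickle
# framing, so any language with a float32 reader can consume it.
vec = [0.25, -1.5, 3.0]
assert decode_vector(encode_vector(vec)) == vec
```

A Java consumer such as GROBID can then read the same bytes with a `ByteBuffer` in little-endian order, which is the interoperability the change targets.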
New features
- Weights & Biases integration for experiment tracking (`--wandb` flag) (#172)
- Distributed training support via SLURM scripts (#191)
- Configurable `num_workers` parameter for data loading (defaults to CPU count - 1) (#183)
- BPE preprocessor auto-detection (#184)
- Safetensors format support for local models
- Licenses/copyrights classifier application
- LMDB format validation and conversion utility (#180)
- CI release pipeline with automated PyPI publishing (#196)
- Contributing guidelines (CONTRIBUTING.md) (#200)
- Code reformatting and lint enforcement with ruff (#197)
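The `num_workers` default described above ("CPU count - 1") can be sketched as follows; `default_num_workers` is a hypothetical helper for illustration, not the actual DeLFT API:

```python
import os

def default_num_workers(requested=None):
    """Return the requested worker count, or fall back to one fewer
    than the available CPUs (never below 1) when unset."""
    if requested is not None:
        return requested
    # os.cpu_count() can return None; guard before subtracting.
    return max(1, (os.cpu_count() or 2) - 1)
```

Leaving one core free is a common choice so the main training process is not starved by data-loading workers.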
Bug fixes
- Fixed crash when accessing learning rate during training (#180)
- Fixed multiprocessing issues on macOS (#180)
- Fixed empty embeddings handling with additional validation checks (#180)
- Fixed resource-registry.json packaging: now bundled as package data (no more hardcoded path) (#201, #202)
- Fixed word2vec embedding URL (now uses HuggingFace for GloVe instead of Stanford) (#180)
- Fixed compatibility with newer versions of transformers library (#180)
Other changes
- Migrated build config from `setup.py` to `pyproject.toml` (#195)
- Retrained all models with TensorFlow 2.17 (#191)
- Updated dependencies (transformers 4.48, torch 2.5.1, numpy 1.26.4, scikit-learn 1.6.1, pandas 2.2.3)
- Improved GitHub Actions CI with multi-platform support (#171)
- Improved LMDB embedding concurrency with per-thread handles (#180)
Full Changelog: v0.3.4...v0.4.1
Version 0.3.4
- support multiple GPU training/inference (`--multi-gpu` parameter)
- support safetensors model weights format
- support private HuggingFace models
- in application scripts: add `max-epoch` parameter and learning rate parameter
- add grobid model for funding and acknowledgement information
- more parameter information printed when training
- some dependency updates
Version 0.3.3
with PyPI:
pip install delft==0.3.3
- support for incremental training
- fix SciBERT tokenizer initialization from HuggingFace model
- updated HuggingFace transformers library to 4.25.1 and tensorflow to 2.9.3
- reviewed support of BPE tokenizers for pre-tokenized input with the updated transformers library, covering most transformer models that use them (tested with Roberta/GPT2, CamemBERT, bart-base, albert-base-v2, and XLM)
- addition of some model variants for sequence labeling (BERT_FEATURES, BERT_ChainCRF_FEATURES)
Version 0.3.2
with PyPI:
pip install delft==0.3.2
- Print model parameters at creation and load time
- Dataset recognition
- Model updates
- Set feature channel embeddings trainable
Full Changelog: v0.3.1...v0.3.2
Version 0.3.1
- fix a problem with CRF tensorflow-addons when batch size is 1
Version 0.3.0
- Migration of DeLFT to TensorFlow 2.7
- Support of HuggingFace transformer library (auto* library)
- New architectures and updated models
- General usage of optimizer with learning rate decay
- Updated docs, now hosted via readthedocs
- Improved ELMo embeddings
- Transformers wrapper to limit Hugging Face Hub usage to when it is actually necessary; models with a transformer layer are fully portable without hub access
Version 0.2.6
- add automatic download of embeddings if not locally available
- enable embedding preload script for docker image
Version 0.2.5
- fix serialization of models with feature preprocessor (PR #110)
- update grobid models with features
- some other models and score updates
- add "software was used" classification model for software citations
- update tensorflow dependency