Releases: kermitt2/delft
v0.4.3
v0.4.2
What's Changed
- fix: ensure learning rate schedule handles tensor steps correctly by @lfoppiano in #204
- Make the documentation working again! by @lfoppiano in #205
- Handle empty or missing directory for embeddings by @lfoppiano in #206
Full Changelog: v0.4.1...v0.4.2
v0.4.1
Breaking changes
- TensorFlow 2.17 / tf_keras 2.17: DeLFT now requires TensorFlow 2.17.1 and the standalone tf_keras 2.17.0 package. All Keras imports updated from `tensorflow.keras` to `tf_keras`. Pre-trained model weights from 0.3.x are not compatible and must be retrained. (#180, #181)
- Python 3.10+ required: Python 3.8 and 3.9 are no longer supported. (#180)
- CUDA 12.1 required for GPU: on Linux, torch is no longer in the base install to avoid CUDA conflicts. Use `pip install "delft[gpu]"` with the PyTorch cu121 index. (#195)
- LMDB embedding format changed: embeddings are stored as raw float32 bytes instead of pickle, enabling Java interoperability (GROBID). Convert with `python -m delft.utilities.convert_lmdb_embeddings --input <old> --output <new>`. (#180)
- ELMo support removed: use transformer-based or static embeddings instead. (#190, #192)
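The practical consequence of the new LMDB format is that a stored vector is just a flat array of float32 values, readable from Java without unpickling. As a minimal sketch of that round trip, assuming plain little-endian float32 with no header (the exact layout is defined by the library; the `convert_lmdb_embeddings` utility above handles actual migration):

```python
import struct

def encode_vector(vec):
    """Serialize a list of floats as raw little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)

def decode_vector(raw):
    """Deserialize raw little-endian float32 bytes back into floats."""
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Each vector occupies exactly 4 bytes per dimension, with no pickle
# framing, so any language with a float32 reader can consume it.
vec = [0.25, -1.5, 3.0]
assert decode_vector(encode_vector(vec)) == vec
```

A Java consumer such as GROBID can then read the same bytes with a `ByteBuffer` in little-endian order, which is the interoperability the change targets.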
New features
- Weights & Biases integration for experiment tracking (`--wandb` flag) (#172)
- Distributed training support via SLURM scripts (#191)
- Configurable `num_workers` parameter for data loading (defaults to CPU count - 1) (#183)
- BPE preprocessor auto-detection (#184)
- Safetensors format support for local models
- Licenses/copyrights classifier application
- LMDB format validation and conversion utility (#180)
- CI release pipeline with automated PyPI publishing (#196)
- Contributing guidelines (CONTRIBUTING.md) (#200)
- Code reformatting and lint enforcement with ruff (#197)
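The `num_workers` default described above ("CPU count - 1") can be sketched as follows; `default_num_workers` is a hypothetical helper for illustration, not the actual DeLFT API:

```python
import os

def default_num_workers(requested=None):
    """Return the requested worker count, or fall back to one fewer
    than the available CPUs (never below 1) when unset."""
    if requested is not None:
        return requested
    # os.cpu_count() can return None; guard before subtracting.
    return max(1, (os.cpu_count() or 2) - 1)
```

Leaving one core free is a common choice so the main training process is not starved by data-loading workers.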
Bug fixes
- Fixed crash when accessing learning rate during training (#180)
- Fixed multiprocessing issues on macOS (#180)
- Fixed empty embeddings handling with additional validation checks (#180)
- Fixed resource-registry.json packaging: now bundled as package data (no more hardcoded path) (#201, #202)
- Fixed word2vec embedding URL (now uses HuggingFace for GloVe instead of Stanford) (#180)
- Fixed compatibility with newer versions of transformers library (#180)
Other changes
- Migrated build config from `setup.py` to `pyproject.toml` (#195)
- Retrained all models with TensorFlow 2.17 (#191)
- Updated dependencies (transformers 4.48, torch 2.5.1, numpy 1.26.4, scikit-learn 1.6.1, pandas 2.2.3)
- Improved GitHub Actions CI with multi-platform support (#171)
- Improved LMDB embedding concurrency with per-thread handles (#180)
Full Changelog: v0.3.4...v0.4.1
Version 0.3.4
- support multiple GPU training/inference (`--multi-gpu` parameter)
- support safetensors model weights format
- support private HuggingFace models
- in application scripts: add `max-epoch` parameter and learning rate parameter
- add grobid model for funding and acknowledgement information
- more parameter information printed when training
- some dependency updates
Version 0.3.3
with PyPI:
pip install delft==0.3.3
- support for incremental training
- fix SciBERT tokenizer initialization from HuggingFace model
- updated HuggingFace transformers library to 4.25.1 and tensorflow to 2.9.3
- reviewed support of BPE tokenizers for pre-tokenized input with the updated transformers library, covering most transformer models that use them (tested with Roberta/GPT2, CamemBERT, bart-base, albert-base-v2, and XLM)
- addition of some model variants for sequence labeling (BERT_FEATURES, BERT_ChainCRF_FEATURES)
Version 0.3.2
with PyPI:
pip install delft==0.3.2
- Print model parameters at creation and load time
- Dataset recognition
- Model updates
- Set feature channel embeddings trainable
Full Changelog: v0.3.1...v0.3.2
Version 0.3.1
- fix a problem with CRF tensorflow-addons when batch size is 1
Version 0.3.0
- Migration of DeLFT to TensorFlow 2.7
- Support of HuggingFace transformer library (auto* library)
- New architectures and updated models
- General usage of optimizer with learning rate decay
- Updated docs, now hosted via readthedocs
- Improved ELMo embeddings
- Transformers wrapper to limit Hugging Face Hub usage to when it is actually necessary; models with a transformer layer are fully portable without hub access
Version 0.2.6
- add automatic download of embeddings if not locally available
- enable embedding preload script for docker image
Version 0.2.5
- fix serialization of models with feature preprocessor (PR #110)
- update grobid models with features
- some other models and score updates
- add "software was used" classification model for software citations
- update tensorflow dependency