Releases: kermitt2/delft

v0.4.3

19 Mar 13:30
740543b

What's Changed

Full Changelog: v0.4.2...v0.4.3

v0.4.2

19 Mar 13:30
51aae29

What's Changed

  • fix: ensure learning rate schedule handles tensor steps correctly by @lfoppiano in #204
  • Make the documentation working again! by @lfoppiano in #205
  • Handle empty or missing directory for embeddings by @lfoppiano in #206

Full Changelog: v0.4.1...v0.4.2

v0.4.1

04 Mar 19:55
c968d1e

Breaking changes

  • TensorFlow 2.17 / tf_keras 2.17: DeLFT now requires TensorFlow 2.17.1 and the standalone tf_keras 2.17.0 package. All Keras imports updated from tensorflow.keras to tf_keras. Pre-trained model weights from 0.3.x are not compatible and must be retrained. (#180, #181)
  • Python 3.10+ required: Python 3.8 and 3.9 are no longer supported. (#180)
  • CUDA 12.1 required for GPU: On Linux, torch is no longer in the base install to avoid CUDA conflicts. Use pip install "delft[gpu]" with the PyTorch cu121 index. (#195)
  • LMDB embedding format changed: Embeddings stored as raw float32 bytes instead of pickle. Enables Java interoperability (GROBID). Convert with: python -m delft.utilities.convert_lmdb_embeddings --input <old> --output <new> (#180)
  • ELMo support removed: Use transformer-based or static embeddings instead. (#190, #192)
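The LMDB change above replaces pickled values with raw float32 bytes. A minimal sketch of that layout, using only Python's standard `struct` module, is shown below. It assumes little-endian byte order and that old values unpickle to a sequence of floats; neither detail is stated in these notes, and the helper names (`encode_vector`, `decode_vector`, `convert_pickled_value`) are hypothetical. For real embedding files, use the `delft.utilities.convert_lmdb_embeddings` command given above.

```python
import pickle
import struct
from typing import Sequence

# Assumption: vectors are stored as little-endian IEEE-754 float32 ("<f").
# The byte order DeLFT actually writes is not stated in these release notes.
_FMT = "<{}f"


def encode_vector(vec: Sequence[float]) -> bytes:
    """Pack a vector into raw float32 bytes (4 bytes per dimension)."""
    return struct.pack(_FMT.format(len(vec)), *vec)


def decode_vector(raw: bytes) -> list[float]:
    """Unpack raw float32 bytes back into a list of Python floats."""
    if len(raw) % 4 != 0:
        raise ValueError("raw embedding length is not a multiple of 4 bytes")
    return list(struct.unpack(_FMT.format(len(raw) // 4), raw))


def convert_pickled_value(old_value: bytes) -> bytes:
    """Hypothetical conversion of one pickled vector to the raw float32 layout."""
    vec = pickle.loads(old_value)  # assumed to yield a sequence of numbers
    return encode_vector([float(x) for x in vec])
```

Because the raw layout needs no Python-specific deserializer, a Java reader could, for example, wrap the bytes with `ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer()`, provided the byte order matches whatever DeLFT actually writes.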

New features

  • Weights & Biases integration for experiment tracking (--wandb flag) (#172)
  • Distributed training support via SLURM scripts (#191)
  • Configurable num_workers parameter for data loading (defaults to CPU count - 1) (#183)
  • BPE preprocessor auto-detection (#184)
  • Safetensors format support for local models
  • Licenses/copyrights classifier application
  • LMDB format validation and conversion utility (#180)
  • CI release pipeline with automated PyPI publishing (#196)
  • Contributing guidelines (CONTRIBUTING.md) (#200)
  • Code reformatting and lint enforcement with ruff (#197)
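The `num_workers` default described above (CPU count minus one) can be sketched as follows. This is an illustration, not DeLFT's code: the function name `default_num_workers` is hypothetical, and the floor of 1 for single-CPU machines is an assumption.

```python
import os


def default_num_workers() -> int:
    """Illustrative default: one data-loading worker per CPU, leaving one CPU free.

    os.cpu_count() may return None, so fall back to 1; the max(1, ...) floor is
    an assumption that avoids a zero-worker setting on single-CPU machines.
    """
    cpus = os.cpu_count() or 1
    return max(1, cpus - 1)
```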

Bug fixes

  • Fixed crash when accessing learning rate during training (#180)
  • Fixed multiprocessing issues on macOS (#180)
  • Fixed empty embeddings handling with additional validation checks (#180)
  • Fixed: resource-registry.json is now bundled as package data (no more hardcoded path) (#201, #202)
  • Fixed word2vec embedding URL (now uses HuggingFace for GloVe instead of Stanford) (#180)
  • Fixed compatibility with newer versions of transformers library (#180)

Other changes

  • Migrated build config from setup.py to pyproject.toml (#195)
  • Retrained all models with TensorFlow 2.17 (#191)
  • Updated dependencies (transformers 4.48, torch 2.5.1, numpy 1.26.4, scikit-learn 1.6.1, pandas 2.2.3)
  • Improved GitHub Actions CI with multi-platform support (#171)
  • Improved LMDB embedding concurrency with per-thread handles (#180)
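The per-thread handle approach mentioned above can be illustrated with the generic `threading.local` pattern below. It is a sketch under stated assumptions, not DeLFT's implementation: `PerThreadHandle` is a hypothetical class, and `factory` stands in for whatever opens an LMDB read handle. Each thread lazily creates and then reuses its own handle, so no handle is shared across threads.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThreadHandle(Generic[T]):
    """Lazily create and cache one handle per thread.

    `factory` is a hypothetical stand-in for the code that opens a read handle;
    it is called at most once in each thread that calls get().
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()  # attributes are private to each thread

    def get(self) -> T:
        try:
            return self._local.handle  # reuse this thread's existing handle
        except AttributeError:
            self._local.handle = self._factory()  # first use in this thread
            return self._local.handle
```

Caching per thread avoids locking around a shared handle on every lookup; the trade-off is one open handle per live thread.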

Full Changelog: v0.3.4...v0.4.1

Version 0.3.4

29 Nov 16:36

  • support multiple GPU training/inference (--multi-gpu parameter)
  • support safetensors model weights format
  • support private HuggingFace models
  • in application scripts: add max-epoch and learning rate parameters
  • add grobid model for funding and acknowledgement information
  • more parameter information printed when training
  • some dependency updates

Version 0.3.3

12 Feb 08:51

Install from PyPI:

pip install delft==0.3.3

  • support for incremental training
  • fix SciBERT tokenizer initialization from HuggingFace model
  • updated HuggingFace transformers library to 4.25.1 and tensorflow to 2.9.3
  • review support for BPE tokenizers on pre-tokenized input with the updated transformers library, for most transformer models that use them (tested with Roberta/GPT2, CamemBERT, bart-base, albert-base-v2, and XLM)
  • addition of some model variants for sequence labeling (BERT_FEATURES, BERT_ChainCRF_FEATURES)

Version 0.3.2

08 Dec 22:36

Install from PyPI:

pip install delft==0.3.2

  • Print model parameters at creation and load time
  • Dataset recognition
  • Model updates
  • Set feature channel embeddings trainable

Full Changelog: v0.3.1...v0.3.2

Version 0.3.1

18 Apr 12:36

  • fix a problem with the tensorflow-addons CRF when batch size is 1

Version 0.3.0

29 Mar 18:12

  • Migration of DeLFT to TensorFlow 2.7
  • Support for the HuggingFace transformers library (auto* classes)
  • New architectures and updated models
  • General use of an optimizer with learning rate decay
  • Updated documentation, now hosted on Read the Docs
  • Improved ELMo embeddings
  • Transformers wrapper that limits Hugging Face Hub access to what is strictly necessary; models with a transformer layer are fully portable without Hub access

Version 0.2.6

26 Dec 21:39

  • add automatic download of embeddings if not locally available
  • enable embedding preload script for docker image

Version 0.2.5

21 Dec 04:32

  • fix serialization of models with feature preprocessor (PR #110)
  • update grobid models with features
  • some other models and score updates
  • add "software was used" classification model for software citations
  • update tensorflow dependency