|
| 1 | +# Changelog |
| 2 | + |
| 3 | +All notable changes to this project will be documented in this file. |
| 4 | + |
| 5 | +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), |
| 6 | +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). |
| 7 | + |
| 8 | +## [Unreleased] - 3.2.5-5 |
| 9 | + |
| 10 | +### Added |
| 11 | +- **Probability-based Sorting**: Annotations are now sorted by probability in descending order |
| 12 | +- **Smart Lemma Handling**: |
| 13 | + - Lemmas are paired with their corresponding POS tags and sorted together |
| 14 | + - Automatic deduplication when all lemmas are identical (e.g., `die|die|die` → `die`) |
| 15 | + - Preserves all lemmas when they differ, so that tag-lemma relationships remain visible |
| 16 | +- **CI/CD Improvements**: |
| 17 | + - GitHub Actions workflow for automated testing |
| 18 | + - GitLab CI pipeline with test, build, and deploy stages |
| 19 | + - Automated tests for sorting and lemma deduplication |
| 20 | + - Docker Hub deployment workflow |
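The sorting and lemma-handling entries above can be illustrated with a minimal Rust sketch. It is not taken from the project's source: the `Candidate` struct and the functions `sort_by_probability`, `join_tags`, and `join_lemmas` are hypothetical names, and the `|` separator is assumed from the `die|die|die` example.

```rust
/// One candidate analysis for a token (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    tag: String,
    lemma: String,
    prob: f64,
}

impl Candidate {
    fn new(tag: &str, lemma: &str, prob: f64) -> Self {
        Candidate { tag: tag.to_string(), lemma: lemma.to_string(), prob }
    }
}

/// Sort candidates by probability, highest first.
/// Tag and lemma live in the same struct, so they stay paired while sorting.
fn sort_by_probability(cands: &mut [Candidate]) {
    // `total_cmp` gives a total order on f64, so the comparison never panics.
    cands.sort_by(|a, b| b.prob.total_cmp(&a.prob));
}

/// Join the tags in their current order with `|`.
fn join_tags(cands: &[Candidate]) -> String {
    cands.iter().map(|c| c.tag.as_str()).collect::<Vec<_>>().join("|")
}

/// Join the lemmas with `|`, collapsing to a single lemma when all are identical.
fn join_lemmas(cands: &[Candidate]) -> String {
    match cands.first() {
        None => String::new(),
        Some(first) if cands.iter().all(|c| c.lemma == first.lemma) => first.lemma.clone(),
        Some(_) => cands.iter().map(|c| c.lemma.as_str()).collect::<Vec<_>>().join("|"),
    }
}
```

Keeping tag, lemma, and probability in one record means a single sort reorders all three consistently, which is one straightforward way to keep lemmas aligned with their tags.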
| 21 | + |
| 22 | +### Changed |
| 23 | +- **Docker Configuration**: |
| 24 | + - Updated to use `docker:latest` with DNS configuration for GitLab runners |
| 25 | + - Added `FF_NETWORK_PER_BUILD` variable for network isolation |
| 26 | +- **Docker Login**: Updated to use `--password-stdin` for secure authentication |
| 27 | + |
| 28 | +## [3.2.5-4] - 2025-11-25 |
| 29 | + |
| 30 | +### Added |
| 31 | +- **Rust Implementation**: Replaced Perl post-processing scripts with an optimized Rust implementation |
| 32 | + - `korap-treetagger-processor` binary with three subcommands: `preprocess`, `postprocess`, `filter-german` |
| 33 | + - Significant performance improvements through buffered I/O |
| 34 | + - Identical output to original Perl scripts |
| 35 | +- **Model Management**: |
| 36 | + - Docker volume support for persistent model storage at `/local/models` |
| 37 | + - Automatic model download and caching |
| 38 | + - Graceful fallback to ephemeral storage if volume is not writable |
| 39 | +- **Probability Output**: Added `-p` flag to output probability values in MISC column |
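The fallback described under Model Management could work roughly as in the following Rust sketch. This is an assumption about the general approach, not the project's actual code: `is_writable_dir` and `choose_model_dir` are hypothetical helpers, and the caller would pass `/local/models` as the preferred directory and some temporary location as the fallback.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return true if `dir` exists (or can be created) and a file can be written into it.
fn is_writable_dir(dir: &Path) -> bool {
    if fs::create_dir_all(dir).is_err() {
        return false;
    }
    // Probe by creating and removing a small marker file.
    let probe = dir.join(".write-probe");
    match fs::File::create(&probe) {
        Ok(_) => {
            let _ = fs::remove_file(&probe);
            true
        }
        Err(_) => false,
    }
}

/// Prefer the persistent directory; fall back to ephemeral storage if it is not writable.
fn choose_model_dir(preferred: &Path, fallback: &Path) -> io::Result<PathBuf> {
    if is_writable_dir(preferred) {
        return Ok(preferred.to_path_buf());
    }
    // Diagnostics go to stderr so that stdout stays free for data.
    eprintln!(
        "warning: {} is not writable; using ephemeral storage at {}",
        preferred.display(),
        fallback.display()
    );
    fs::create_dir_all(fallback)?;
    Ok(fallback.to_path_buf())
}
```

Probing with a real write, rather than inspecting permission bits, also covers read-only mounts and volumes owned by another user.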
| 40 | + |
| 41 | +### Changed |
| 42 | +- **Docker Image Name**: Renamed from `korap/conllu2treetagger` to `korap/conllu-treetagger` |
| 43 | +- **Build System**: Updated to use `make build-docker` command |
| 44 | +- **Logging**: All informational messages redirected to stderr to keep stdout clean for data processing |
| 45 | +- **Model Installation**: Patched `install-tagger.sh` to suppress "File exists" warnings |
| 46 | + |
| 47 | +### Removed |
| 48 | +- Pre-bundled language models from Docker image (for copyright compliance and reduced image size) |
| 49 | +- Perl dependencies for post-processing (replaced with Rust) |
| 50 | + |
| 51 | +### Fixed |
| 52 | +- `filter-german` pass-through issues for comments and empty lines |
| 53 | +- Language installation warnings during Docker build |
| 54 | +- Mutable borrow errors in Rust implementation |
| 55 | + |
| 56 | +### Performance |
| 57 | +- Reduced system time through buffered I/O operations |
| 58 | +- Faster processing through compiled Rust code vs. interpreted Perl |
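As a rough illustration of the buffered I/O and the comment and empty-line pass-through mentioned in this release, a line filter might be structured like the sketch below. The function `process_stream` and its `transform` callback are hypothetical, and treating lines that start with `#` as comments is an assumption based on the CoNLL-U format.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Copy `input` to `output` line by line through a buffered writer.
/// Comment lines (starting with '#') and empty lines pass through unchanged;
/// every other line is rewritten by `transform`.
/// Returns the number of transformed lines.
fn process_stream<R, W, F>(input: R, output: W, mut transform: F) -> io::Result<usize>
where
    R: BufRead,
    W: Write,
    F: FnMut(&str) -> String,
{
    // Buffering avoids issuing one write system call per line.
    let mut out = BufWriter::new(output);
    let mut transformed = 0;
    for line in input.lines() {
        let line = line?;
        if line.is_empty() || line.starts_with('#') {
            writeln!(out, "{}", line)?;
        } else {
            writeln!(out, "{}", transform(&line))?;
            transformed += 1;
        }
    }
    out.flush()?;
    Ok(transformed)
}
```

In a command-line tool, `input` would typically be `io::stdin().lock()` and `output` would be `io::stdout().lock()`, with any progress or summary messages written to stderr.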
| 59 | + |
| 60 | +## [3.2.5-3] - Previous Version |
| 61 | + |
| 62 | +### Initial KorAP Fork |
| 63 | +- Forked from [sfischer13/docker-treetagger](https://github.com/sfischer13/docker-treetagger) |
| 64 | +- Added CoNLL-U format support |
| 65 | +- Integrated with KorAP pipeline |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +## Credits |
| 70 | + |
| 71 | +- **Original Author**: [Stefan Fischer](https://github.com/sfischer13) - docker-treetagger |
| 72 | +- **TreeTagger**: [Helmut Schmid](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/) |
| 73 | +- **KorAP Enhancements**: Marc Kupietz and contributors |