Commit d5fb082

Add GH CI workflow and changelog
Change-Id: Iad68bb637b3fd0293dda026d56cd45b211079ab9
1 parent 45373d5 commit d5fb082

3 files changed, 133 additions and 2 deletions

.github/workflows/ci.yml

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+name: CI
+
+on:
+  push:
+    branches: [ main, master ]
+  pull_request:
+    branches: [ main, master ]
+
+jobs:
+  test-rust-processor:
+    name: Test Rust Processor
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Setup Rust
+        uses: actions-rust-lang/setup-rust-toolchain@v1
+        with:
+          toolchain: 1.79
+
+      - name: Cache cargo registry
+        uses: actions/cache@v4
+        with:
+          path: ~/.cargo/registry
+          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
+
+      - name: Cache cargo index
+        uses: actions/cache@v4
+        with:
+          path: ~/.cargo/git
+          key: ${{ runner.os }}-cargo-index-${{ hashFiles('**/Cargo.lock') }}
+
+      - name: Cache cargo build
+        uses: actions/cache@v4
+        with:
+          path: korap-treetagger-processor/target
+          key: ${{ runner.os }}-cargo-build-target-${{ hashFiles('**/Cargo.lock') }}
+
+      - name: Run cargo tests
+        working-directory: korap-treetagger-processor
+        run: cargo test --release
+
+      - name: Build release binary
+        working-directory: korap-treetagger-processor
+        run: cargo build --release
+
+      - name: Test probability sorting
+        working-directory: korap-treetagger-processor
+        run: |
+          cat tests/resources/sort_input.txt | ./target/release/korap-treetagger-processor postprocess > tests/resources/sort_actual.txt
+          diff tests/resources/sort_expected.txt tests/resources/sort_actual.txt
+
+      - name: Test lemma deduplication
+        working-directory: korap-treetagger-processor
+        run: |
+          cat tests/resources/lemma_dedup_input.txt | ./target/release/korap-treetagger-processor postprocess > tests/resources/lemma_dedup_actual.txt
+          diff tests/resources/lemma_dedup_expected.txt tests/resources/lemma_dedup_actual.txt
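
Review note: the last two workflow steps are golden-file checks. Each pipes a fixture through the `postprocess` subcommand and diffs the result against an expected file. The same check could also live in the Rust test suite. Below is a minimal sketch of that idea, not code from this commit: `run_with_stdin` and `first_mismatch` are hypothetical helper names, and `cat` stands in for the processor binary so the sketch runs without it.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Run `program` with `args`, feed `input` on stdin, and return its stdout.
/// Hypothetical helper; it mirrors `cat input | program args > actual`.
fn run_with_stdin(program: &str, args: &[&str], input: &str) -> io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write stdin on a separate thread so a child that produces a lot of
    // output cannot block on a full stdout pipe while we are still writing.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let data = input.as_bytes().to_vec();
    let writer = thread::spawn(move || stdin.write_all(&data));

    let output = child.wait_with_output()?;
    writer.join().expect("writer thread panicked")?;

    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {}", output.status),
        ));
    }
    String::from_utf8(output.stdout).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Return the 1-based number of the first differing line, or `None` if the
/// two texts match line by line (the role `diff` plays in the workflow).
fn first_mismatch(expected: &str, actual: &str) -> Option<usize> {
    let mut expected_lines = expected.lines();
    let mut actual_lines = actual.lines();
    let mut line_no = 1;
    loop {
        match (expected_lines.next(), actual_lines.next()) {
            (None, None) => return None,
            (e, a) if e == a => line_no += 1,
            _ => return Some(line_no),
        }
    }
}

fn main() {
    // `cat` is a stand-in; a real test would call the release binary with
    // the `postprocess` subcommand and read the fixture files from disk.
    let input = "1\tdie\tdie\n";
    match run_with_stdin("cat", &[], input) {
        Ok(actual) => match first_mismatch(input, &actual) {
            None => println!("output matches expected"),
            Some(n) => println!("first difference at line {n}"),
        },
        Err(e) => eprintln!("could not run command: {e}"),
    }
}
```

Keeping the shell-level check in CI still has value, since it exercises the built binary end to end; a Rust-level version would mainly give faster local feedback through `cargo test`.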

CHANGELOG.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased] - 3.2.5-5
+
+### Added
+- **Probability-based Sorting**: Annotations now sorted in descending order by probability value
+- **Smart Lemma Handling**:
+  - Lemmas are paired with their corresponding POS tags and sorted together
+  - Automatic deduplication when all lemmas are identical (e.g., `die|die|die` → `die`)
+  - Preserves all lemmas when different to show tag-lemma relationships
+- **CI/CD Improvements**:
+  - GitHub Actions workflow for automated testing
+  - GitLab CI pipeline with test, build, and deploy stages
+  - Automated tests for sorting and lemma deduplication
+  - Docker Hub deployment workflow
+
+### Changed
+- **Docker Configuration**:
+  - Updated to use `docker:latest` with DNS configuration for GitLab runners
+  - Added `FF_NETWORK_PER_BUILD` variable for network isolation
+- **Docker Login**: Updated to use `--password-stdin` for secure authentication
+
+## [3.2.5-4] - 2025-11-25
+
+### Added
+- **Rust Implementation**: Replaced Perl post-processing scripts with optimized Rust implementation
+  - `korap-treetagger-processor` binary with three subcommands: `preprocess`, `postprocess`, `filter-german`
+  - Significant performance improvements through buffered I/O
+  - Identical output to original Perl scripts
+- **Model Management**:
+  - Docker volume support for persistent model storage at `/local/models`
+  - Automatic model download and caching
+  - Graceful fallback to ephemeral storage if volume is not writable
+- **Probability Output**: Added `-p` flag to output probability values in MISC column
+
+### Changed
+- **Docker Image Name**: Renamed from `korap/conllu2treetagger` to `korap/conllu-treetagger`
+- **Build System**: Updated to use `make build-docker` command
+- **Logging**: All informational messages redirected to stderr to keep stdout clean for data processing
+- **Model Installation**: Patched `install-tagger.sh` to suppress "File exists" warnings
+
+### Removed
+- Pre-bundled language models from Docker image (for copyright compliance and reduced image size)
+- Perl dependencies for post-processing (replaced with Rust)
+
+### Fixed
+- Filter-german pass-through issues for comments and empty lines
+- Language installation warnings during Docker build
+- Mutable borrow errors in Rust implementation
+
+### Performance
+- Reduced system time through buffered I/O operations
+- Faster processing through compiled Rust code vs. interpreted Perl
+
+## [3.2.5-3] - Previous Version
+
+### Initial KorAP Fork
+- Forked from [sfischer13/docker-treetagger](https://github.com/sfischer13/docker-treetagger)
+- Added CoNLL-U format support
+- Integrated with KorAP pipeline
+
+---
+
+## Credits
+
+- **Original Author**: [Stefan Fischer](https://github.com/sfischer13) - docker-treetagger
+- **TreeTagger**: [Helmut Schmid](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
+- **KorAP Enhancements**: Marc Kupietz and contributors
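
Review note: the "Probability-based Sorting" and "Smart Lemma Handling" entries describe behavior that is easier to judge with a concrete case. The sketch below illustrates the described rules only; it is not the code in `korap-treetagger-processor`. It assumes tags, lemmas, and probabilities arrive as three parallel `|`-separated strings, which the changelog does not specify, and `sort_annotations` is a hypothetical name.

```rust
/// Reorder parallel `|`-separated annotation fields by descending probability.
/// Hypothetical helper: the field layout is an assumption, not taken from the
/// commit. Returns `None` if the fields differ in length or a probability
/// does not parse as a number.
fn sort_annotations(tags: &str, lemmas: &str, probs: &str) -> Option<(String, String, String)> {
    let tags: Vec<&str> = tags.split('|').collect();
    let lemmas: Vec<&str> = lemmas.split('|').collect();
    let prob_strs: Vec<&str> = probs.split('|').collect();
    let prob_vals: Vec<f64> = prob_strs
        .iter()
        .map(|p| p.parse::<f64>().ok())
        .collect::<Option<_>>()?;
    if tags.len() != lemmas.len() || tags.len() != prob_vals.len() {
        return None;
    }

    // Keep tag, lemma, and probability in one row so they move together.
    let mut rows: Vec<(&str, &str, &str, f64)> = (0..tags.len())
        .map(|i| (tags[i], lemmas[i], prob_strs[i], prob_vals[i]))
        .collect();
    // Stable sort, highest probability first; ties keep their input order.
    rows.sort_by(|a, b| b.3.total_cmp(&a.3));

    // Collapse the lemma field only when every reading shares one lemma.
    // `split` always yields at least one item, so `rows[0]` exists.
    let all_same = rows.windows(2).all(|w| w[0].1 == w[1].1);
    let lemma_out = if all_same {
        rows[0].1.to_string()
    } else {
        rows.iter().map(|r| r.1).collect::<Vec<_>>().join("|")
    };
    let tag_out = rows.iter().map(|r| r.0).collect::<Vec<_>>().join("|");
    // Emit the original probability text rather than reformatting the floats.
    let prob_out = rows.iter().map(|r| r.2).collect::<Vec<_>>().join("|");
    Some((tag_out, lemma_out, prob_out))
}

fn main() {
    // Identical lemmas collapse to one; tags and probabilities are reordered.
    println!("{:?}", sort_annotations("ART|PRELS|PDS", "die|die|die", "0.1|0.7|0.2"));
    // Differing lemmas are all kept, in the same order as their tags.
    println!("{:?}", sort_annotations("NN|VVFIN", "Essen|essen", "0.3|0.7"));
}
```

Sorting rows rather than three separate lists is what keeps each lemma attached to its tag, which is the relationship the changelog entry says must be preserved when lemmas differ.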

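Review note: the 3.2.5-4 entries credit buffered I/O for the reduced system time and state that informational messages go to stderr so stdout carries only data. A minimal sketch of that pattern follows, assuming a simple line-by-line filter; `copy_lines` is a hypothetical name and the pass-through body is a placeholder for real processing.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Copy lines from `reader` to `writer` through a buffer and return how many
/// lines were written. Hypothetical helper; a real filter would transform
/// each line before writing it.
fn copy_lines<R: BufRead, W: Write>(reader: R, writer: W) -> io::Result<usize> {
    // BufWriter batches many small writes into few system calls.
    let mut out = BufWriter::new(writer);
    let mut count = 0;
    for line in reader.lines() {
        let line = line?;
        writeln!(out, "{line}")?;
        count += 1;
    }
    // Flush explicitly so write errors surface here instead of being lost
    // when the buffer is dropped.
    out.flush()?;
    Ok(count)
}

fn main() -> io::Result<()> {
    // Locking stdin gives a buffered reader; stdout is wrapped in copy_lines.
    let stdin = io::stdin();
    let written = copy_lines(stdin.lock(), io::stdout().lock())?;
    // Diagnostics go to stderr so they never mix with the data on stdout.
    eprintln!("processed {written} lines");
    Ok(())
}
```

Because the status message is written with `eprintln!`, redirecting stdout to a file or piping it into `diff`, as the CI steps do, captures only the processed data.
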
README.md

Lines changed: 1 addition & 2 deletions
@@ -1,9 +1,8 @@
 # *TreeTagger* Docker Image with CoNLL-U Support
 
-[![Docker Build Status](https://img.shields.io/docker/cloud/build/korap/conllu-treetagger.svg)](https://hub.docker.com/r/korap/conllu-treetagger)
 [![Docker Pulls](https://img.shields.io/docker/pulls/korap/conllu-treetagger.svg)](https://hub.docker.com/r/korap/conllu-treetagger)
 [![Docker Stars](https://img.shields.io/docker/stars/korap/conllu-treetagger.svg)](https://hub.docker.com/r/korap/conllu-treetagger)
-[![Docker Automated build](https://img.shields.io/docker/cloud/automated/korap/conllu-treetagger.svg)](https://hub.docker.com/r/korap/conllu-treetagger)
+[![CI](https://github.com/KorAP/CoNLL-U-Treetagger/actions/workflows/ci.yml/badge.svg)](https://github.com/KorAP/CoNLL-U-Treetagger/actions/workflows/ci.yml)
 
 Docker image for **Helmut Schmid**'s [TreeTagger](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/) (based on [Stefan Fischer](https://github.com/sfischer13)'s [docker-treetagger](https://github.com/sfischer13/docker-treetagger)) with support for input and output in [CoNLL-U format](https://universaldependencies.org/format.html).
 