What's New in v0.2.0
QK LoRA Keyword Extraction
- QK LoRA adapter trained on Qwen3-Embedding-0.6B (Q/K projection LoRA fine-tuning)
- Training data: CSL 2000 + ShenCeCup 800 + Multi-Domain ~7995 = 10769 samples (after extractive filtering)
- ShenCeCup (1000 docs): F1@5=0.4653, F1@10=0.3292, R@10=0.7325
- Outperforms Gemini flash-lite by +14% F1@10 and +23% R@10 (relative), and runs ~500x faster (0.02s vs 11s per doc)
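The comparison figures can be reproduced from the reported scores. The sketch below takes the Gemini flash-lite values from the benchmark table in these notes and assumes the percentages are relative gains rather than absolute differences; the helper name `relative_gain` is illustrative only.

```python
# Scores as reported in these release notes (ShenCeCup, 1000 docs).
qk_lora = {"f1@10": 0.3292, "r@10": 0.7325, "sec_per_doc": 0.02}
gemini = {"f1@10": 0.2894, "r@10": 0.5973, "sec_per_doc": 11.0}


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


f1_gain = relative_gain(qk_lora["f1@10"], gemini["f1@10"])
r_gain = relative_gain(qk_lora["r@10"], gemini["r@10"])
speedup = gemini["sec_per_doc"] / qk_lora["sec_per_doc"]

print(f"F1@10 gain: {f1_gain:.1f}%")
print(f"R@10 gain:  {r_gain:.1f}%")
print(f"Speedup:    {speedup:.0f}x")
```

Under that assumption the gains come out at roughly 13.8% and 22.6%, and the speedup at about 550x, which is consistent with the rounded figures quoted above.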
Performance Optimization
- build_model_bundle now accepts a dtype parameter ('bfloat16' / 'float16' / 'float32' / 'auto')
- KeyAttenExtractor passes dtype through to model loading
- All forward passes upgraded from torch.no_grad() to torch.inference_mode()
- max_length is now read dynamically from model.config.max_position_embeddings (no longer hardcoded to 512)
- bf16 + SDPA: ~50% memory reduction for long document inference
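As a rough illustration of two of the changes above, the following sketch validates a dtype string against the four listed values and derives a length limit from a config object. It is not the library's code: `validate_dtype`, `resolve_max_length`, the `fallback` argument, and the stand-in config values are all hypothetical, and a real config would come from the loaded model.

```python
from types import SimpleNamespace

# The four dtype strings listed in these release notes.
ALLOWED_DTYPES = ("bfloat16", "float16", "float32", "auto")


def validate_dtype(dtype: str) -> str:
    """Reject dtype strings outside the documented set (hypothetical helper)."""
    if dtype not in ALLOWED_DTYPES:
        raise ValueError(f"dtype must be one of {ALLOWED_DTYPES}, got {dtype!r}")
    return dtype


def resolve_max_length(config, fallback: int = 512) -> int:
    """Read the context limit from a model config.

    `fallback` is an assumed safety net for configs that lack the field.
    """
    limit = getattr(config, "max_position_embeddings", None)
    if isinstance(limit, int) and limit > 0:
        return limit
    return fallback


# Stand-in configs with arbitrary values, used only for the demonstration.
long_ctx = SimpleNamespace(max_position_embeddings=32768)
no_limit = SimpleNamespace()

print(validate_dtype("bfloat16"))    # bfloat16
print(resolve_max_length(long_ctx))  # 32768
print(resolve_max_length(no_limit))  # 512
```

Reading the limit from the config means long-context models are no longer truncated at 512 tokens, while the fallback keeps the old behaviour for configs that do not expose the field.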
New Features (since v0.1.0)
- Decoder-only causal attention adaptation (auto layer recommendation)
- Token-span candidate scoring (candidate_scoring='token_span')
- Gravity candidates for unseen keyphrases (enable_gravity=True)
- Optional nested dedup for top-5 results
- External token input and domain dictionary support
- Length bias parameter for academic scenarios
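To give a feel for what nested dedup means, here is a minimal sketch under one plausible interpretation: a candidate is dropped when it is a substring of, or contains, a higher-ranked candidate that was already kept. The function `nested_dedup`, its signature, and the toy scores are hypothetical and may differ from how the library actually implements this option.

```python
def nested_dedup(ranked, top_n=5):
    """Keep up to `top_n` candidates, skipping any nested in a kept phrase.

    `ranked` is a list of (phrase, score) pairs sorted by descending score.
    """
    kept = []
    for phrase, score in ranked:
        # A phrase is "nested" if it overlaps a kept phrase by containment.
        nested = any(phrase in k or k in phrase for k, _ in kept)
        if not nested:
            kept.append((phrase, score))
        if len(kept) == top_n:
            break
    return kept


# Toy candidates with made-up scores, for illustration only.
ranked = [
    ("graph neural network", 0.91),
    ("neural network", 0.88),
    ("keyword extraction", 0.80),
    ("graph", 0.75),
    ("attention", 0.60),
]
deduped = nested_dedup(ranked)
print([phrase for phrase, _ in deduped])
```

In this toy input, "neural network" and "graph" are both contained in the top-ranked "graph neural network", so only three distinct phrases survive.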
Benchmark Highlights
| Method | Dataset | F1@5 | F1@10 | R@10 |
|---|---|---|---|---|
| QK LoRA (sigmoid) | ShenCeCup 1000 | 0.4653 | 0.3292 | 0.7325 |
| Gemini flash-lite | ShenCeCup 1000 | 0.4006 | 0.2894 | 0.5973 |
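For readers unfamiliar with the column names, the sketch below computes precision, recall, and F1 at k for a single document. It assumes exact string matching against the gold keywords; the matching, normalization, and averaging used for the table above are not specified in these notes, so treat `prf_at_k` and the toy data as illustrative only.

```python
def prf_at_k(predicted, gold, k):
    """Precision, recall and F1 over the top-k predictions for one document."""
    top_k = predicted[:k]
    gold_set = set(gold)
    hits = len(set(top_k) & gold_set)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean; it is defined as 0 when there are no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy example, not taken from any benchmark.
predicted = ["neural network", "keyword", "attention", "lora", "embedding"]
gold = ["keyword", "lora", "fine-tuning"]

p, r, f1 = prf_at_k(predicted, gold, k=5)
print(f"P@5={p:.3f} R@5={r:.3f} F1@5={f1:.3f}")
```

Because gold keyword lists are usually short, precision falls as k grows while recall rises, which is consistent with F1@10 being lower than F1@5 in the table even though R@10 is high.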
Acknowledgments
Thanks to the LinuxDo community for their support.