v0.2.0 — QK LoRA & Performance Optimization

@Qingfeng-233 Qingfeng-233 released this 07 Apr 09:47
· 8 commits to main since this release

What's New in v0.2.0

QK LoRA Keyword Extraction

  • QK LoRA adapter trained on Qwen3-Embedding-0.6B (Q/K projection LoRA fine-tuning)
  • Training data: CSL 2000 + ShenCeCup 800 + Multi-Domain ~7995 = 10769 samples (extractive filtered)
  • ShenCeCup (1000 docs): F1@5=0.4653, F1@10=0.3292, R@10=0.7325
  • Outperforms Gemini flash-lite by +14% F1@10 and +23% R@10 (relative), and runs ~500x faster (0.02s vs 11s per doc)
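
For readers unfamiliar with the F1@k and R@k figures quoted above, here is a minimal sketch of how such metrics are commonly computed for a single document. It assumes exact string matching between predicted and gold keyphrases; the project's actual evaluation script may normalize text or average across documents differently. The function name `prf_at_k` is hypothetical.

```python
def prf_at_k(predicted, gold, k):
    """Return (precision, recall, F1) for the top-k predicted keyphrases.

    Assumption: exact string match, no stemming or case folding.
    """
    top_k = predicted[:k]
    gold_set = set(gold)
    # Count each distinct prediction at most once.
    hits = sum(1 for phrase in dict.fromkeys(top_k) if phrase in gold_set)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Illustrative data, not taken from any benchmark.
predicted = ["neural network", "keyword extraction", "attention", "lora", "embedding"]
gold = ["keyword extraction", "lora", "fine-tuning"]

p, r, f1 = prf_at_k(predicted, gold, k=5)
print(f"P@5={p:.4f} R@5={r:.4f} F1@5={f1:.4f}")
```

Dataset-level scores such as those reported above are typically the mean of these per-document values.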

Performance Optimization

  • build_model_bundle now accepts dtype parameter ('bfloat16' / 'float16' / 'float32' / 'auto')
  • KeyAttenExtractor passes dtype through to model loading
  • All forward passes upgraded from torch.no_grad() to torch.inference_mode()
  • Dynamic max_length from model.config.max_position_embeddings (no more hardcoded 512)
  • bf16 + SDPA: ~50% memory reduction for long document inference
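
The dynamic max_length change can be pictured with the sketch below. It is not the library's code: `resolve_max_length` is a hypothetical helper, the config object is a plain stand-in for `model.config`, and the value 32768 is illustrative only. The idea is to read the limit from the config and fall back to the old hardcoded 512 only when the attribute is missing.

```python
from types import SimpleNamespace


def resolve_max_length(config, requested=None, fallback=512):
    """Pick a token limit from the model config instead of a hardcoded value.

    Hypothetical helper: `config` is anything exposing
    `max_position_embeddings`, as Hugging Face model configs usually do.
    """
    limit = getattr(config, "max_position_embeddings", None) or fallback
    # A caller-supplied limit is honored, but never beyond what the model supports.
    return limit if requested is None else min(requested, limit)


# Stand-in config objects for illustration.
long_context = SimpleNamespace(max_position_embeddings=32768)
no_limit_info = SimpleNamespace()

print(resolve_max_length(long_context))                  # uses the config value
print(resolve_max_length(long_context, requested=1024))  # caller asks for less
print(resolve_max_length(no_limit_info))                 # falls back to 512
```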

New Features (since v0.1.0)

  • Decoder-only causal attention adaptation (auto layer recommendation)
  • Token-span candidate scoring (candidate_scoring='token_span')
  • Gravity candidates for unseen keyphrases (enable_gravity=True)
  • Optional nested dedup for top-5 results
  • External token input and domain dictionary support
  • Length bias parameter for academic scenarios
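
To illustrate what nested dedup could look like, the sketch below walks a ranked candidate list and drops any phrase that contains, or is contained in, a higher-ranked phrase already kept. This is an assumption about the feature's intent, not the project's implementation; `dedup_nested` is a hypothetical name, and the character-level substring check is a simplification (a token-level check would avoid accidental matches inside longer words).

```python
def dedup_nested(ranked, top_n=5):
    """Keep up to top_n phrases, skipping ones nested in a kept phrase.

    Assumption: `ranked` is ordered best-first, so the higher-ranked
    phrase wins whenever two candidates overlap by containment.
    """
    kept = []
    for phrase in ranked:
        if any(phrase in other or other in phrase for other in kept):
            continue  # nested with something already selected
        kept.append(phrase)
        if len(kept) == top_n:
            break
    return kept


# Illustrative candidates, ordered by score.
ranked = [
    "keyword extraction", "keyword", "lora adapter", "extraction",
    "lora", "attention", "embedding model", "embedding",
]
result = dedup_nested(ranked)
print(result)
```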

Benchmark Highlights

Method              Dataset          F1@5     F1@10    R@10
QK LoRA (sigmoid)   ShenCeCup 1000   0.4653   0.3292   0.7325
Gemini flash-lite   ShenCeCup 1000   0.4006   0.2894   0.5973
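
The percentage gains quoted earlier in these notes appear to be relative improvements over the Gemini flash-lite row. A quick check using only the numbers from the table and the per-document timings stated above (the helper name `relative_gain` is just for illustration):

```python
# Scores copied from the benchmark table.
qk_lora = {"F1@5": 0.4653, "F1@10": 0.3292, "R@10": 0.7325}
gemini = {"F1@5": 0.4006, "F1@10": 0.2894, "R@10": 0.5973}


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


gains = {metric: relative_gain(qk_lora[metric], gemini[metric]) for metric in qk_lora}
for metric, gain in gains.items():
    print(f"{metric}: +{gain:.1f}%")

# Per-document latency from the release notes: 0.02s vs 11s.
speedup = 11 / 0.02
print(f"speedup: ~{speedup:.0f}x")
```

Under this reading, F1@10 and R@10 come out at roughly 14% and 23%, matching the headline figures, and the timing ratio is consistent with the "~500x" claim.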

Acknowledgments

Thanks to the LinuxDo community for their support.