v0.2.0 — QK LoRA & Performance Optimization

Latest

Qingfeng-233 released this 07 Apr 09:47

f4ae506

What's New in v0.2.0

QK LoRA Keyword Extraction

QK LoRA adapter for Qwen3-Embedding-0.6B (LoRA fine-tuning of the Q/K attention projections)
Training data: CSL (2000) + ShenCeCup (800) + Multi-Domain (~7995), 10769 samples in total (extractive-filtered)
ShenCeCup (1000 docs): F1@5=0.4653, F1@10=0.3292, R@10=0.7325
Outperforms Gemini flash-lite by +14% F1@10 and +23% R@10 (relative), and is ~500x faster (0.02 s vs 11 s per document)
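For reference, the usual LoRA formulation restricted to the query and key projections of one attention head is sketched below. This assumes the standard low-rank update with rank $r$ and scaling $\alpha$, and it ignores rotary position embeddings and any Q/K normalization for brevity. The release notes do not state the rank, the scaling, or which layers were adapted.

```latex
\begin{aligned}
W_Q' &= W_Q + \frac{\alpha}{r}\, B_Q A_Q, \qquad
W_K' = W_K + \frac{\alpha}{r}\, B_K A_K, \\
&\quad W_Q, W_K \in \mathbb{R}^{d_h \times d},\;
A_Q, A_K \in \mathbb{R}^{r \times d},\;
B_Q, B_K \in \mathbb{R}^{d_h \times r},\; r \ll d, \\
s_{ij} &= \frac{(W_Q' h_i)^\top (W_K' h_j)}{\sqrt{d_h}},
\end{aligned}
```

Here $h_i$ is the hidden state of token $i$ and $d_h$ is the head dimension. Only $A_Q, B_Q, A_K, B_K$ are trained. Because $W_V$ and the output projection are left frozen, the adapter changes the attention scores $s_{ij}$ without directly altering the value path.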

Performance Optimization

build_model_bundle now accepts dtype parameter ('bfloat16' / 'float16' / 'float32' / 'auto')
KeyAttenExtractor passes dtype through to model loading
All forward passes upgraded from torch.no_grad() to torch.inference_mode()
Dynamic max_length from model.config.max_position_embeddings (no more hardcoded 512)
bf16 + SDPA: ~50% memory reduction for long document inference
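The dtype and max_length changes can be illustrated with a small, self-contained sketch. `validate_dtype`, `resolve_max_length`, and `weight_memory_gb` are hypothetical helpers written for this note and are not functions exported by the package. The sketch only mirrors the behavior described above: the four accepted dtype strings, and reading `max_position_embeddings` from the config with 512 as the old fixed fallback. It also gives a rough weight-memory estimate from bytes per parameter.

```python
from types import SimpleNamespace

# The four dtype strings listed in these release notes.
SUPPORTED_DTYPES = ("bfloat16", "float16", "float32", "auto")
# Storage size per parameter for each concrete dtype.
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}
LEGACY_MAX_LENGTH = 512  # the previously hardcoded limit


def validate_dtype(dtype: str) -> str:
    """Hypothetical check mirroring the documented dtype options."""
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"dtype must be one of {SUPPORTED_DTYPES}, got {dtype!r}")
    return dtype


def resolve_max_length(config, fallback: int = LEGACY_MAX_LENGTH) -> int:
    """Read the context limit from the model config; fall back to 512 only
    if the config does not expose max_position_embeddings."""
    value = getattr(config, "max_position_embeddings", None)
    return int(value) if value else fallback


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weight-only footprint in GB (1e9 bytes); ignores activations."""
    dtype = validate_dtype(dtype)
    if dtype == "auto":
        # Assumed: "auto" defers to the checkpoint's stored dtype, so the size is unknown here.
        raise ValueError("pass a concrete dtype to estimate memory")
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    cfg = SimpleNamespace(max_position_embeddings=32768)  # illustrative value only
    print(resolve_max_length(cfg))                # 32768
    print(resolve_max_length(SimpleNamespace()))  # 512
    print(weight_memory_gb(0.6e9, "float32"))     # 2.4
    print(weight_memory_gb(0.6e9, "bfloat16"))    # 1.2
```

If the previous default was float32 (an assumption here), halving the bytes per parameter accounts for the weight savings. The ~50% figure above also reflects SDPA's lower attention memory on long inputs, which this sketch does not model. The forward passes themselves would run under `torch.inference_mode()`. Like `torch.no_grad()`, it disables gradient recording, and it additionally skips autograd's view and version-counter tracking.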

New Features (since v0.1.0)

Decoder-only causal attention adaptation (auto layer recommendation)
Token-span candidate scoring (candidate_scoring='token_span')
Gravity candidates for unseen keyphrases (enable_gravity=True)
Optional nested dedup for top-5 results
External token input and domain dictionary support
Length bias parameter for academic scenarios
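The following is one plausible way that token-span scoring, the length bias, and nested dedup could fit together. It is a hypothetical sketch, not KeyAttenExtractor's actual implementation, and it makes three assumptions:

- Per-token scores are already available.
- A span's score is the mean token score plus `length_bias` for each extra token.
- "Nested" means one candidate occurs as a contiguous token run inside another.

`score_span`, `rank_spans`, and `nested_dedup` are illustrative names. Gravity candidates are not modeled.

```python
from typing import List, Sequence, Tuple

Phrase = Tuple[str, ...]  # a candidate keyphrase as a tuple of tokens


def score_span(token_scores: Sequence[float], start: int, end: int,
               length_bias: float = 0.0) -> float:
    """Mean token score over [start, end) plus length_bias per extra token.

    Hypothetical formula; the package's actual span scoring may differ.
    """
    span = token_scores[start:end]
    if not span:
        raise ValueError("empty span")
    return sum(span) / len(span) + length_bias * (len(span) - 1)


def rank_spans(tokens: Sequence[str], token_scores: Sequence[float],
               max_len: int = 3, length_bias: float = 0.0) -> List[Phrase]:
    """Enumerate every contiguous span of up to max_len tokens, best first."""
    scored = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            score = score_span(token_scores, start, end, length_bias)
            scored.append((score, tuple(tokens[start:end])))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [phrase for _, phrase in scored]


def _is_sublist(short: Phrase, long: Phrase) -> bool:
    """True if `short` occurs as a contiguous token run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def nested_dedup(ranked: List[Phrase], k: int = 5) -> List[Phrase]:
    """Greedily keep up to k phrases, skipping any candidate that is nested
    inside, or contains, a phrase already kept (exact repeats included)."""
    kept: List[Phrase] = []
    for phrase in ranked:
        if any(_is_sublist(phrase, other) or _is_sublist(other, phrase)
               for other in kept):
            continue
        kept.append(phrase)
        if len(kept) == k:
            break
    return kept


if __name__ == "__main__":
    tokens = ["large", "language", "model", "inference"]
    scores = [0.2, 0.9, 0.8, 0.5]  # made-up per-token scores
    print(nested_dedup(rank_spans(tokens, scores, length_bias=0.0)))
    # [('language',), ('model',), ('inference',), ('large',)]
    print(nested_dedup(rank_spans(tokens, scores, length_bias=0.1)))
    # [('language', 'model'), ('model', 'inference'), ('large', 'language')]
```

With `length_bias=0.0`, mean scoring lets single high-scoring tokens crowd out longer phrases. A small positive bias lets multi-token candidates such as ('language', 'model') rank first, and the dedup then drops the unigrams nested inside them.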

Benchmark Highlights

Method	Dataset	F1@5	F1@10	R@10
QK LoRA (sigmoid)	ShenCeCup 1000	0.4653	0.3292	0.7325
Gemini flash-lite	ShenCeCup 1000	0.4006	0.2894	0.5973
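The metrics in this table follow the usual cutoff-based keyphrase evaluation. The sketch below assumes three conventions that the release does not spell out:

- Phrases are matched exactly after normalization.
- Precision is divided by k.
- Per-document scores are averaged over the 1000 documents.

The sketch also recomputes the relative gains stated in the highlights from the table values.

```python
from typing import Iterable, List, Tuple


def prf_at_k(predicted: List[str], gold: Iterable[str], k: int) -> Tuple[float, float, float]:
    """Precision@k, Recall@k and F1@k for one document, by exact phrase match.

    Assumption: precision divides by k even if fewer than k phrases were
    predicted; some evaluations divide by the number of predictions instead.
    Corpus-level numbers would average these per-document values.
    """
    gold_set = set(gold)
    top_k = list(dict.fromkeys(predicted))[:k]  # de-duplicate, keep rank order
    hits = sum(1 for phrase in top_k if phrase in gold_set)
    precision = hits / k
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old` (0.14 means +14%)."""
    return (new - old) / old


# Table rows as (F1@5, F1@10, R@10).
qk_lora = (0.4653, 0.3292, 0.7325)
flash_lite = (0.4006, 0.2894, 0.5973)

if __name__ == "__main__":
    for name, new, old in zip(("F1@5", "F1@10", "R@10"), qk_lora, flash_lite):
        print(f"{name}: {relative_gain(new, old):+.1%}")
    # F1@5: +16.2%
    # F1@10: +13.8%
    # R@10: +22.6%
    print(f"speedup: ~{11 / 0.02:.0f}x")  # speedup: ~550x
```

Under these conventions, the +14% and +23% figures are relative gains over Gemini flash-lite. In absolute terms, the differences are about 0.040 F1@10 and 0.135 R@10.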

Acknowledgments

Thanks to the LinuxDo community for their support.