Highlights
The v0.4 line rewires what gets read, what gets cited, and how it gets organised — same total context size, far better signal density, and citations that actually work in the UI.
- 🎯 Pages are ranked and budgeted by relevance. BM25 scores every fetched page against your question. The most relevant pages get up to 3× the default character allowance; marginal pages shrink; off-topic noise (a Python tutorial that snuck into an autonomous-vehicles query) is dropped entirely.
- 🔗 Citations that always resolve. Inline `[N]` markers in the model's reply now map 1:1 to the source pills in the OWUI sidebar: no more dangling `[3]`, no more `[REF]…[/REF]` improvisations from models like Mistral, no more pool-only snippets cited with invented slugs.
- 🧠 Better-structured answers across small and reasoning models. Tested on Mistral 24B, Gemma3 27B, and Qwen3 thinking. Replies are noticeably more comprehensive after the prompt was restructured around how transformer attention actually behaves on long contexts.
- 🎛️ New admin valve `inject_snippet_pool` (default ON): flip OFF to keep the LLM context tight to fully-fetched pages only.
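The citation fix above boils down to renumbering markers by first use and discarding ones that point at nothing. A minimal sketch of that pass (the function name and return shape are illustrative, not the plugin's actual API):

```python
import re

def normalize_citations(reply: str, num_sources: int):
    """Remap inline [N] markers so each resolves 1:1 to a real source.

    Returns the rewritten reply plus the original source indices in
    first-use order (the order the sidebar pills would be built in).
    Markers pointing past the source list are dropped entirely.
    """
    order = []  # original indices, in order of first citation

    def remap(m):
        n = int(m.group(1))
        if not 1 <= n <= num_sources:
            return ""  # dangling citation: strip the marker
        if n not in order:
            order.append(n)
        return f"[{order.index(n) + 1}]"

    return re.sub(r"\[(\d+)\]", remap, reply), order
```

With 3 sources, `"A[2] B[7] C[2] D[1]"` becomes `"A[1] B C[1] D[2]"`: the dangling `[7]` is removed, and the surviving markers count up in the order they first appear.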
In practice
For a typical question fetching 10 pages, the two most relevant can each carry 8–12k characters of real body text while marginally related ones shrink to ~200-char snippets and irrelevant ones disappear, instead of every page getting a flat 4k slice. The answer you get back is noticeably more focused, and every cited source is one click away in the OWUI sidebar.
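The ranking-and-budgeting step can be pictured as a single allocation over relevance scores. Everything below (names, the 10% drop threshold) is an illustrative sketch, not the plugin's actual code; the real scorer is BM25, while this allocator just consumes whatever scores it is handed:

```python
def allocate_budgets(scores, default=4000, cap_mult=3.0, drop_frac=0.1):
    """Split a fixed character pool across fetched pages by relevance.

    Pool = default * page count, so total context stays the same as a
    flat split. Pages under drop_frac of the top score are dropped;
    the rest share the pool proportionally to their score, each
    capped at cap_mult * default characters.
    """
    pool = default * len(scores)
    top = max(scores.values(), default=0.0)
    kept = {u: s for u, s in scores.items() if top and s >= drop_frac * top}
    if not kept:
        return {}
    total = sum(kept.values())
    cap = int(cap_mult * default)
    return {u: min(cap, int(pool * s / total)) for u, s in kept.items()}
```

For example, scores `{"a": 10.0, "b": 5.0, "c": 0.2}` with the 4k default drop `c` below the threshold and hand its share to the survivors, yielding `{"a": 8000, "b": 4000}`.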
Other changes / fixes
Budget preservation across drops, surplus reclamation, reasoning-model reply-length accounting, prompt internals, User-Agent rotation pool expansion (20 → 40), debug-gated stats line, and more — see the full CHANGELOG for the complete list.
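Of these, surplus reclamation is the easiest to picture: a page shorter than its allowance hands the unused characters to pages that were truncated. A rough sketch under assumed names, not the actual implementation:

```python
def reclaim_surplus(budgets, lengths):
    """Redistribute budget that short pages cannot use.

    Each page first spends min(budget, page length); the leftover
    characters go to still-truncated pages in proportion to how much
    more they could show. Rounding remainders stay unspent here.
    """
    spend = {u: min(b, lengths[u]) for u, b in budgets.items()}
    surplus = sum(budgets[u] - spend[u] for u in budgets)
    need = {u: lengths[u] - spend[u] for u in budgets if lengths[u] > spend[u]}
    total_need = sum(need.values())
    for u, n in need.items():
        spend[u] += min(n, surplus * n // total_need)
    return spend
```

So a 3k-character page with an 8k allowance frees 5k, which a truncated neighbour can absorb up to its own remaining length.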
Upgrade
Drop the new `easysearch.py` into your OWUI Functions panel. No config migration needed: existing valves keep their values; the new `inject_snippet_pool` defaults to ON (matches v0.4.2 behaviour but with BM25 filtering).