-
2026-04-23: v0.4.3 - Citation Coherence, BM25 Pipeline Refinements & Admin Valves (Hannibal)
- fix: Source label format unified to
--- [N] Title ---so the block label maps 1:1 to the inline[N]citation marker the model is asked to emit. Removes the cognitive jumpSource N → [N]that small models (Gemma3, smaller Mistral, Phi) were skipping, producing inline markers that mapped to nothing. The<search_results>XML wrapper and the SECURITY instruction from v0.3.5 are preserved. - fix:
redistribute_budgetwas adding score=0.0 sources to the "hungry" list, causing them to receive surplus budget via the equal-share fallback — a gocsgo.com and csgoskins.gg page were getting 5937/6080 chars instead of the 200-char floor. Sources withscore=0.0are now excluded from surplus redistribution entirely; they stay hard-capped atBM25_FLOOR_CHARSregardless of available surplus. - fix: Zero-score fetched sources (no BM25 overlap with the user query) are now dropped entirely in Phase B instead of being kept at the 200-char floor. Even floor-capped blocks still produced a
[N]context block and a citation event, so the LLM would try to weave off-topic content into the answer (e.g. Zhihu "Attention is All You Need" threads and Pythonselftutorials appearing in a reply about autonomous vehicles). Degenerate case where all scores are zero falls through to the existing flat equal-share branch so the model still receives some context. - fix: BM25 budget is now computed against the pre-drop source count, so the surviving sources inherit the budget that would have been spent on the dropped noise. Previously dropping zero-score sources silently shrunk
total_budget = max_len * len(sources)proportional to the number of survivors, leaving the good sources with less context window than the user'smax_result_lengthvalve implied. - fix: Snippet-only pool (
remaining_pool) is now passed through the same BM25 + zero-score filter as the fetched sources before being injected into the LLM context. Previously the pool reached the prompt unfiltered, leaking ~40+ off-topic results per query (Pythonselftutorials, Arabic Hamza pages, random topic news) — both into the model's reasoning material and into the UI's citation panel as ~50 noisy pills. Survivors are rendered as--- [N] Title (snippet only) ---, contiguous numbering after fetched sources, each with its ownemit_citation. No hard cap — the pool self-limits via lexical pertinence. - fix: Reply-length stat (
📊 ... → Nk reply) now strips<think>,<thinking>and OWUI's<details type="reasoning" done="true">…</details>wrappers before counting, so reasoning models (Qwen3 thinking, DeepSeek-R1, Phi-reasoning) report the visible answer length, not the visible answer + hidden chain-of-thought. - changed: Prompt structure split by function. Task-framing sections (
INSTRUCTION,CRITICAL,RELIABILITY) move before<search_results>so the model reads the context already knowing the goal; output-format sections (CITATIONS,SECURITY) move after<search_results>so they are the last tokens before generation, fighting "lost in the middle" attention loss on 40k+ token contexts.INSTRUCTIONwas also strengthened with "Provide a comprehensive, well-structured response that synthesises the key findings", which restored verbosity that was being throttled when all rules sat at the head of the prompt. - added: Admin valve
inject_snippet_pool(defaultTrue) — when OFF, the BM25-ranked snippet-only pool is dropped from the LLM context and only fully-fetched pages reach the model. Useful for short factual lookups where pool clutter outweighs grounding signal; recommended ON for analytical/exploratory queries. - changed: User-Agent rotation pool expanded from 20 to 40 unique strings, doubling fingerprint diversity to further reduce per-domain rotation collisions on heavy
??:Nqueries. - changed: The
📊 ... raw → lxml → clean → ctx → replystats line is now appended to the model response only when the admindebugvalve is ON. With debug OFF (the default) it goes only to the EasySearch debug log. Removes operator-only telemetry from the end-user view without adding a dedicated valve — the existing debug toggle already gates it.
- fix: Source label format unified to
-
2026-04-23: v0.4.2 - OWUI Compatibility & Citation Fixes (Hannibal)
- fix:
Users.get_user_by_id()is synchronous in OWUI ≤0.8.12 and asynchronous in OWUI ≥0.9.x. Using a bareawaiton the sync version raisesobject UserModel can't be used in 'await' expression. Replaced all three call sites with a_get_user()helper that usesinspect.isawaitable()to branch at runtime — no minimum OWUI version requirement. - fix: Snippet-only
remaining_poolentries were assigned[N]IDs in the search context but had no correspondingemit_citationcall. OWUI silently drops inline[N]markers with no registered citation, so any fact the model cited from a snippet-only source would produce a dangling marker. All pool entries now emit a citation event.
- fix:
-
2026-04-23: v0.4.1 - Engine-Aware Result Cap (Hannibal)
- enh: Raised
max_results_per_queryupper bound from 50 to 100 so admins on SerpAPI, Exa, and Serper (which allow up to 100 per request) can make full use of their engine. - docs: Valve description is now engine-agnostic and lists the actual per-engine API caps for quick reference.
- enh: Raised
-
2026-04-23: v0.4.0 - BM25 Reranking + Dynamic Budget + Reasoning-Model Support (Hannibal)
- fix: Strip
<think>...</think>and<thinking>...</thinking>blocks from LLM responses before parsing. Reasoning models (phi-reasoning, deepseek-r1, qwen3 thinking mode) emit these blocks inline; without stripping them, JSON parsing in query generation fails and the fallback line-splitter turns reasoning text into search queries. Also handles truncated/unclosed<think>openers defensively. Fix applied to both_generate_queriesand_extract_query_from_context. - fix: Inline citations were silently dropped by most models. Source blocks were labelled
--- Source N: Title ---while the model was asked to cite as[N]— requiring a cognitive mapping step that recent models increasingly skip, especially in the presence of theSECURITY:instruction that tells them to ignore content inside<search_results>. Switched source labelling to[N] Title(matching the citation format 1:1) and rewrote theCITATIONS:andSECURITY:instructions to be explicit and non-conflicting. Pattern aligned with mcp-webgate's summarizer, which has been verified to produce inline citations reliably across models. - feat: BM25 relevance reranking. Fetched pages are ranked by keyword-overlap relevance against your query, so the most relevant sources appear first in the model's context — where LLM attention is strongest.
- feat: Dynamic budget allocation. The per-source character budget is now proportional to BM25 score — highly relevant pages get up to
3 × max_result_lengthchars; marginal pages shrink to a 200-char floor. Total context size is unchanged vs. the previous flat allocation — the same budget is just distributed more intelligently. - feat: Surplus redistribution. Characters unused by failed fetches or thin pages are reclaimed and handed to high-scoring pages that would otherwise be truncated. No wasted budget.
- feat: New admin valve
enable_bm25_rerank(default:True) — single toggle for the entire block (ranking + allocation). Disable to restore the v0.3.x flat-allocation behavior. - enh:
WebSearchHandler._sanitize_textnow does cleaning only; truncation moved to the dynamic allocation phase. The noise-filter regex is compiled once at module load (NOISE_LINE_RE).
- fix: Strip
-
2026-04-23: v0.3.5 - Bug Fixes & Security (Hannibal)
- fix: Clamp
WEB_SEARCH_RESULT_COUNTto a configurablemax_results_per_queryvalve (default: 20). Brave Search API hard-rejectscount > 20with 422 for all plans. Closes #3. - fix: Guard race condition on shared Filter singleton — concurrent requests could reset
self.ctx = Nonemid-execution, causing'NoneType' has no attribute 'model'inDebugService.emit()andoutlet(). Closes #1. - fix: Added missing
awaitonUsers.get_user_by_idcalls — was returning coroutines instead of user objects, causing'coroutine' has no attribute 'role'. (PR #4 by @jaybill). Closes #5. - security: Wrapped raw web content in
<search_results>XML tags with explicit untrusted-data instruction to mitigate indirect prompt injection from malicious pages. Addresses #2.
- fix: Clamp
-
2026-03-09: v0.3.4 - Documentation & Logic Improvements (Hannibal)
- Added Troubleshooting & FAQ section to README.md.
- Added immediate validation of Open WebUI's global Web Search toggle at startup.
- Improved "Smart Default" logic with a dedicated Language Anchor.
- Upgraded text cleaning engine to automatically skip binary files (.pdf, .docx, .zip) and remove Unicode junk characters.
- All retrieved search snippets from unread pages are now fed directly to the LLM for better signal density.
-
2026-03-07: v0.3.3 - Documentation & Clarity (Hannibal)
- Added explicit clarification in README.md regarding the requirement of the global Web Search engine in the Admin Panel for initial result fetching.
-
2026-03-07: v0.3.2 - Early Exit & Syntax Fix (Hannibal)
-
Moved the Global Web Search check from
_execute_searchtoinlet(Phase 2) to fail early and save processing time if the feature is disabled. -
Fixed an
unterminated string literalsyntax error inDebugService.dumpcaused by unsupported multiline f-strings in older Python versions.
-
-
2026-02-28: v0.3.1 - Linguistic Precision & Binary Scrubber (Hannibal)
- Fixed an issue where context-aware queries (
??) would lose the original conversation language by implementing a strictLanguage Anchorusingmsg_list[-2]. - Fixed a bug in
_parse_triggerthat caused single-language modifiers (e.g.,??:de) to ignore theresponse_langconstraint. - Introduced a Pre-Fetch Blacklist in
_process_resultsto safely skip.pdf,.docx,.zipand other non-HTML binaries before establishing an HTTP connection. - Upgraded the text cleaner with an Aggressive Scrubber regex to annihilate C0/C1 control codes,
\ufffdreplacements, and Zero-Width/Bi-Directional characters from dirty DOM elements. - Fixed a bug in the Gap-Filler logic that caused the total source count to exceed the requested target.
- Fixed an issue where context-aware queries (
-
2026-02-28: v0.3.0 - Resilient Double-Layer Context (Hannibal)
- Implemented "Snippet-First" fallback logic in
_process_resultsto prevent signal loss. - Added oversampling pool injection: all retrieved snippets are now fed to the LLM for massive signal density.
- Enhanced System Prompt to enforce prioritization of Snippets over noisy/empty Scraped Content.
- Refactored result formatting to distinguish between 'Summary' and 'Full Content'.
- Implemented "Snippet-First" fallback logic in
-
2026-02-28: v0.2.9 - Multi-Language Synthesis & Smart Defaults (Hannibal)
- Implemented dual-language syntax
??:src>destfor decoupled search/response. - Added Smart Default logic: responses follow user prompt language unless overridden.
- Refactored
_parse_triggerto handle complex language tokens.
- Implemented dual-language syntax
-
2026-02-28: v0.2.8 - Anti-Bleed Injection (Hannibal)
- Implementation of the "Anti-Bleed" system. The model now strictly adheres to the language of your query or the explicit
:langmodifier - The LLM is now strictly forbidden from generating bibliographies, source lists, or URL repetitions at the end of the response
- Implementation of the "Anti-Bleed" system. The model now strictly adheres to the language of your query or the explicit
-
2026-02-28: v0.2.7 - Anti-Bleed Injection (Hannibal)
- Implemented "Anti-Bleed" system instruction: Added a critical directive to the final prompt that forces the LLM to respond in the exact same language as the user's search query, preventing linguistic drift caused by foreign-language search results.
-
2026-02-28: v0.2.6 - Deep Context & RAG Lockdown (Hannibal)
- Implemented colon-separated modifiers for context window depth (e.g.,
??:c3). - Added
default_context_countUser Valve to define default lookback for empty triggers. - Enhanced UI status messages to show the exact number of messages analyzed during context extraction.
- Implemented safe RAG/Retrieval lockdown in
inletwith mandatory restoration inoutlet. - Optimized User-Agent list (20 unique strings) by removing duplicates and diversifying browser fingerprints.
- Refined structural HTML cleaning using
lxmlwith explicit error reporting for missing dependencies.
- Implemented colon-separated modifiers for context window depth (e.g.,
-
2026-02-28: v0.2.5 - Full Logic Consolidation (Hannibal)
- Consolidated multi-modifier parsing logic (e.g., ??:en:10).
- Expanded User-Agent rotation to 20 unique strings for better stealth.
- Fixed pluralization in status messages (singular/plural 'page').
- Implemented strict lxml requirement with explicit UI error reporting.
- Verified and secured system prompt preservation during search cycles.
- Mapped search language and query data into the unified model for enhanced debugging.
-
2026-02-28: v0.2.4 - Multi-Modifier Syntax & Context Preservation (Hannibal)
- Added support for colon-separated modifiers in triggers (e.g.,
??:en:10,??:10:it). - Implemented System Prompt preservation to maintain model personality during search.
- Added regex-based fallback for HTML cleaning when lxml is unavailable or fails.
- Fixed a regression causing double search execution in the inlet phase.
- Implemented dynamic singular/plural logic for status recovery messages.
- Integrated search language tracking in the unified configuration model for better debugging.
- Added support for colon-separated modifiers in triggers (e.g.,
-
2026-02-28: v0.2.2 - Network Resilience & UA Rotation (Hannibal)
- Added configurable search timeout via Admin Valves.
- Implemented User-Agent rotation to prevent anti-scraping blocks.
- Improved concurrent fetching logic with dynamic headers.
-
2026-02-27: v0.2.1 - Search Resilience & UX Milestone (Hannibal)
- Implemented URL Sanitization to filter tracking parameters (UTM, etc.) and fragments.
- Added Oversampling logic in search execution to ensure a healthy candidate pool.
- Implemented Auto-Recovery Fetch (Gap-Filler) to maintain the requested source count.
- Moved search_prefix to UserValves for full user customization.
- Added "Thinking..." UI status emission at the end of the inlet phase.
- Integrated unified configuration mapping in ConfigService for all new valves.
-
2026-02-27: v0.1.5 - Dynamic Fetching & Safety Caps (Hannibal)
- Implemented dynamic calculation for
WEB_SEARCH_RESULT_COUNToverride: now usesmax(default, user_target)to prevent underfetching when few queries are generated. - Added
max_total_resultsValve (default: 20) as a hard safety cap to prevent user abuse (e.g.,??1000). - Refined
search_results_per_queryto act as a minimum guaranteed fetch count per query. - Updated logging to clearly show requested vs. capped vs. final target counts.
- Implemented dynamic calculation for
-
2026-02-27: v0.1.3 - Configuration Refactor & Anti-Flood (Hannibal)
- Refactored configuration management into ConfigService for centralized Valve merging.
- Implemented robust trigger parsing logic to handle
??Nmodifiers. - Added HTTPX streaming with
max_download_mblimit to prevent memory flooding. - Centralized Admin Valves and User Valves logic.
- Enhanced debug logging across all critical paths.
-
2026-02-27: v0.1.1 - Core Refactor & Deduplication (Hannibal)
- Added URL deduplication logic in
_process_resultsto prevent redundant scraping. - Optimized token usage by filtering out duplicate search results before LLM processing.
- Added URL deduplication logic in
-
2026-02-27: v0.1.0 - Core Refactor (Hannibal)
- Forked from EasyBrief: removed all formatting, visual generation, and brief logic.
- Implemented "Search Only" focus with
??trigger. - Added Context Extraction for empty
??trigger based on last assistant message. - Set Filter Priority to 999 to prevent pipeline breakage during history wiping.