feat(contextual-prefix): add local ollama prefix tier (tier 2.5)#63
Open
vinsocci wants to merge 1 commit into
Open
feat(contextual-prefix): add local ollama prefix tier (tier 2.5)#63vinsocci wants to merge 1 commit into
vinsocci wants to merge 1 commit into
Conversation
contextual-prefix.py shipped three tiers: anthropic-api and claude-cli
(both --allow-egress gated, both send page bodies off-machine) and
synthetic (on-machine, template-based, lower quality). For users with a
local LLM already running, there was no way to get LLM-quality contextual
prefixes WITHOUT egress — the gap between "good but leaves the machine"
and "private but weak."
This adds a 4th tier between claude-cli and synthetic:
tier 2.5 "ollama" — POST each chunk to a local ollama /api/chat with the
page body for context. Default-ON when ollama is reachable; no flag
needed for the common (localhost) case.
Egress posture (consistent with the existing model):
- Local ollama (127.0.0.1/localhost/::1) needs no flag.
- Non-localhost OLLAMA_URL requires --allow-remote-ollama, mirroring the
scripts/tiling-check.py:351 default-deny precedent.
- --no-ollama skips the tier even if reachable.
- Asymmetric fallback: on ollama failure the tier drops to synthetic and
does NOT climb to claude-cli/anthropic-api — egress was never consented
to on this path, so silently egressing on local-LLM failure would
violate the user's posture.
New flags: --ollama-model (default qwen2.5:7b-instruct), --allow-remote-ollama,
--no-ollama. New functions: ollama_url/ollama_is_local/ollama_reachable/
ollama_prefix. pick_prefix_tier() and generate_prefix() extended; the
"anthropic-api" and "claude-cli" paths are unchanged.
Verified on macOS (Obsidian 1.12.7, ollama + qwen2.5:7b-instruct):
- tier picker over 49 public demo pages: default -> all tier=ollama;
--no-ollama -> all tier=synthetic.
- example prefix (wiki/concepts/Search Experience Optimization.md):
"This chunk outlines the methodology, process, and key innovation of
SXO in SEO analysis."
- production scale: 895 chunks across 401 pages generated with tier=ollama,
zero egress, no errors.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
contextual-prefix.pyshipped three prefix-generation tiers:--allow-egressgated, sends page bodies off-machine (~$12/1k docs)--allow-egressgated, sends page bodies off-machine (free via CC subscription)For users who already run a local LLM, there was no way to get LLM-quality contextual prefixes without egress — a gap between "good but leaves the machine" and "private but weak."
This adds a 4th tier between claude-cli and synthetic:
Egress posture (consistent with the existing model)
127.0.0.1/localhost/::1) needs no flag.OLLAMA_URLrequires--allow-remote-ollama, mirroring thescripts/tiling-check.py:351default-deny precedent.--no-ollamaskips the tier even when reachable.API surface
--ollama-model(defaultqwen2.5:7b-instruct),--allow-remote-ollama,--no-ollamaollama_url/ollama_is_local/ollama_reachable/ollama_prefixpick_prefix_tier()andgenerate_prefix()extended; theanthropic-apiandclaude-clipaths are unchanged.Test plan
Verified on macOS (Obsidian 1.12.7, ollama +
qwen2.5:7b-instruct), all against this repo's own public demo vault:Tier picker (49 demo pages,
--peek):tier=ollama--no-ollamatier=syntheticExample prefixes (generated by this branch,
tier=ollama):wiki/concepts/Search Experience Optimization.md→ "This chunk outlines the methodology, process, and key innovation of SXO in SEO analysis."wiki/concepts/Pro Hub Challenge.md→ "This chunk outlines the Pro Hub Challenge, detailing its structure and rules for both challenges."wiki/concepts/cherry-picks.md→ "This chunk outlines Tier 1 features for quick implementation in the Cherry-Picks feature backlog."Production scale: 895 chunks across 401 pages generated with
tier=ollama, zero egress, no errors.Notes
rerank.py), so the full contextual-retrieval pipeline (prefix → BM25 → cosine rerank) can run entirely on-machine.qwen2.5:7b-instructis a suggestion, not a requirement —--ollama-modelaccepts any pulled tag (tested also withqwen2.5-coder:14b).🤖 Generated with Claude Code