Skip to content

feat(contextual-prefix): add local ollama prefix tier (tier 2.5)#63

Open
vinsocci wants to merge 1 commit into
AgriciDaniel:mainfrom
vinsocci:feat/ollama-prefix-tier
Open

feat(contextual-prefix): add local ollama prefix tier (tier 2.5)#63
vinsocci wants to merge 1 commit into
AgriciDaniel:mainfrom
vinsocci:feat/ollama-prefix-tier

Conversation

@vinsocci

Copy link
Copy Markdown

Summary

contextual-prefix.py shipped three prefix-generation tiers:

  1. anthropic-api--allow-egress gated, sends page bodies off-machine (~$12/1k docs)
  2. claude-cli--allow-egress gated, sends page bodies off-machine (free via CC subscription)
  3. synthetic — on-machine, template-based, lower quality

For users who already run a local LLM, there was no way to get LLM-quality contextual prefixes without egress — a gap between "good but leaves the machine" and "private but weak."

This adds a 4th tier between claude-cli and synthetic:

tier 2.5 "ollama" — POST each chunk to a local ollama /api/chat with the page body for context. Default-on when ollama is reachable; no flag needed for the common localhost case.

Egress posture (consistent with the existing model)

  • Local ollama (127.0.0.1 / localhost / ::1) needs no flag.
  • Non-localhost OLLAMA_URL requires --allow-remote-ollama, mirroring the scripts/tiling-check.py:351 default-deny precedent.
  • --no-ollama skips the tier even when reachable.
  • Asymmetric fallback: on ollama failure the tier drops to synthetic and does not climb to claude-cli/anthropic-api — egress was never consented to on this path, so silently egressing on local-LLM failure would violate the user's posture.

API surface

  • New flags: --ollama-model (default qwen2.5:7b-instruct), --allow-remote-ollama, --no-ollama
  • New functions: ollama_url / ollama_is_local / ollama_reachable / ollama_prefix
  • pick_prefix_tier() and generate_prefix() extended; the anthropic-api and claude-cli paths are unchanged.

Test plan

Verified on macOS (Obsidian 1.12.7, ollama + qwen2.5:7b-instruct), all against this repo's own public demo vault:

Tier picker (49 demo pages, --peek):

invocation result
default (local ollama up) all 49 → tier=ollama
--no-ollama all 49 → tier=synthetic

Example prefixes (generated by this branch, tier=ollama):

  • wiki/concepts/Search Experience Optimization.md"This chunk outlines the methodology, process, and key innovation of SXO in SEO analysis."
  • wiki/concepts/Pro Hub Challenge.md"This chunk outlines the Pro Hub Challenge, detailing its structure and rules for both challenges."
  • wiki/concepts/cherry-picks.md"This chunk outlines Tier 1 features for quick implementation in the Cherry-Picks feature backlog."

Production scale: 895 chunks across 401 pages generated with tier=ollama, zero egress, no errors.

Notes

  • Pairs naturally with the existing ollama rerank stage (rerank.py), so the full contextual-retrieval pipeline (prefix → BM25 → cosine rerank) can run entirely on-machine.
  • Default model qwen2.5:7b-instruct is a suggestion, not a requirement — --ollama-model accepts any pulled tag (tested also with qwen2.5-coder:14b).

🤖 Generated with Claude Code

contextual-prefix.py shipped three tiers: anthropic-api and claude-cli
(both --allow-egress gated, both send page bodies off-machine) and
synthetic (on-machine, template-based, lower quality). For users with a
local LLM already running, there was no way to get LLM-quality contextual
prefixes WITHOUT egress — the gap between "good but leaves the machine"
and "private but weak."

This adds a 4th tier between claude-cli and synthetic:

  tier 2.5 "ollama" — POST each chunk to a local ollama /api/chat with the
  page body for context. Default-ON when ollama is reachable; no flag
  needed for the common (localhost) case.

Egress posture (consistent with the existing model):
  - Local ollama (127.0.0.1/localhost/::1) needs no flag.
  - Non-localhost OLLAMA_URL requires --allow-remote-ollama, mirroring the
    scripts/tiling-check.py:351 default-deny precedent.
  - --no-ollama skips the tier even if reachable.
  - Asymmetric fallback: on ollama failure the tier drops to synthetic and
    does NOT climb to claude-cli/anthropic-api — egress was never consented
    to on this path, so silently egressing on local-LLM failure would
    violate the user's posture.

New flags: --ollama-model (default qwen2.5:7b-instruct), --allow-remote-ollama,
--no-ollama. New functions: ollama_url/ollama_is_local/ollama_reachable/
ollama_prefix. pick_prefix_tier() and generate_prefix() extended; the
"anthropic-api" and "claude-cli" paths are unchanged.

Verified on macOS (Obsidian 1.12.7, ollama + qwen2.5:7b-instruct):
- tier picker over 49 public demo pages: default -> all tier=ollama;
  --no-ollama -> all tier=synthetic.
- example prefix (wiki/concepts/Search Experience Optimization.md):
  "This chunk outlines the methodology, process, and key innovation of
  SXO in SEO analysis."
- production scale: 895 chunks across 401 pages generated with tier=ollama,
  zero egress, no errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vinsocci vinsocci requested a review from AgriciDaniel as a code owner May 29, 2026 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant