Date: 2026-01-12
Status: Proposed
Group: integration
Live Conversational Threads currently uses multiple online LLM providers (Anthropic, OpenRouter, Gemini) across detectors, clustering, and transcript processing. The product direction prioritizes local inference for privacy, with optional online usage only when users explicitly opt in. We also need a consistent configuration surface for chat and embedding model selection, backed by environment defaults and database overrides.
- Default the application to local LLM mode, using an OpenAI-compatible LM Studio endpoint.
- Provide settings (env + DB override) for:
mode:localoronlinebase_url: LM Studio base URLchat_model: local chat model idembedding_model: local embedding model idjson_modeandtimeout_secondsfor safety/reliability
- Add a UI panel that surfaces both chat and embedding model dropdowns plus a mode toggle.
- Preserve online providers in code paths for optional use when
mode=online.
- Local mode becomes the default for all LLM calls (detectors, clustering, transcript processing).
- Online providers remain available but require explicit opt-in and valid API keys.
- LLM configuration is centralized for consistent behavior across services.
- Future refactors should split large LLM-heavy modules into smaller units without changing behavior.
- Local models incur no external usage cost and improve privacy.
- Online mode is still useful for higher-quality or specialized models and is gated by settings.