Fix issue 1282 self-hosted model selection when billing is disabled by MujtabaHadi · Pull Request #1318 · khoj-ai/khoj

MujtabaHadi · 2026-04-23T08:19:42Z

Summary

Fix self-hosted model selection so chat_advanced does not override the default chat model path when billing is disabled.

Problem

On self-hosted Khoj, users could be treated as subscribed in the default-model selection path. As a result, chat_advanced could override normal model selection behavior, including the default chat model and user-selected model behavior.

Fix

Update:

- `ConversationAdapters.get_default_chat_model`
- `ConversationAdapters.aget_default_chat_model`

so the subscribed/advanced branch only applies when `state.billing_enabled` is true.
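The gating described in the fix can be sketched roughly as follows. This is a minimal illustration, not the actual Khoj adapter code: `ServerChatSettings` and `pick_default_chat_model` are hypothetical stand-ins, and the billing flag is passed as a plain argument rather than read from `state.billing_enabled`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerChatSettings:
    """Hypothetical stand-in for the server-level chat model settings."""

    chat_default: Optional[str] = None
    chat_advanced: Optional[str] = None


def pick_default_chat_model(
    settings: ServerChatSettings,
    is_subscribed: bool,
    billing_enabled: bool,
) -> Optional[str]:
    """Use the advanced model only for subscribed users on a billing-enabled server."""
    if billing_enabled and is_subscribed and settings.chat_advanced:
        return settings.chat_advanced
    # Self-hosted servers (billing disabled) always fall through to the default model.
    return settings.chat_default


settings = ServerChatSettings(chat_default="default-model", chat_advanced="advanced-model")
self_hosted = pick_default_chat_model(settings, is_subscribed=True, billing_enabled=False)
cloud = pick_default_chat_model(settings, is_subscribed=True, billing_enabled=True)
```

With billing disabled, the sketch returns the default model even when the user would otherwise count as subscribed.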

Validation

I reproduced the issue locally in a self-hosted Docker setup using Ollama and OpenAI-compatible routing.

Before the fix:

- changing chat_advanced controlled the actual chat response path

After the fix:

- helper/background steps used the advanced model
- the actual chat response used the default model

Related issue

Fixes #1282

A previous regression resulted in the start llm response event being sent with every (non-thought) message chunk. It should only be sent once, after thoughts and before the first normal message chunk is streamed. The regression was probably introduced with the changes to stream thoughts. This should fix the chat streaming latency logs.
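A minimal sketch of the intended behavior, assuming chunks arrive as `(kind, text)` pairs. The event name `START_LLM_RESPONSE` and the `stream_events` helper are hypothetical and only illustrate emitting the start event once.

```python
from typing import Iterable, Iterator, Tuple

START_LLM_RESPONSE = "start_llm_response"  # hypothetical event name


def stream_events(chunks: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield chunks, inserting one start event before the first non-thought chunk."""
    started = False
    for kind, text in chunks:
        if kind == "message" and not started:
            started = True
            yield (START_LLM_RESPONSE, "")
        yield (kind, text)


events = list(stream_events([("thought", "hmm"), ("message", "Hel"), ("message", "lo")]))
```

The `started` flag is what keeps the event from being repeated on every later message chunk.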

- Extract llm thoughts from more openai compatible ai api providers like llama.cpp server, vllm and litellm.
  - Try structured thought extraction by default
  - Try in-stream thought extraction for specific model families like qwen and deepseek.
- Show thoughts with tool use, for intermediate steps like research mode, from openai compatible models

Some consensus on where thoughts appear in the model response is being reached: either deepseek style thoughts in the structured response (via the "reasoning_content" field) or qwen style thoughts in the main response (i.e. <think></think> tags).

Default to trying deepseek style structured thought extraction, so the previous default stream processor isn't required.
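The two thought styles mentioned above can be illustrated with a small sketch. It works on a complete, non-streamed message represented as a plain dict, which is a simplifying assumption; `extract_thought` is a hypothetical helper, not the Khoj implementation.

```python
import re
from typing import Optional, Tuple

THINK_TAG = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_thought(message: dict) -> Tuple[Optional[str], str]:
    """Return (thought, answer) from a hypothetical chat message dict."""
    content = message.get("content") or ""
    # DeepSeek style: thoughts arrive in a separate structured field.
    structured = message.get("reasoning_content")
    if structured:
        return structured, content
    # Qwen style: thoughts are wrapped in <think></think> tags in the main response.
    match = THINK_TAG.search(content)
    if match:
        answer = (content[: match.start()] + content[match.end():]).strip()
        return match.group(1).strip(), answer
    return None, content


deepseek_style = extract_thought({"reasoning_content": "check units", "content": "42 km"})
qwen_style = extract_thought({"content": "<think>check units</think>42 km"})
```

Trying the structured field first mirrors the stated default; the tag-based path is only a fallback in this sketch.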

Save to conversation in the normal flow should only be done if an interrupt wasn't triggered. Since the improvements to interrupt handling, saving conversations on interrupt is handled completely by the disconnect monitor. The abort is handled correctly for steps before the final response, but not if the interrupt occurs while the final response is being sent. This change checks for cancellation after the final response send attempt and avoids a duplicate chat turn save.
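As a rough sketch of that ordering, assuming cancellation is signalled through an `asyncio.Event`: `send_final_response` and `chat_turn` are hypothetical names, and the disconnect monitor is only represented by a comment.

```python
import asyncio
from typing import List

saved_turns: List[str] = []  # stands in for the conversation store


async def send_final_response(cancellation: asyncio.Event, interrupt: bool) -> None:
    """Pretend to stream the final response; an interrupt sets the cancellation flag."""
    if interrupt:
        cancellation.set()


async def chat_turn(interrupt: bool) -> None:
    cancellation = asyncio.Event()
    await send_final_response(cancellation, interrupt)
    # Check for cancellation after the send attempt, before saving.
    if cancellation.is_set():
        return  # assumed: the disconnect monitor saves the interrupted turn instead
    saved_turns.append("saved by normal flow")


asyncio.run(chat_turn(interrupt=True))
asyncio.run(chat_turn(interrupt=False))
```

Only the uninterrupted turn is saved by the normal flow, so the interrupted turn is not written twice.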

## PR Summary

This PR resolves the deprecation warnings of the Pydantic library, which you can find in the [CI logs](https://github.com/khoj-ai/khoj/actions/runs/16528997676/job/46749452047#step:9:142):

```python
PydanticDeprecatedSince20: The `copy` method is deprecated; use `model_copy` instead. See the docstring of `BaseModel.copy` for details about how to handle `include` and `exclude`. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
```

Send larger thought chunks to improve streaming efficiency and reduce rendering load on web client. This rendering load was most evident when using high throughput models or low compute clients. The server side message buffering should result in fewer re-renders, faster streaming and lower compute load on client. Related commit to buffer message content in fc99f8b
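Server side buffering of this kind can be sketched as a small generator. The size-based threshold `min_chars` is an assumption made for illustration; the source does not say whether the real buffer flushes on size, time or both.

```python
from typing import Iterable, Iterator, List


def buffer_chunks(chunks: Iterable[str], min_chars: int = 20) -> Iterator[str]:
    """Group small chunks into larger ones of at least min_chars characters."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if len(buffer) >= min_chars:
            yield buffer
            buffer = ""
    if buffer:
        # Flush whatever is left once the stream ends.
        yield buffer


tokens: List[str] = ["The ", "user ", "asks ", "about ", "model ", "selection."]
buffered = list(buffer_chunks(tokens, min_chars=12))
```

The client receives the same text in fewer, larger messages, which is what reduces the number of re-renders.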

Clarify that the tool AI will perform a maximum of X sub-queries for each query passed to it by the manager AI. This stops the manager AI from trying to pass a list of queries directly to the search tool AI. It should pass just a single query.

It is recommended to chat with open-source models by running an open-source server like Ollama or Llama.cpp on your GPU powered machine, or by using a commercial provider of open-source models like DeepInfra or OpenRouter. These chat model serving options provide a mature OpenAI compatible API that already works with Khoj.

Directly using offline chat models only worked reasonably with pip install on a machine with a GPU. The Docker setup of Khoj had trouble accessing the GPU, and without GPU access offline chat is too slow.

Deprecating support for an offline chat provider directly from within Khoj will reduce code complexity and increase development velocity. Offline models are subsumed under the existing OpenAI model provider.

This stale code was originally used to index files on the server file system directly by the server. We currently push files to sync via API.

Server side syncing of remote content like Github and Notion is still supported. But old, unused code for server side sync of files on the server fs is being cleaned out.

The new --log-file cli arg allows specifying where the khoj server should store logs on fs. This replaces the --config-file cli arg that was only being used as a proxy for deciding where to store the log file.

- TODO
  - Tests are broken. They were relying on the server side content syncing for test setup

- Delete tests testing deprecated server side indexing flows
- Delete `Local(Plaintext|Org|Markdown|Pdf)Config` methods, files and references in tests
- Index test data via new helper method, `get_index_files`
  - It is modelled after the old `get_org_files` variants in main app
  - It passes the test data in required format to `configure_content`

Allows maintaining the more realistic tests from before while using new indexing mechanism (rather than the deprecated server side indexing mechanism)

…hoj-ai#1212)

### Overview

Make server leaner to increase development speed. Remove old indexing code and the native offline chat which was hard to maintain.

- The native offline chat module was written when the local ai model api ecosystem wasn't mature. Now it is. Reuse that.
- Offline chat requires GPU for usable speeds. Decoupling offline chat from Khoj server is the recommended way to go for practical inference speeds (e.g Ollama on machine, Khoj in docker etc.)

### Details

- Drop old code to index files on server filesystem. Clean cli, init paths.
- Drop native offline chat support with llama-cpp-python. Use established local ai APIs like Llama.cpp Server, Ollama, vLLM etc.
- Drop old pre 1.0 khoj config migration scripts
- Update test setup to index test data after old indexing code removed.

…hoj-ai#1128)

- When you type in the search modal and the input matches the pattern `file:`, you should see a list of all files in vault and non-vault
- This list is filtered down as you type more letters

### Technical Details

- Added file filter mode (`isFileFilterMode` state) to filter search results by specific files
- Updated `getSuggestions()` function to search files from vault and non-vault via khoj backend.
- Updated the selection behavior to handle both file selection and search result selection

Closes khoj-ai#1025

---------

Co-authored-by: Debanjum <debanjum@gmail.com>

Add a "Copy References" button to the references pane in the web app.

In ReferencePanel Component

- Add a "Copy References" button to the `ReferencePanel` component.
- Implement functionality to copy all references (notes, online, and code) as a markdown bullet list.
- Update the `TeaserReferencesSection` component to include the "Copy References" button.
- Show copied to clipboard indicator when references copied on button click

Closes khoj-ai#1021

---------

Co-authored-by: Debanjum <debanjum@gmail.com>

## Summary

- Fixes AttributeError: 'str' object has no attribute 'iter_content' in text_to_speech endpoint
- When `ELEVEN_LABS_API_KEY` is not configured, the function was returning a string instead of a Response object

## Changes

- Introduced `TextToSpeechError` exception class in `text_to_speech.py`
- Changed `generate_text_to_speech` to raise exception instead of returning error string
- Updated API endpoint to catch the exception and return HTTP 501 (Not Implemented)

## Test plan

- [x] Code passes ruff lint check
- [ ] Manual testing with and without Eleven Labs API key configured

Fixes khoj-ai#1049

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Debanjum <debanjum@gmail.com>
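The raise-then-catch pattern described in the changes might look roughly like this. The signatures are simplified assumptions: the real `generate_text_to_speech` calls the Eleven Labs API and the real endpoint returns a web framework response, whereas this sketch returns a `(status, body)` tuple from a hypothetical `text_to_speech_endpoint`.

```python
from typing import Optional, Tuple


class TextToSpeechError(Exception):
    """Raised when text to speech cannot be generated."""


def generate_text_to_speech(text: str, api_key: Optional[str]) -> bytes:
    """Simplified stand-in: raise instead of returning an error string."""
    if not api_key:
        raise TextToSpeechError("Eleven Labs API key is not configured")
    return text.encode("utf-8")  # placeholder for real audio bytes


def text_to_speech_endpoint(text: str, api_key: Optional[str]) -> Tuple[int, bytes]:
    """Hypothetical endpoint: map the exception to HTTP 501 (Not Implemented)."""
    try:
        return 200, generate_text_to_speech(text, api_key)
    except TextToSpeechError as e:
        return 501, str(e).encode("utf-8")


missing_key = text_to_speech_endpoint("hello", api_key=None)
with_key = text_to_speech_endpoint("hello", api_key="example-key")
```

Because the error travels as an exception, the caller never receives a string where it expects a streamable response.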

…khoj-ai#1263) Remove redundant SDK version check in LauncherActivity since both branches set the same orientation value. This simplifies the code without changing behavior Signed-off-by: Olexandr88 <radole1203@gmail.com>

## Summary

Fix a Python operator precedence bug in the `research()` function that causes `current_iteration` to be set to a boolean instead of the actual count of previous iterations.

## Bug

```python
if current_iteration := len(previous_iterations) > 0:
```

Python evaluates this as:

```python
if current_iteration := (len(previous_iterations) > 0):  # assigns True or False
```

So `current_iteration` becomes `True` (1) or `False` (0) regardless of how many previous iterations exist.

## Fix

```python
if (current_iteration := len(previous_iterations)) > 0:
```

With parentheses, `current_iteration` is correctly set to the count (e.g. 4), and then compared to 0.

## Impact

When resuming research with previous iterations, the loop counter was effectively reset to 1 instead of the true count. This allowed the research loop to run significantly more iterations than `MAX_ITERATIONS` intended, wasting compute and API calls.

Signed-off-by: JiangNan <1394485448@qq.com>
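The precedence difference can be checked in isolation with a short script. The list contents and the names `buggy` and `fixed` are illustrative only.

```python
previous_iterations = ["iteration"] * 4

# Without parentheses the comparison binds first, so the name receives a bool.
if buggy := len(previous_iterations) > 0:
    pass

# With parentheses the name receives the count, which is then compared to 0.
if (fixed := len(previous_iterations)) > 0:
    pass

print(buggy, fixed)
```

Since `True` behaves like `1` in arithmetic, the unparenthesized form silently acts as a counter that restarts at 1.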

## Summary

In `extract_from_webpage()`, the `content` parameter is unconditionally overwritten to `None` on the line before the `is_none_or_empty(content)` check. This means any pre-fetched content (e.g. text content already retrieved by the Exa search engine) is always discarded, forcing an unnecessary re-scrape of the webpage.

## Bug

```python
async def extract_from_webpage(
    url: str,
    subqueries: set[str] = None,
    content: str = None,  # <-- caller passes pre-fetched content
    ...
) -> Tuple[set[str], str, Union[None, str]]:
    content = None  # <-- BUG: immediately overwrites it
    if is_none_or_empty(content):  # always True
        content = await scrape_webpage_with_fallback(url)
```

## Fix

Remove the `content = None` assignment so the passed-in content is used when available, falling back to scraping only when needed.

This bug was introduced in a refactor and causes:

- Wasted API calls to web scrapers for pages whose content is already available
- Increased latency for search results that include inline content (e.g. Exa)

Signed-off-by: JiangNan <1394485448@qq.com>

…ai#1277)

## Problem

When `ChatModel.friendly_name` is `None`, the `__str__` method returns `None`, causing:

```
TypeError: __str__ returned non-string (type NoneType)
```

## Solution

Fall back to `name` field when `friendly_name` is `None`.

Related issue: khoj-ai#1251

Co-authored-by: 阳虎 <yanghu@yanghudeMacBook-Pro.local>
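A minimal sketch of the fallback, using a plain dataclass in place of the real Django `ChatModel` model (an assumption made to keep the example self-contained):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatModel:
    """Simplified stand-in for the ChatModel database model."""

    name: str
    friendly_name: Optional[str] = None

    def __str__(self) -> str:
        # Always return a string: fall back to name when friendly_name is None.
        if self.friendly_name is None:
            return self.name
        return self.friendly_name


unnamed = str(ChatModel(name="example-model"))
named = str(ChatModel(name="example-model", friendly_name="Example Model"))
```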

…g fails (khoj-ai#1292)

When PyMuPDFLoader fails to process an invalid PDF file, the exception is caught but pdf_entry_by_pages is referenced before assignment, causing an UnboundLocalError.

Initialized pdf_entry_by_pages to an empty list before the try block so the return statement always has a valid value, even when an exception occurs.

Verified with both invalid input (returns []) and valid PDFs (returns extracted text).

Fixes khoj-ai#1289

Co-authored-by: BillionClaw <267901332+BillionClaw@users.noreply.github.com>
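The shape of that fix, sketched without PyMuPDFLoader: the loader is passed in as a callable, and `extract_pdf_pages` and `broken_loader` are hypothetical names used only for this illustration.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def extract_pdf_pages(load_pages: Callable[[], List[str]]) -> List[str]:
    """Return extracted pages, or an empty list if loading fails."""
    pdf_entry_by_pages: List[str] = []  # bound before the try block
    try:
        pdf_entry_by_pages = load_pages()
    except Exception as e:
        logger.warning("Unable to process PDF: %s", e)
    # Safe even when load_pages raised, because the name is already bound.
    return pdf_entry_by_pages


def broken_loader() -> List[str]:
    raise ValueError("invalid PDF")


invalid_result = extract_pdf_pages(broken_loader)
valid_result = extract_pdf_pages(lambda: ["page 1 text"])
```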

## Summary

`src/khoj/processor/content/org_mode/orgnode.py:57` opens a file with `open(filename, "r")` but never closes it. The file handle leaks for the lifetime of the returned `Orgnode` list.

## Fix

Replaced bare `open()` with a `with` statement to ensure the file is closed after `makelist()` finishes reading.

```python
# Before
def makelist_with_filepath(filename):
    f = open(filename, "r")
    return makelist(f, filename)

# After
def makelist_with_filepath(filename):
    with open(filename, "r") as f:
        return makelist(f, filename)
```

This is safe because `makelist()` fully consumes the file during the call (building the Orgnode list from file contents), so the file handle is no longer needed after it returns.

Changes (4 files):

- pyproject.toml: authlib 1.6.6 → 1.6.9
- src/interface/web/package.json: dompurify ^3.2.6 → ^3.3.2, eslint-config-next 14.2.3 → 14.2.35
- documentation/package.json: @docusaurus/* → ^3.9.2, added serialize-javascript resolution

And regenerated lock files.

The only resolution override is serialize-javascript in documentation, which is unavoidable since Docusaurus still pins old copy-webpack-plugin and css-minimizer-webpack-plugin that depend on serialize-javascript ^6.x.

Starlette 1.0.0 removed the deprecated TemplateResponse signature where `name` was the first positional arg and `request` was passed inside `context`. The new signature requires `request` as the first positional argument: TemplateResponse(request, name=...).

This caused a 500 error in production on web client endpoints with: "Jinja2Templates.TemplateResponse() missing 1 required positional argument: 'name'" (with older Starlette) or "'request'" (with 1.0.0).

Update all TemplateResponse calls in web_client.py to use the new Starlette 1.0.0 signature: pass `request` as the first positional arg and `name` as an explicit keyword argument.

The issue didn't trigger locally because uv is used locally and pip in docker builds. These resolve dependencies, including which starlette version to install, differently. Locally 0.52.0 was installed while on production starlette 1.0.0 was used. This mismatch is what caused the issue.

…i#1296)

## Summary

- Add null checks for `config.setting` in `get_chat_model()` and `aget_chat_model()` to prevent `AttributeError` when memories are disabled
- When the memory toggle creates a `UserConversationConfig` via `get_or_create` with `setting=None`, accessing `config.setting.price_tier` crashes; it now falls through to the default chat model instead

## Root Cause

The "Enable Memories" toggle PATCH endpoint uses `get_or_create` on `UserConversationConfig`, which can create a config with `setting=None`. Both `get_chat_model()` and `aget_chat_model()` then crash:

- For subscribed users: `if config:` passes but `return config.setting` returns `None`, causing downstream crashes
- For non-subscribed users: `config.setting.price_tier` raises `AttributeError` on `None`

## Fix

Change `if config:` → `if config and config.setting:` (subscribed path) and add `and config.setting` guard before `.price_tier` access (non-subscribed path), in both sync and async variants.

## Test plan

- [ ] Toggle memories off with no prior chat model configured: settings page should still load
- [ ] Chat responses should use default model when setting is None
- [ ] Existing users with configured chat models should be unaffected

Fixes khoj-ai#1287

Signed-off-by: majiayu000 <1835304752@qq.com>
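The guards can be sketched with plain dataclasses standing in for the Django models. The `"free"` tier value and the exact non-subscribed condition are assumptions for illustration; only the `config and config.setting` checks come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

FREE_TIER = "free"  # hypothetical price tier value


@dataclass
class ChatModel:
    name: str
    price_tier: str = FREE_TIER


@dataclass
class UserConversationConfig:
    setting: Optional[ChatModel] = None  # can be None after get_or_create


def get_chat_model(
    config: Optional[UserConversationConfig],
    is_subscribed: bool,
    default_model: ChatModel,
) -> ChatModel:
    """Return the user's chat model, falling back to the default when unset."""
    if is_subscribed:
        if config and config.setting:
            return config.setting
        return default_model
    # Guard config.setting before touching .price_tier.
    if config and config.setting and config.setting.price_tier == FREE_TIER:
        return config.setting
    return default_model


default = ChatModel(name="default-model")
empty_config = UserConversationConfig(setting=None)
subscribed_choice = get_chat_model(empty_config, is_subscribed=True, default_model=default)
free_choice = get_chat_model(empty_config, is_subscribed=False, default_model=default)
```

Both paths fall through to the default model when `setting` is `None` instead of raising or returning `None`.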

debanjum and others added 30 commits July 25, 2025 13:28

Handle tool call requests with openai completion in non stream mode

03c4f61

Strictly enforce tool call schema for llm served via openai compat api

c401bb9

This is required by llama.cpp server and is recommended in general for openai compatible models

Expand to enable deep think for more qwen style models like smollm3

624d622

Bump desktop app and documentation dependencies

f5d12b7

Resolve Pydantic deprecation warnings

655a1b3

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

Make async call to get agent files from async agent/conversation API

6caa6f4

This should avoid the sync_to_async errors thrown by django when calling the /api/agent/conversation API endpoint

Redirect to a better error page on server error

bbc1495

Use Gemini suggested retry backoff if set. Improve gemini error handling

0f953f9

Make code tool write safe code to run in sandbox

6290d74

- Ask both manager and code gen AI to not run or write unsafe code for some safety improvement (over code exec in sandbox).
- Disallow custom agent prompts instructing unsafe code gen

Release Khoj version 2.0.0-beta.11

7ab24d8

Use better, standard default temp, top_p for openai model providers

c0db9e4

Support grok 4 reasoning model

b335f8c

Extract thought stream from reasoning_content of openai model providers

fba4ad2

Grok 3 mini at least sends thoughts in reasoning_content field of streamed chunk delta. Extract model thoughts from that when available.

Release Khoj version 2.0.0-beta.12

9096f62

Drop old pre 1.0 khoj config migration scripts

3f8cc71

These were used when khoj was configured using khoj.yml file

Use portable comparator to get flags used to call dev_setup.sh

0387b86

Use UV to manage python version, env on khoj computer

e0f363d

- Use khoj username on khoj's computer
- Uv is much faster for builds

Use UV to install server for speed, package locks in dev setup, workf…

006b958

…lows It's much faster than pip, includes dependency locks via uv.lock and comes with standard convenience utilities (e.g pipx, venv replacement)

Use Deno for speed, package locks in dev setup, github workflows

d2940de

It's faster than yarn and comes with standard convenience utilities

Use UV, Deno for faster setup of development container

8700fb8

samhoooo and others added 30 commits February 23, 2026 00:33

Release Khoj version 2.0.0-beta.25

94bae47

Drop trailing slash to get memories via api on web app in production

0b8cf51

Trailing slash in api calls to server doesn't work in production behind proxy, only in local next.js dev server.

only show the payment card on the settings page if the user is subscr…

b864cb1

…ibed

Update What's New in Readme - Mention Pipali release

aeea140

Update Pipali project announcement in README

17be2d4

Fix typos in telemetry error message and comment (khoj-ai#1265)

2c82967

Fix spelling typos in telemetry.py. Corrects 'recieved' to 'received' and 'equest' to 'request' in comments and error messages.

Skip automation tests when GEMINI_API_KEY is not set

51a56af

- Add missing skipif decorator to test_create_automation
- Change skip condition from 'is None' to 'not' (falsy check) to also handle empty string, which happens when GitHub secrets are unavailable in fork PRs
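The difference between the two skip conditions can be shown without pytest. The helper names are hypothetical; in the real tests the condition would be passed to a skipif decorator.

```python
from typing import Optional


def should_skip_is_none(api_key: Optional[str]) -> bool:
    """Old condition: only skips when the variable is entirely unset."""
    return api_key is None


def should_skip_falsy(api_key: Optional[str]) -> bool:
    """New condition: also skips when the variable is set to an empty string."""
    return not api_key


# Fork PRs without access to secrets are described as yielding an empty string.
empty_secret = ""
old_behavior = should_skip_is_none(empty_secret)
new_behavior = should_skip_falsy(empty_secret)
```

With an empty secret the old check lets the test run and fail, while the falsy check skips it.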

Upgrade to Next.js 15 for web app

a9749c7

Use next Link instead of raw a html tags to wrap Khoj home logo link

b8f82b2

Show Khoj cloud deprecation banner on web app to Khoj cloud users

f7bce48

Add banner to home, chat, shared chat and settings pages for coverage. Link to settings account section to export data and mention Khoj self-host option in banner

Use next Link instead of raw a html tags to wrap more links on web app

d4df9a7

Ignore dup file errors for pypi wheel validation. Expected Next 15 be…

f356386

…havior

Release Khoj version 2.0.0-beta.26

b8797e0

Release Khoj version 2.0.0-beta.27

7475a78

Bump server, client dependencies

8965db7

Fix getting billing config to show deprecation banner on Khoj cloud

fdd5fd8

Add deprecation banner to top of web landing page as well

171ac5d

Release Khoj version 2.0.0-beta.28

9258f57

MujtabaHadi wants to merge 330 commits into khoj-ai:release/1.x from MujtabaHadi:fix-1282-selfhosted-model-selection

MujtabaHadi commented Apr 23, 2026


16 participants