
Commit 6ebe353

PurpleDoubleD and claude committed
release: v2.3.8 — Codex end-to-end overhaul
Fixes
- Codex file_write actually lands on disk — threaded chatId through all 6 built-in tool executors (fs_read/fs_write/fs_list/fs_search/shell_execute/execute_code) via getActiveChatId(). The documented per-chat workspace isolation (~/agent-workspace/<chatId>/) silently fell through to a shared default/ fallback whenever the model emitted a relative path.
- agents.ts:executeTool file_write now returns the real data.path from Rust instead of a fake "File written successfully" string that hid write failures behind a green ✓.
- Codex chat bubble no longer floods with raw JSON for models that emit tool calls as content (qwen2.5-coder:3b). New stripRanges() helper uses the balanced-brace positions the extractor already computes to remove the exact tool-call substrings, and an extractedFromContent flag drops the residual narrative entirely so qwen's Codex UI looks identical to gemma4's.
- Balanced-brace JSON extractor replaces the greedy \{[^}]*\} regex that failed on nested braces / f-strings. Fixes extractToolCallsFromContent for any code using f-strings or dict-literal string values.
- Arg-validator error-hint is now concrete: it lists the required fields with types plus the keys the model actually sent, so small models self-correct on retry instead of repeating the same malformed call.

Added
- Context compaction in Codex (mirrors Agent Mode — compactMessages before each sampling call).
- Memory injection + extraction in Codex (parity with Chat/Agent — reads getMemoriesForPrompt into the system prompt + extractMemoriesFromPair after the turn).
- CODEX_CATEGORIES tool-scope filter (filesystem/terminal/system/web only; hides image_generate/screenshot/process_list/run_workflow from Codex).
- Codex iter cap 20 → 50 (large refactors need more tool calls; budget still caps via agentMaxToolCalls/agentMaxIterations).
- Family grouping in ModelSelector dropdown (QWEN / GEMMA / LLAMA / HERMES / PHI / DOLPHIN / MISTRAL / DEEPSEEK / …).
E2E verified on 5 tool-capable Ollama models (gemma4:e4b + full CLI task with 4/4 unittest pass, qwen2.5-coder:3b, hermes3:8b, llama3.1:8b, llama3.2:1b). Tests 2202 → 2202 green. cargo + tsc clean. Drop-in upgrade from v2.3.7. No breaking changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 5a3046c commit 6ebe353

12 files changed

Lines changed: 543 additions & 55 deletions

File tree

CHANGELOG.md

Lines changed: 29 additions & 0 deletions
@@ -2,6 +2,35 @@
All notable changes to Locally Uncensored are documented here.
## [2.3.8] - 2026-04-22
### Fixed
- **Codex `file_write` now actually lands on disk in the expected folder** — the built-in tool executors (`fs_read`, `fs_write`, `fs_list`, `fs_search`, `shell_execute`, `execute_code`) in `src/api/mcp/builtin-tools.ts` never threaded the active chat-id through to Rust even though `agent-context.ts` was designed for exactly that. The documented per-chat workspace isolation (`~/agent-workspace/<chatId>/`) silently fell through to a shared `default/` fallback whenever the model emitted a relative path, and no per-chat isolation ever happened. Now every executor reads `getActiveChatId()` and spreads it into the `backendCall` payload so Rust's `resolve_path()` / `resolve_agent_path()` can route relative paths into the right per-chat folder. `src/api/agents.ts:executeTool` also now returns the real `data.path` from Rust's `{status:"saved", path:…}` response instead of a hard-coded `"File written successfully"` string that masked write failures behind a green ✓ in the UI.
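  The chat-id threading described above can be sketched as follows — a minimal illustration only, where the backend stub and `activeChatId` variable are hypothetical stand-ins for `getActiveChatId()` and Rust's `resolve_path()`:

  ```typescript
  // Hypothetical stand-in for agent-context.ts state.
  let activeChatId: string | null = "chat-42";
  const getActiveChatId = (): string | null => activeChatId;

  interface WritePayload { path: string; content: string; chatId: string | null }

  // Stub of the Rust bridge: roots relative paths under the per-chat folder,
  // falling back to the shared default/ workspace when no chatId arrives.
  function backendCall(_cmd: string, payload: WritePayload): { status: string; path: string } {
    const root = payload.chatId
      ? `~/agent-workspace/${payload.chatId}`
      : "~/agent-workspace/default";
    return { status: "saved", path: `${root}/${payload.path}` };
  }

  function fsWrite(path: string, content: string): string {
    // The fix: spread the active chat id into every backend payload …
    const res = backendCall("fs_write", { path, content, chatId: getActiveChatId() });
    // … and surface the real saved path, not a canned success string.
    return res.path;
  }
  ```

  Without the `chatId` field the stub lands in `default/`, which is exactly the silent fallback the release fixes.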
- **Codex chat bubble no longer floods with raw `{"name":"file_write", "arguments":{…}}` JSON objects for models that emit tool calls as content** — qwen2.5-coder:3b and similar small coder models put the tool call in the `content` field instead of the native `tool_calls` array. The pre-2.3.8 extractor caught the call but left the raw JSON visible in the chat, and the narrative around it ("I'm about to verify…" + ```python fence echoing the file content) was concatenated onto `fullContent` every iteration — a 4-iteration task rendered as four stacked JSON blobs with four duplicated paragraphs. Fix: new `stripRanges()` helper uses the `[startIdx, endIdx]` positions the balanced-brace extractor already computes to remove the exact tool-call substrings (not a greedy regex that fails on nested braces), and an `extractedFromContent` flag drops the residual narrative entirely so qwen's Codex UI now looks identical to gemma4's.
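  A minimal sketch of what a `stripRanges()` helper like the one above could look like, assuming the ranges are `[startIdx, endIdx]` pairs (end exclusive) produced by the extractor:

  ```typescript
  // Remove the given [start, end) ranges from a string. Processing from the
  // end backwards keeps earlier indices valid as text is cut out.
  function stripRanges(text: string, ranges: Array<[number, number]>): string {
    let out = text;
    for (const [start, end] of [...ranges].sort((a, b) => b[0] - a[0])) {
      out = out.slice(0, start) + out.slice(end);
    }
    return out;
  }
  ```

  Using exact positions avoids the failure mode of a greedy regex, which can over- or under-match when the tool-call JSON contains nested braces.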
- **Balanced-brace JSON extractor replaces the greedy `\{[^}]*\}` regex** — the old regex failed on any JSON with nested braces OR string values containing `{` (e.g. Python f-strings `f'Hello, {name}!'` emitted by qwen2.5-coder). Replaced with a locate-header-then-balance scanner that respects string escapes. Fixes `extractToolCallsFromContent` for any code that uses f-strings or dict literals in string values.
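  The balance-scanning half of such an extractor can be sketched like this (names modeled on the changelog; the real implementation in `tool-call-repair.ts` may differ):

  ```typescript
  // Given the index of an opening '{', walk forward counting brace depth,
  // skipping JSON string literals (and escape sequences) so braces inside
  // string values — e.g. a Python f-string like f'Hello, {name}!' — don't
  // terminate the scan early. Returns the index of the matching '}', or -1.
  function findBalancedBraceEnd(text: string, open: number): number {
    let depth = 0;
    let inString = false;
    for (let i = open; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        if (ch === "\\") i++;               // skip the escaped character
        else if (ch === '"') inString = false;
      } else if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}" && --depth === 0) return i; // matching close found
    }
    return -1; // unbalanced input
  }
  ```

  The old `\{[^}]*\}` regex would have stopped at the first `}` inside the f-string; the depth counter with string skipping does not.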
- **Arg-validator error-hint now lists the exact missing fields with types and what the model actually sent** — pre-2.3.8 the generic "Re-issue the tool call with valid arguments matching the tool schema" hint meant small models (hermes3:8b, qwen2.5-coder:3b) kept retrying the same malformed call. Now the hint looks like `file_write requires {path: string, content: string}. You sent {command}. Retry with all required fields present.` — concrete enough that small models actually self-correct.
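  A sketch of how such a hint can be assembled, assuming a minimal JSON-Schema-like shape (`required` + `properties[*].type`) for the tool definition:

  ```typescript
  interface ToolSchema {
    required: string[];
    properties: Record<string, { type: string }>;
  }

  // Build a concrete retry hint: required fields with types, plus the keys
  // the model actually sent, in the format quoted in the changelog.
  function buildArgHint(tool: string, schema: ToolSchema, sent: Record<string, unknown>): string {
    const wanted = schema.required
      .map((k) => `${k}: ${schema.properties[k]?.type ?? "unknown"}`)
      .join(", ");
    const got = Object.keys(sent).join(", ") || "nothing";
    return `${tool} requires {${wanted}}. You sent {${got}}. Retry with all required fields present.`;
  }
  ```

  Fed the `file_write` schema and a `{command: …}` payload, this reproduces the example hint above verbatim.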
### Added
- **Context compaction in Codex** — long multi-tool turns used to blow past 8K-context local models' windows; Codex now mirrors Agent Mode's `compactMessages(…, Math.floor(maxCtx * 0.8))` call before each sampling pass, summarising older turns while keeping recent messages intact.
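  The real `compactMessages` summarises older turns; purely to illustrate the budgeting idea, here is a hypothetical stand-in that drops the oldest non-system messages until an estimated token count (chars/4, a crude heuristic) fits the budget:

  ```typescript
  interface Msg { role: string; content: string }

  // Stand-in compaction: keep the system prompt and the most recent turns,
  // collapse everything trimmed into a single placeholder note.
  function compactMessages(msgs: Msg[], maxTokens: number): Msg[] {
    const est = (m: Msg[]) => m.reduce((n, x) => n + Math.ceil(x.content.length / 4), 0);
    const out = [...msgs];
    while (est(out) > maxTokens && out.length > 3) {
      out.splice(1, 1); // drop the oldest message after the system prompt
    }
    if (out.length < msgs.length) {
      out.splice(1, 0, { role: "system", content: `[${msgs.length - out.length} earlier messages compacted]` });
    }
    return out;
  }
  ```

  The changelog's `Math.floor(maxCtx * 0.8)` call site would pass 80% of the model's window as `maxTokens`, leaving headroom for the reply.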
- **Memory injection + extraction in Codex** — Codex was the only chat surface that ignored the memory system. It now reads `useMemoryStore.getState().getMemoriesForPrompt(instruction, contextTokens)` into the system prompt at dispatch time, and runs `extractMemoriesFromPair()` after the turn lands. Parity with Chat + Agent Mode.
- **`CODEX_CATEGORIES` tool-scope filter** — Codex now filters `toolRegistry.getAll()` to the `filesystem | terminal | system | web` categories before passing tools to the model. The pre-2.3.8 code had the constant defined but never used, so small models were getting confused by `image_generate`, `screenshot`, `run_workflow`, and `process_list` showing up next to `file_write` and emitting tool calls with the wrong argument shape (confirmed repro: hermes3:8b calling `file_write({command: "python -m unittest …"})` when both shell_execute and file_write were in scope). The filter narrows the blade.
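  The scoping above amounts to a one-line filter; this sketch assumes tools carry a `category` field (the exact registry shape is not shown in the changelog):

  ```typescript
  type Category = "filesystem" | "terminal" | "system" | "web" | "media" | "automation";
  interface Tool { name: string; category: Category }

  // Only these categories reach the model in Codex.
  const CODEX_CATEGORIES: ReadonlySet<Category> = new Set(["filesystem", "terminal", "system", "web"]);

  function codexTools(all: Tool[]): Tool[] {
    return all.filter((t) => CODEX_CATEGORIES.has(t.category));
  }
  ```

  With `image_generate` or `screenshot` tagged `media`, they never appear next to `file_write`, which is the confusion the repro with hermes3:8b demonstrated.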
- **Codex iter cap raised 20 → 50** — large refactors across 10+ files legitimately need more than 20 tool calls. Budget still caps via `agentMaxToolCalls` / `agentMaxIterations` (defaults 50 / 25 from settings).
- **Family grouping in ModelSelector dropdown** — models are now grouped by family header (QWEN / GEMMA / LLAMA / HERMES / PHI / DOLPHIN / MISTRAL / DEEPSEEK / …) in the Codex/Chat/Code dropdown, with a subscribe effect that re-fetches the list when any provider's `enabled`/`baseUrl` changes so users don't have to open Model Manager to see newly-enabled providers.
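  Grouping by family header can be sketched as below; detecting the family by model-name prefix is an assumption here, since the changelog does not spell out the heuristic:

  ```typescript
  // Hypothetical prefix-based family detection for the dropdown headers.
  const FAMILIES = ["qwen", "gemma", "llama", "hermes", "phi", "dolphin", "mistral", "deepseek"];

  function groupByFamily(models: string[]): Map<string, string[]> {
    const groups = new Map<string, string[]>();
    for (const m of models) {
      const fam = FAMILIES.find((f) => m.toLowerCase().startsWith(f)) ?? "other";
      const key = fam.toUpperCase();
      groups.set(key, [...(groups.get(key) ?? []), m]);
    }
    return groups;
  }
  ```

  A `Map` preserves insertion order, so the headers render in a stable sequence as models stream in from the providers.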
### E2E verified
5 tool-capable Ollama models, each in a fresh Codex chat, writing to `C:\Users\<user>\Desktop\<test-folder>\`:
- **gemma4:e4b** — both simple (`file_write hello.py`) and a real Codex-style task ("build cli.py with argparse add/list/clear + test_cli.py with 4 unittest tests + run `python -m unittest test_cli.py` and report") succeeded end-to-end. Full trace: `file_write cli.py (2556B)` → `file_write test_cli.py (3759B)` → `shell_execute python -m unittest test_cli.py` → real output `....\nRan 4 tests in 1.612s\nOK` → final summary. 3 clean tool blocks in the UI, single final answer, Memory badge fired on extraction.
- **qwen2.5-coder:3b** — after the `stripRanges` + `extractedFromContent` fix, chat UI is visually identical to gemma4's (tool blocks + single summary, zero raw JSON).
- **hermes3:8b** — clean native tool-call flow.
- **llama3.1:8b** — clean native tool-call flow (freshly pulled for this verification).
- **llama3.2:1b** — plumbing correct; the 1B model hallucinated a Unix-style `/Users/ddrob/Desktop/tiny.py` path that landed at `C:/Users/ddrob/Desktop/tiny.py` on Windows instead of in the workdir. Model-quality artefact, not a Codex bug. Documented for users on the smallest class of models.
### Changed
- Test suite 2202 → 2202 (full regression) after `tool-call-repair` gained `extractToolCallsWithRanges` + `stripRanges` + `findBalancedBraceEnd` + `findPrecedingOpenBrace`.
### Notes
- Drop-in upgrade from v2.3.7. No breaking changes. No localStorage migration. Existing Codex chats continue to work; new chats benefit from the per-chat workspace isolation now that `chatId` threads through.
## [2.3.7] - 2026-04-22
### Added

README.md

Lines changed: 23 additions & 6 deletions
@@ -35,15 +35,32 @@ No cloud. No data collection. No API keys. Auto-detects 12 local backends. Your
---

- ## v2.3.7 — Current Release
+ ## v2.3.8 — Current Release

- **Remote Ollama + `OLLAMA_HOST` env var support, 2202 Tests**
+ **Codex end-to-end overhaul — tool calls that actually land on disk, clean UI across all models, memory + context compaction**
### Critical Fixes (why you want this update)
- - **Remote Ollama now actually works** — Issue #31 by @k-wilkinson. Pre-2.3.7 the Ollama endpoint was hardcoded to `localhost:11434` in four places (frontend URL helper, Vite dev proxy, Ollama provider dev-mode path, Rust pull-model command), so setting `OLLAMA_HOST=0.0.0.0:11434`, `192.168.1.x:11434` or any custom port was silently ignored — LU reported "No local backend detected", model dropdowns stayed empty, Settings → Providers → Ollama → Endpoint field had zero effect, and the Test button always said Failed. Fixed end-to-end: a single `ollama_base` field reads, in priority, the persisted GUI value, then the `OLLAMA_HOST` env var (same semantics as Ollama itself), then the default. Vite dev proxy target is computed from `OLLAMA_HOST` at startup, the Rust SSRF allow-list in `proxy_localhost` accepts the configured Ollama + ComfyUI hosts, `pull_model_stream` reads from state.
-
- ### What's still in v2.3.7 from v2.3.6
- Drop-in upgrade. v2.3.6's configurable ComfyUI host (Shoaib's remote ComfyUI feature), LM Studio / OpenAI-compat CORS fix, and ComfyUI port persistence all remain in place.
- **Codex file_write actually writes to disk now** — pre-2.3.8 Codex's built-in tool executors (`fs_read`, `fs_write`, `fs_list`, `fs_search`, `shell_execute`, `execute_code`) never threaded the active chat-id through to the Rust backend even though the whole `agent-context.ts` plumbing was designed for it. The documented per-chat workspace isolation (`~/agent-workspace/<chatId>/`) silently fell through to a shared `default/` fallback, and relative paths the model emitted landed nowhere useful. Fixed at the frontend layer — every builtin executor now calls `backendCall('fs_write', { …, chatId: getActiveChatId() })`. `agents.ts:executeTool` also returns the real `data.path` from Rust instead of a fake `"File written successfully"` string that hid write failures behind a green ✓.
- **Clean Codex UI for every model, not just the ones that emit native `tool_calls`** — qwen2.5-coder:3b (and other small coder models) emit tool calls as raw JSON in the `content` field instead of the native `tool_calls` array. The pre-2.3.8 extractor caught the JSON but left the raw `{"name":"file_write", "arguments": {...}}` object visible in the chat bubble, plus every iteration's narrative ("I'm about to verify…" + a ```python fence with the file content) was concatenated onto `fullContent` — so a 4-tool-call turn looked like four stacked JSON blobs with four duplicated paragraphs. Fix: new `stripRanges()` helper uses the balanced-brace positions the extractor already computes to remove the exact tool-call JSON substrings (not a greedy regex that fails on f-strings with `{name}`), and an `extractedFromContent` flag drops the residual narrative entirely so qwen's Codex chat now looks identical to gemma4's.
- **Context compaction in Codex** — long multi-tool turns used to blow past the model's context window on 8K-context local models; Codex now summarises older turns via `compactMessages` before every sampling call. Parity with Agent Mode.
- **Memory injection + extraction in Codex** — Codex was the only chat surface that ignored the memory system. It now reads `getMemoriesForPrompt()` at dispatch time and runs `extractMemoriesFromPair()` after the turn lands, so long-running coding sessions accumulate context like everywhere else.
- **Tool-scope filter for Codex** — Codex now filters the registry to `filesystem | terminal | system | web` categories before passing tools to the model. Small models were getting confused by `image_generate`, `screenshot`, `run_workflow`, `process_list` showing up next to `file_write` and emitting tool calls with the wrong argument shape (e.g. `file_write({command: …})`). The filter narrows the blade.
- **Balanced-brace JSON extractor** — the naive `\{[^}]*\}` regex in `tool-call-repair.ts` failed on any nested brace or string value containing `{` (e.g. Python f-strings `f'Hello, {name}!'`). Replaced with a locate-header-then-balance scanner that respects string escapes. Fixes qwen2.5-coder:3b tool-call extraction for any code that uses f-strings.
- **Concrete arg-validator error hints** — when a tool call fails schema validation, the retry message sent back to the model now lists the exact required fields with their types + the keys the model actually sent ("`file_write requires {path: string, content: string}. You sent {command}. Retry with all required fields present.`") instead of the old generic "matching the tool schema" hint. Small models self-correct much better with a concrete example.
- **Codex iter cap raised 20 → 50** — large refactors across 10+ files legitimately need more than 20 tool calls. Budget still caps via `agentMaxToolCalls` / `agentMaxIterations` (defaults 50 / 25 from settings).
### E2E verified on this build
5 tool-capable Ollama models × simple `file_write` task:
| Model | Result | Native tool_calls | UI |
|---|---|---|---|
| gemma4:e4b | ✅ + full CLI task: 4/4 unittest pass | yes | clean |
| qwen2.5-coder:3b | ✅ | no (extracted from content) | clean (after stripRanges) |
| hermes3:8b | ✅ | yes | clean |
| llama3.1:8b | ✅ | yes | clean |
| llama3.2:1b | ✅ (model emitted Unix `/Users/…` path, plumbing correct) | yes | clean |
### What's still in v2.3.8 from v2.3.7
Drop-in upgrade. v2.3.7's remote Ollama + `OLLAMA_HOST` env var support, v2.3.6's configurable ComfyUI host, LM Studio / OpenAI-compat CORS fix, and ComfyUI port persistence all remain in place.
### Remote Access + Mobile Web App
- **Access your AI from your phone** — Dispatch via LAN or Cloudflare Tunnel (Internet)
