Releases: johnzfitch/llmx
v2.2.1
Fixed
- **Connection pooling**: `pool_idle_timeout(10s)` and `pool_max_idle_per_host(2)` evict stale connections after backend restarts
- **Request timeout**: 30s timeout on all HTTP proxy calls via `tokio::time::timeout`
- **Retry with backoff**: Up to 2 retries with 100ms/200ms backoff on transient connection errors
- **Token refresh on 401**: Re-reads the auth token from disk when the backend returns 401
- **Mutex starvation**: Split the `refresh_impacted_indexes` lock scope into three phases to prevent blocking HTTP handlers during index refresh
These fixes address the recurring issue where `cargo install` replaces the binary but the running backend holds stale connections, causing the stdio proxy to hang indefinitely. A sketch of the combined client behavior follows.
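A minimal sketch of the pooling, timeout, and retry pieces together, assuming a reqwest-based proxy client; the wrapper function and error handling are illustrative, not the actual llmx code:

```rust
use std::time::Duration;

// Illustrative wrapper; only the pool/timeout/backoff values mirror the notes.
async fn call_backend(url: &str) -> Result<String, String> {
    // pool_idle_timeout(10s) and pool_max_idle_per_host(2) let connections
    // left over from a restarted backend die before they can be reused.
    let client = reqwest::Client::builder()
        .pool_idle_timeout(Duration::from_secs(10))
        .pool_max_idle_per_host(2)
        .build()
        .map_err(|e| e.to_string())?;

    let mut attempt: u64 = 0;
    loop {
        // 30s ceiling on every proxy call via tokio::time::timeout.
        match tokio::time::timeout(Duration::from_secs(30), client.get(url).send()).await {
            Ok(Ok(resp)) => return resp.text().await.map_err(|e| e.to_string()),
            // Retry transient connection errors twice: 100ms, then 200ms.
            Ok(Err(e)) if e.is_connect() && attempt < 2 => {
                attempt += 1;
                tokio::time::sleep(Duration::from_millis(100 * attempt)).await;
            }
            Ok(Err(e)) => return Err(e.to_string()),
            Err(_) => return Err("request timed out after 30s".into()),
        }
    }
}
```

Keeping the idle timeout short is what makes the fix work: stale sockets from the old backend process expire before the next request can pick them up.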
v2.2.0: Large Codebase Support & CLI Polish
This release removes artificial limits blocking large monorepos and adds quality-of-life improvements across both the CLI and MCP server.
Limits Increased for Monorepos
| Limit | Before | After |
|---|---|---|
| Max file size | 64MB | 256MB |
| Max file count | 10K–100K | 200K |
| Max directory depth | 10 | 20 |
| Max total bytes | 100MB | 500MB |
| Timeout | 30s | 120s |
CLI Improvements
- `llmx index`: now indexes the current directory when no path is given; no more `llmx index .` required
- `llmx search`: shows directory status and query examples when run without a query
- `--index` flag: shorter alias for `--index-id` (the old flag still works)
- Fuzzy suggestions: typos like `llmx serach` now suggest `llmx search` (a matching sketch follows this list)
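The suggestion behavior can be pictured as a small edit-distance pass over the known subcommands; the sketch below is illustrative, not the CLI's actual matcher:

```rust
// Classic two-row Levenshtein distance.
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

// Suggest the nearest command, but only for near misses.
fn suggest<'a>(input: &str, commands: &[&'a str]) -> Option<&'a str> {
    commands
        .iter()
        .map(|c| (levenshtein(input, c), *c))
        .filter(|(d, _)| *d <= 2)
        .min_by_key(|(d, _)| *d)
        .map(|(_, c)| c)
}

fn main() {
    let cmds = ["index", "search", "refs", "lookup", "symbols"];
    assert_eq!(suggest("serach", &cmds), Some("search"));
}
```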
MCP Server Configuration
The MCP server now has proper CLI support for easier integration with Claude Code, Codex, and Cursor:
```
llmx-mcp --help                # See all options
llmx-mcp --path /project       # Auto-index on startup
llmx-mcp --storage-dir /dir    # Custom storage location
```

Example `.mcp.json`:

```json
{
  "mcpServers": {
    "llmx": {
      "command": "llmx-mcp",
      "args": ["--path", "."]
    }
  }
}
```

Build & CI
- Consolidated Homebrew and AUR publishing into a single job
Full Changelog: v2.1.0...v2.2.0
v2.1.1
Fixes
- **manage delete accepts folder path**: the `delete` action now resolves the index from `loc`/path, not just an opaque `index_id`
- **Windows path normalization**: `find_by_path` and `find_metadata_by_path` normalize backslashes before hashing, fixing index lookup on Windows (see the sketch after this list)
- **ManageInput schema description**: clarified that `index_id` accepts an index ID or folder path, and that `job_status` passes the job ID via this field
- **CLI refs recovery examples**: fixed to use `--direction callers` instead of a positional argument
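A sketch of the normalization fix, assuming paths reduce to a hashed index key; the helper names below are illustrative, since the bodies of `find_by_path` and `find_metadata_by_path` are not shown in these notes:

```rust
use sha2::{Digest, Sha256};

// Convert Windows separators so "C:\repo\src" and "C:/repo/src"
// produce the same index key.
fn normalize_path(path: &str) -> String {
    path.replace('\\', "/")
}

// Hypothetical key derivation: hash the normalized path.
fn index_key(path: &str) -> String {
    let mut hasher = Sha256::new();
    hasher.update(normalize_path(path).as_bytes());
    format!("{:x}", hasher.finalize())
}

fn main() {
    // Before the fix, these two spellings hashed to different keys
    // and the Windows lookup missed the index.
    assert_eq!(index_key(r"C:\repo\src"), index_key("C:/repo/src"));
}
```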
v2.1.0
llmx v2.0
Important
This release is much larger than the final merge that cut the tag. From v0.1.0 to v2.0.0, llmx moved from a local-first indexing prototype into a multi-surface retrieval system with real browser embeddings, structural code navigation, async MCP indexing, hardened model loading, and clearer export workflows.
- **Real semantic search in the browser**: The browser path now uses Burn-powered `mdbr-leaf-ir` embeddings instead of earlier Arctic-era assumptions and placeholder flows. The shipped runtime model is a browser-friendly INT8 `Q8S` artifact with WebGPU acceleration and CPU fallback.
- **Model delivery got hardened**: Model and tokenizer fetches now enforce allowed origins, same-origin redirect checks, SHA-256 verification, size limits, retry/backoff, and IndexedDB caching. This turned semantic search from a demo feature into a shippable one.
- **CLI, MCP, and web grew real v2 retrieval surfaces**: `search` now spans lexical, semantic, and hybrid strategies. On top of that, `symbols`, `lookup`, `refs`, and richer stats make it possible to move from fuzzy retrieval into exact code navigation.
- **Async MCP indexing jobs**: `llmx_index` no longer has to block while indexing large repositories. MCP now returns a `job_id` immediately and exposes `job_status` polling through `llmx_manage` (a polling sketch follows this list).
- **Export format split became intentional**: The project now distinguishes between a searchable bundle that includes `index.json` and a compact agent bundle built around `llm.md`, `manifest.llm.tsv`, and compact chunk files.
- **Web UI crossed out of prototype territory**: The browser UI now exposes first-class Search, Symbols, Lookup, Refs, and Stats views, supports index reload, avoids re-ingesting nested `.llmx-*` bundles, and no longer hard-caps ingest at 2000 files.
- **Dynamic search and ingest safety improved materially**: Path safety checks, dangerous-root rejection, `.gitignore`-aware walking, byte caps, timeout protection, selective updates, and broader file-type coverage all landed during the v2 line.
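The async job flow reduces to a submit-then-poll loop. Below is a minimal sketch with hypothetical stand-ins for the `llmx_index` and `llmx_manage` tool calls; only the job_id/job_status shape comes from the notes:

```rust
use std::time::Duration;

// Status values mirror the lifecycle described above; names are illustrative.
enum JobStatus {
    Running,
    Done,
    Failed(String),
}

// Stand-in for llmx_index: returns a job_id immediately instead of blocking.
async fn start_index_job(_root: &str) -> Result<String, String> {
    Ok("job-123".to_string())
}

// Stand-in for llmx_manage with the job_status action.
async fn get_job_status(_job_id: &str) -> Result<JobStatus, String> {
    Ok(JobStatus::Done)
}

async fn wait_for_index(root: &str) -> Result<(), String> {
    let job_id = start_index_job(root).await?;
    loop {
        match get_job_status(&job_id).await? {
            JobStatus::Done => return Ok(()),
            JobStatus::Failed(e) => return Err(e),
            JobStatus::Running => tokio::time::sleep(Duration::from_millis(500)).await,
        }
    }
}
```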
Version 2.0 also includes a quieter but important internal shift: structural symbol tables and edge indexes now back caller/callee/import/type-reference traversal, which makes llmx much more useful for actual code understanding than plain chunk retrieval alone.
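As an illustration of that shift, a caller/callee edge index can be as simple as forward and reverse adjacency maps keyed by symbol. The types below are hypothetical, not llmx's actual structures:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct SymbolId(u32);

// The edge kinds named in the notes: calls, imports, type references.
#[derive(Clone, Copy, PartialEq, Eq)]
enum EdgeKind {
    Call,
    Import,
    TypeRef,
}

struct EdgeIndex {
    // Outgoing edges (callees, imports, type references) per symbol.
    out: HashMap<SymbolId, Vec<(EdgeKind, SymbolId)>>,
    // Reverse edges make a callers query a single lookup.
    in_: HashMap<SymbolId, Vec<(EdgeKind, SymbolId)>>,
}

impl EdgeIndex {
    // Answer "who calls this symbol?" without scanning chunks.
    fn callers(&self, of: SymbolId) -> impl Iterator<Item = SymbolId> + '_ {
        self.in_
            .get(&of)
            .into_iter()
            .flatten()
            .filter(|(k, _)| *k == EdgeKind::Call)
            .map(|(_, s)| *s)
    }
}
```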
Highlights
Model and search stack
- Browser embeddings now center on `mdbr-leaf-ir` with 768-dimensional output.
- WASM/browser builds use a quantized INT8 `Q8S` Burn artifact.
- Native builds carry both `f32` and `q8` model artifacts.
- Hybrid retrieval now combines BM25 and vector results with configurable fusion (see the sketch after this list).
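A weighted-fusion sketch of the BM25-plus-vector combination. The normalization step and the `alpha` balance parameter are assumptions, since the notes do not specify llmx's fusion function:

```rust
use std::collections::HashMap;

// Combine lexical and semantic scores into one ranked list.
fn fuse(
    bm25: &HashMap<String, f32>,   // doc id -> BM25 score
    vector: &HashMap<String, f32>, // doc id -> cosine similarity
    alpha: f32,                    // configurable lexical/semantic balance
) -> Vec<(String, f32)> {
    // Normalize BM25 into [0, 1] so the two signals are comparable.
    let max_bm25 = bm25.values().cloned().fold(f32::EPSILON, f32::max);
    let mut fused: HashMap<String, f32> = HashMap::new();
    for (id, s) in bm25 {
        *fused.entry(id.clone()).or_default() += alpha * (s / max_bm25);
    }
    for (id, s) in vector {
        *fused.entry(id.clone()).or_default() += (1.0 - alpha) * s;
    }
    let mut ranked: Vec<_> = fused.into_iter().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```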
Retrieval surfaces
- CLI gained first-class `symbols`, `lookup`, `refs`, and richer `stats` output.
- MCP now exposes v2-oriented structural tools and async indexing jobs.
- Web now mirrors much more of the retrieval model instead of acting like a thin search demo.
Export and reload behavior
- Searchable ZIP export now includes `index.json` for reloadable local search.
- Compact export remains available for token-efficient agent consumption.
- Browser import/export behavior is clearer and less error-prone.
Security and privacy
- Your code remains local during indexing and search.
- Public model assets are fetched separately and cached locally.
- Runtime fetch hardening now includes integrity and origin checks rather than treating model download as a blind fetch.
Validation
```
cargo test -p llmx-mcp --features cli --test cli_tests -- --nocapture
cargo test -p llmx-mcp --features mcp --test mcp_tests --no-run
cargo test -p ingestor-wasm --lib --no-run
node --check /home/zack/dev/llmx/web/app.js
node --check /home/zack/dev/llmx/web/worker.js
node --check /home/zack/dev/llmx/web/index-insights.js
```
v0.1.0: Arctic-Embed-S INT8 Quantized Model
Snowflake Arctic-Embed-S optimized for in-browser semantic search with WebAssembly + WebGPU.
📦 Model File
- File: `arctic-embed-s.bin`
- Size: 32 MB (31.67 MiB)
- Format: Burn binary (INT8 quantized)
- SHA256: `503896ea39a1e93b3134742b383a4c4ed42349fc9390ece39eeae5461f616505`
🔍 Model Specifications
| Property | Value |
|---|---|
| Base Model | Snowflake/snowflake-arctic-embed-s |
| Architecture | BERT (12 layers, 384 hidden dim, 12 attention heads) |
| Embedding Dimension | 384 |
| Vocabulary Size | 30,522 tokens |
| Max Sequence Length | 512 tokens |
| Quantization | INT8 Q8S (per-tensor, signed 8-bit) |
| Original Size | 127 MB (FP32 safetensors) |
| Compression Ratio | 4:1 (74% size reduction) |
| Quality | MSE ≤ 0.001 vs FP32 (validated) |
🚀 Usage
In LLMX Project
Set environment variable before building:
```
export LLMX_EMBEDDING_MODEL_URL="https://github.com/johnzfitch/llmx/releases/download/v0.1.0/arctic-embed-s.bin"
cd web && npm run build
```

The browser will download and cache the model automatically on first load (IndexedDB).
Direct Integration
```rust
use burn::record::{BinBytesRecorder, FullPrecisionSettings, Recorder};
use sha2::{Digest, Sha256};

// Fetch the model binary
let bytes = fetch_from_cdn("https://github.com/johnzfitch/llmx/releases/download/v0.1.0/arctic-embed-s.bin").await?;

// Verify integrity against the published SHA256
let mut hasher = Sha256::new();
hasher.update(&bytes);
assert_eq!(
    format!("{:x}", hasher.finalize()),
    "503896ea39a1e93b3134742b383a4c4ed42349fc9390ece39eeae5461f616505"
);

// Load with Burn
let recorder = BinBytesRecorder::<FullPrecisionSettings, Vec<u8>>::default();
let model = BertModel::new(&device).load_record(
    recorder.load(bytes.to_vec(), &device)?,
);
```

🔐 Security
- Integrity: Always verify SHA256 hash before use
- Source: Quantized from official Snowflake HuggingFace checkpoint
- Signing: Model is rebuilt from the official source safetensors; rebuilds are not bit-identical (see the Reproduction note below)
📊 Performance
| Backend | Inference Time | Memory |
|---|---|---|
| WebGPU | ~50-100ms/query | ~300MB VRAM |
| CPU (WASM) | ~200-400ms/query | ~150MB RAM |
Benchmarked on: M1 Mac, single query, 512-token sequence
🧪 Validation
Model passes comprehensive test suite:
- ✅ Quantization MSE test (threshold: 0.1, actual: <0.001; sketched after this list)
- ✅ Backend portability (NdArray vs WGPU MSE ≤ 0.001)
- ✅ Attention head reshape validation
- ✅ Batch inference correctness
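The MSE gate amounts to comparing FP32 and dequantized INT8 outputs element-wise; below is a toy stand-in, not the project's actual test harness:

```rust
// Mean squared error between reference and quantized model outputs.
fn mse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        / a.len() as f32
}

fn main() {
    // Illustrative values standing in for real embedding outputs.
    let fp32 = [0.12_f32, -0.53, 0.91];
    let int8_dequant = [0.118_f32, -0.532, 0.909];
    // Release gate: MSE vs FP32 must stay at or below the 0.1 threshold
    // (the measured value was under 0.001).
    assert!(mse(&fp32, &int8_dequant) <= 0.1);
}
```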
📜 License & Attribution
Base Model: Apache 2.0 (Snowflake AI)
Quantized Weights: Same license, derivative work
Citation:
```bibtex
@misc{arctic-embed-2024,
  title={Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models},
  author={Snowflake AI Research},
  year={2024},
  url={https://huggingface.co/Snowflake/snowflake-arctic-embed-s}
}
```

🔧 Reproduction
To rebuild this binary from source:
```
git clone https://github.com/johnzfitch/llmx
cd llmx/ingestor-wasm
cargo build --release
# Binary generated at: models/arctic-embed-s.bin
```

Quantization settings (`build.rs`):
- Calibration: MinMax
- Value: `QuantValue::Q8S`
- Level: `QuantLevel::Tensor` (per-tensor scales)
- Param: `QuantParam::F32` (float scales)
Note: Due to non-deterministic quantization, rebuilds will produce different SHA256 hashes. The committed binary is the canonical version for this release.
📞 Support
- Issues: GitHub Issues
- Model Questions: Reference upstream Snowflake model
- Integration Help: See USAGE.md
Generated by: LLMX Phase 7 build system
Build Commit: 4b1bfd5
Compiler: rustc + Burn 0.20