Releases: johnzfitch/llmx

v2.2.1

28 Mar 10:56
c956010

Fixed

  • Connection pooling: pool_idle_timeout(10s) and pool_max_idle_per_host(2) evict stale connections after backend restarts
  • Request timeout: 30s timeout on all HTTP proxy calls via tokio::time::timeout
  • Retry with backoff: Up to 2 retries with 100ms/200ms backoff on transient connection errors
  • Token refresh on 401: Re-reads auth token from disk when backend returns 401
  • Mutex starvation: Split refresh_impacted_indexes lock scope into three phases to prevent blocking HTTP handlers during index refresh

These fixes address the recurring issue where cargo install replaces the binary but the running backend holds stale connections, causing the stdio proxy to hang indefinitely.
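
For the connection-pool, timeout, and retry fixes above, the behavior corresponds roughly to the sketch below. It assumes the proxy uses reqwest; the function names are illustrative, not the actual llmx code.

use std::time::Duration;
use tokio::time::{sleep, timeout};

// Evict idle pooled connections after 10s and keep at most 2 idle sockets
// per host, so a restarted backend cannot strand the proxy on dead sockets.
fn build_client() -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .pool_idle_timeout(Duration::from_secs(10))
        .pool_max_idle_per_host(2)
        .build()
}

// 30s overall timeout per call, plus up to 2 retries with 100ms/200ms
// backoff on transient connection errors.
async fn proxy_get(
    client: &reqwest::Client,
    url: &str,
) -> Result<reqwest::Response, Box<dyn std::error::Error>> {
    let mut attempt: u64 = 0;
    loop {
        match timeout(Duration::from_secs(30), client.get(url).send()).await {
            Ok(Ok(resp)) => return Ok(resp),
            Ok(Err(e)) if e.is_connect() && attempt < 2 => {
                attempt += 1;
                sleep(Duration::from_millis(100 * attempt)).await; // 100ms, then 200ms
            }
            Ok(Err(e)) => return Err(e.into()),
            Err(elapsed) => return Err(elapsed.into()),
        }
    }
}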

v2.2.0: Large Codebase Support & CLI Polish

22 Mar 07:40
71ef4ad

This release removes artificial limits blocking large monorepos and adds quality-of-life improvements across both the CLI and MCP server.


Limits Increased for Monorepos

Limit                 Before      After
Max file size         64MB        256MB
Max file count        10K–100K    200K
Max directory depth   10          20
Max total bytes       100MB       500MB
Timeout               30s         120s

CLI Improvements

  • llmx index: now indexes the current directory when no path is given; llmx index . is no longer required
  • llmx search: shows directory status and query examples when run without a query
  • --index flag: shorter alias for --index-id (the old flag still works)
  • Fuzzy suggestions: typos like llmx serach now suggest llmx search (see the sketch below)
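
A minimal sketch of how such fuzzy suggestions can work, using classic Levenshtein edit distance; this is illustrative only, and the shipped CLI may instead rely on its argument parser's built-in suggestions.

/// Edit distance between two short strings (dynamic programming, two rows).
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Suggest the closest known subcommand for a typo, if it is close enough.
fn suggest<'a>(input: &str, commands: &[&'a str]) -> Option<&'a str> {
    commands
        .iter()
        .copied()
        .map(|c| (levenshtein(input, c), c))
        .filter(|(d, _)| *d <= 2)
        .min_by_key(|(d, _)| *d)
        .map(|(_, c)| c)
}

Here suggest("serach", &["index", "search", "serve"]) returns Some("search").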

MCP Server Configuration

The MCP server now has proper CLI support for easier integration with Claude Code, Codex, and Cursor:

llmx-mcp --help              # See all options
llmx-mcp --path /project     # Auto-index on startup
llmx-mcp --storage-dir /dir  # Custom storage location
Example .mcp.json
{
  "mcpServers": {
    "llmx": {
      "command": "llmx-mcp",
      "args": ["--path", "."]
    }
  }
}

Build & CI

  • Consolidated Homebrew and AUR publishing into a single job

Full Changelog: v2.1.0...v2.2.0

v2.1.1

22 Mar 21:49
0b5aff3

Fixes

  • manage delete accepts folder paths: the delete action now resolves the index from loc/path, not just an opaque index_id
  • Windows path normalization: find_by_path and find_metadata_by_path normalize backslashes before hashing, fixing index lookup on Windows (see the sketch below)
  • ManageInput schema description: clarified that index_id accepts either an index ID or a folder path, and that job_status passes the job ID via this field
  • CLI refs recovery examples: fixed to use --direction callers instead of a positional argument
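
The Windows fix boils down to normalizing separators before hashing, so the same folder maps to the same index key on every platform. A minimal sketch (illustrative, not the actual llmx code):

use sha2::{Digest, Sha256};

// Normalize backslashes first so C:\repo\src and C:/repo/src
// produce the same index key.
fn index_key(path: &str) -> String {
    let normalized = path.replace('\\', "/");
    format!("{:x}", Sha256::digest(normalized.as_bytes()))
}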

v2.1.0

21 Mar 03:46
c78ef74

llmx v2.0

12 Mar 11:08
846a570

Important

This release is much larger than the final merge that cut the tag. Between v0.1.0 and v2.0.0, llmx grew from a local-first indexing prototype into a multi-surface retrieval system with real browser embeddings, structural code navigation, async MCP indexing, hardened model loading, and clearer export workflows.

Real semantic search in the browser
The browser path now uses Burn-powered mdbr-leaf-ir embeddings instead of earlier Arctic-era assumptions and placeholder flows. The shipped runtime model is a browser-friendly INT8 Q8S artifact with WebGPU acceleration and CPU fallback.
Model delivery got hardened
Model and tokenizer fetches now enforce allowed origins, same-origin redirect checks, SHA-256 verification, size limits, retry/backoff, and IndexedDB caching. This turned semantic search from a demo feature into a shippable one.
CLI, MCP, and web grew real v2 retrieval surfaces
search now spans lexical, semantic, and hybrid strategies. On top of that, symbols, lookup, refs, and richer stats make it possible to move from fuzzy retrieval into exact code navigation.
Async MCP indexing jobs
llmx_index no longer has to block while indexing large repositories. MCP now returns a job_id immediately and exposes job_status polling through llmx_manage (a sketch follows this overview).
Export format split became intentional
The project now distinguishes between a searchable bundle that includes index.json and a compact agent bundle built around llm.md, manifest.llm.tsv, and compact chunk files.
Web UI crossed out of prototype territory
The browser UI now exposes first-class Search, Symbols, Lookup, Refs, and Stats views, supports index reload, avoids re-ingesting nested .llmx-* bundles, and no longer hard-caps ingest at 2000 files.
Dynamic search and ingest safety improved materially
Path safety checks, dangerous-root rejection, .gitignore-aware walking, byte caps, timeout protection, selective updates, and broader file-type coverage all landed during the v2 line.
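
The shape of the async job flow, sketched with serde_json; the tool schemas here are illustrative, and only job_id, job_status, and the index_id carrier field are documented in these notes.

use serde_json::json;

fn main() {
    // llmx_index returns immediately with a job handle instead of blocking.
    let index_response = json!({ "job_id": "job-42", "status": "running" });

    // Callers then poll through llmx_manage; per the v2.1.1 notes, the job ID
    // travels in the index_id field.
    let poll_request = json!({
        "tool": "llmx_manage",
        "arguments": { "action": "job_status", "index_id": "job-42" }
    });
    let poll_response = json!({ "job_id": "job-42", "status": "done" });

    println!("{index_response}\n{poll_request}\n{poll_response}");
}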

Version 2.0 also includes a quieter but important internal shift: structural symbol tables and edge indexes now back caller/callee/import/type-reference traversal, which makes llmx much more useful for actual code understanding than plain chunk retrieval alone.
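
Conceptually, such an edge index can be as simple as adjacency lists keyed by symbol. The types below are hypothetical, a sketch of the idea rather than llmx's actual data structures:

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum EdgeKind { Call, Import, TypeRef }

#[derive(Default)]
struct EdgeIndex {
    // symbol -> outgoing edges (callees, imports, referenced types)
    out: HashMap<String, Vec<(EdgeKind, String)>>,
    // symbol -> incoming edges (callers, importers, referencing sites)
    inn: HashMap<String, Vec<(EdgeKind, String)>>,
}

impl EdgeIndex {
    fn add(&mut self, from: &str, kind: EdgeKind, to: &str) {
        self.out.entry(from.into()).or_default().push((kind, to.into()));
        self.inn.entry(to.into()).or_default().push((kind, from.into()));
    }

    /// Answers a "refs --direction callers" style query: who calls `symbol`?
    fn callers(&self, symbol: &str) -> Vec<&str> {
        self.inn
            .get(symbol)
            .into_iter()
            .flatten()
            .filter(|(kind, _)| *kind == EdgeKind::Call)
            .map(|(_, from)| from.as_str())
            .collect()
    }
}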

Highlights

Model and search stack

  • Browser embeddings now center on mdbr-leaf-ir with 768-dimensional output.
  • WASM/browser builds use a quantized INT8 Q8S Burn artifact.
  • Native builds carry both f32 and q8 model artifacts.
  • Hybrid retrieval now combines BM25 and vector results with configurable fusion.
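
These notes don't name the fusion scheme, so as one hedged illustration, here is reciprocal rank fusion (RRF), a common way to merge a BM25 ranking with a vector ranking; the function name and the damping constant k are assumptions, not llmx's API.

use std::collections::HashMap;

// Each ranking contributes 1 / (k + rank) per document; summing across
// rankings rewards documents that score well under both lexical and
// semantic retrieval. k = 60.0 is a common default.
fn rrf_fuse(bm25: &[&str], vector: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in [bm25, vector] {
        for (rank, doc) in ranking.iter().enumerate() {
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}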

Retrieval surfaces

  • CLI gained first-class symbols, lookup, refs, and richer stats output.
  • MCP now exposes v2-oriented structural tools and async indexing jobs.
  • Web now mirrors much more of the retrieval model instead of acting like a thin search demo.

Export and reload behavior

  • Searchable ZIP export now includes index.json for reloadable local search.
  • Compact export remains available for token-efficient agent consumption.
  • Browser import/export behavior is clearer and less error-prone.

Security and privacy

  • Your code remains local during indexing and search.
  • Public model assets are fetched separately and cached locally.
  • Runtime fetch hardening now includes integrity and origin checks rather than treating model download as a blind fetch.

Validation

  • cargo test -p llmx-mcp --features cli --test cli_tests -- --nocapture
  • cargo test -p llmx-mcp --features mcp --test mcp_tests --no-run
  • cargo test -p ingestor-wasm --lib --no-run
  • node --check /home/zack/dev/llmx/web/app.js
  • node --check /home/zack/dev/llmx/web/worker.js
  • node --check /home/zack/dev/llmx/web/index-insights.js

v0.1.0: Arctic-Embed-S INT8 Model

18 Jan 05:09
b5043da

Arctic-Embed-S INT8 Quantized Model

Snowflake Arctic-Embed-S optimized for in-browser semantic search with WebAssembly + WebGPU.

📦 Model File

  • File: arctic-embed-s.bin
  • Size: 32 MB (31.67 MiB)
  • Format: Burn binary (INT8 quantized)
  • SHA256: 503896ea39a1e93b3134742b383a4c4ed42349fc9390ece39eeae5461f616505

🔍 Model Specifications

Property              Value
Base Model            Snowflake/snowflake-arctic-embed-s
Architecture          BERT (12 layers, 384 hidden dim, 12 attention heads)
Embedding Dimension   384
Vocabulary Size       30,522 tokens
Max Sequence Length   512 tokens
Quantization          INT8 Q8S (per-tensor, signed 8-bit)
Original Size         127 MB (FP32 safetensors)
Compression Ratio     4:1 (74% size reduction)
Quality               MSE ≤ 0.001 vs FP32 (validated)

🚀 Usage

In LLMX Project

Set the environment variable before building:

export LLMX_EMBEDDING_MODEL_URL="https://github.com/johnzfitch/llmx/releases/download/v0.1.0/arctic-embed-s.bin"
cd web && npm run build

The browser will download and cache the model automatically on first load (IndexedDB).

Direct Integration

// Fetch and verify (fetch_from_cdn stands in for your own async download helper)
use sha2::{Digest, Sha256};
use burn::record::{BinBytesRecorder, FullPrecisionSettings, Recorder};

let bytes = fetch_from_cdn("https://github.com/johnzfitch/llmx/releases/download/v0.1.0/arctic-embed-s.bin").await?;

// Verify integrity against the published SHA-256 before loading
let mut hasher = Sha256::new();
hasher.update(&bytes);
assert_eq!(
    format!("{:x}", hasher.finalize()),
    "503896ea39a1e93b3134742b383a4c4ed42349fc9390ece39eeae5461f616505"
);

// Load the verified bytes as a Burn record
let recorder = BinBytesRecorder::<FullPrecisionSettings, Vec<u8>>::default();
let model = BertModel::new(&device).load_record(
    recorder.load(bytes.to_vec(), &device)?,
);

🔐 Security

  • Integrity: Always verify SHA256 hash before use
  • Source: Quantized from official Snowflake HuggingFace checkpoint
  • Reproducibility: rebuildable from the source safetensors, though quantization is not bit-deterministic, so the SHA256 above identifies the canonical binary (see Reproduction below)

📊 Performance

Backend      Inference Time      Memory
WebGPU       ~50-100ms/query     ~300MB VRAM
CPU (WASM)   ~200-400ms/query    ~150MB RAM

Benchmarked on: M1 Mac, single query, 512-token sequence

🧪 Validation

The model passes a comprehensive test suite:

  • ✅ Quantization MSE test (threshold: 0.1, actual: <0.001)
  • ✅ Backend portability (NdArray vs WGPU MSE ≤ 0.001)
  • ✅ Attention head reshape validation
  • ✅ Batch inference correctness

📜 License & Attribution

Base Model: Apache 2.0 (Snowflake AI)
Quantized Weights: Same license, derivative work

Citation:

@misc{arctic-embed-2024,
  title={Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models},
  author={Snowflake AI Research},
  year={2024},
  url={https://huggingface.co/Snowflake/snowflake-arctic-embed-s}
}

🔧 Reproduction

To rebuild this binary from source:

git clone https://github.com/johnzfitch/llmx
cd llmx/ingestor-wasm
cargo build --release
# Binary generated at: models/arctic-embed-s.bin

Quantization settings (build.rs):

  • Calibration: MinMax
  • Value: QuantValue::Q8S
  • Level: QuantLevel::Tensor (per-tensor scales)
  • Param: QuantParam::F32 (float scales)

Note: Due to non-deterministic quantization, rebuilds will produce different SHA256 hashes. The committed binary is the canonical version for this release.

📞 Support

  • Issues: GitHub Issues
  • Model Questions: Reference upstream Snowflake model
  • Integration Help: See USAGE.md

Generated by: LLMX Phase 7 build system
Build Commit: 4b1bfd5
Compiler: rustc + Burn 0.20