Releases: johnzfitch/llmx

v2.2.1

28 Mar 10:56
c956010

Fixed

  • Connection pooling: pool_idle_timeout(10s) and pool_max_idle_per_host(2) evict stale connections after backend restarts
  • Request timeout: 30s timeout on all HTTP proxy calls via tokio::time::timeout
  • Retry with backoff: Up to 2 retries with 100ms/200ms backoff on transient connection errors
  • Token refresh on 401: Re-reads auth token from disk when backend returns 401
  • Mutex starvation: Split refresh_impacted_indexes lock scope into three phases to prevent blocking HTTP handlers during index refresh

These fixes address the recurring issue where cargo install replaces the binary but the running backend holds stale connections, causing the stdio proxy to hang indefinitely.
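
For the connection-pool, timeout, and retry fixes above, the behavior corresponds roughly to the sketch below. It assumes the proxy uses reqwest; the function names are illustrative, not the actual llmx code.

use std::time::Duration;
use tokio::time::{sleep, timeout};

// Evict idle pooled connections after 10s and keep at most 2 idle sockets
// per host, so a restarted backend cannot strand the proxy on dead sockets.
fn build_client() -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .pool_idle_timeout(Duration::from_secs(10))
        .pool_max_idle_per_host(2)
        .build()
}

// 30s overall timeout per call, plus up to 2 retries with 100ms/200ms
// backoff on transient connection errors.
async fn proxy_get(
    client: &reqwest::Client,
    url: &str,
) -> Result<reqwest::Response, Box<dyn std::error::Error>> {
    let mut attempt: u64 = 0;
    loop {
        match timeout(Duration::from_secs(30), client.get(url).send()).await {
            Ok(Ok(resp)) => return Ok(resp),
            Ok(Err(e)) if e.is_connect() && attempt < 2 => {
                attempt += 1;
                sleep(Duration::from_millis(100 * attempt)).await; // 100ms, then 200ms
            }
            Ok(Err(e)) => return Err(e.into()),
            Err(elapsed) => return Err(elapsed.into()),
        }
    }
}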

v2.2.0: Large Codebase Support & CLI Polish

22 Mar 07:40
71ef4ad

This release removes artificial limits blocking large monorepos and adds quality-of-life improvements across both the CLI and MCP server.


Limits Increased for Monorepos

Limit                 Before      After
Max file size         64MB        256MB
Max file count        10K–100K    200K
Max directory depth   10          20
Max total bytes       100MB       500MB
Timeout               30s         120s

CLI Improvements

  • llmx index: now indexes the current directory when no path is given; llmx index . is no longer required
  • llmx search: shows directory status and query examples when run without a query
  • --index flag: shorter alias for --index-id (the old flag still works)
  • Fuzzy suggestions: typos like llmx serach now suggest llmx search (see the sketch below)
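
A minimal sketch of how such fuzzy suggestions can work, using classic Levenshtein edit distance; this is illustrative only, and the shipped CLI may instead rely on its argument parser's built-in suggestions.

/// Edit distance between two short strings (dynamic programming, two rows).
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Suggest the closest known subcommand for a typo, if it is close enough.
fn suggest<'a>(input: &str, commands: &[&'a str]) -> Option<&'a str> {
    commands
        .iter()
        .copied()
        .map(|c| (levenshtein(input, c), c))
        .filter(|(d, _)| *d <= 2)
        .min_by_key(|(d, _)| *d)
        .map(|(_, c)| c)
}

Here suggest("serach", &["index", "search", "serve"]) returns Some("search").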

MCP Server Configuration

The MCP server now has proper CLI support for easier integration with Claude Code, Codex, and Cursor:

llmx-mcp --help              # See all options
llmx-mcp --path /project     # Auto-index on startup
llmx-mcp --storage-dir /dir  # Custom storage location
Example .mcp.json
{
  "mcpServers": {
    "llmx": {
      "command": "llmx-mcp",
      "args": ["--path", "."]
    }
  }
}

Build & CI

  • Consolidated Homebrew and AUR publishing into a single job

Full Changelog: v2.1.0...v2.2.0

v2.1.1

22 Mar 21:49
0b5aff3

Fixes

  • manage delete accepts folder paths: the delete action now resolves the index from loc/path, not just an opaque index_id
  • Windows path normalization: find_by_path and find_metadata_by_path normalize backslashes before hashing, fixing index lookup on Windows (see the sketch below)
  • ManageInput schema description: clarified that index_id accepts either an index ID or a folder path, and that job_status passes the job ID via this field
  • CLI refs recovery examples: fixed to use --direction callers instead of a positional argument
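
The Windows fix boils down to normalizing separators before hashing, so the same folder maps to the same index key on every platform. A minimal sketch (illustrative, not the actual llmx code):

use sha2::{Digest, Sha256};

// Normalize backslashes first so C:\repo\src and C:/repo/src
// produce the same index key.
fn index_key(path: &str) -> String {
    let normalized = path.replace('\\', "/");
    format!("{:x}", Sha256::digest(normalized.as_bytes()))
}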

v2.1.0

21 Mar 03:46
c78ef74

llmx v2.0

12 Mar 11:08
846a570

Important

This release is much larger than the final merge that cut the tag. Between v0.1.0 and v2.0.0, llmx grew from a local-first indexing prototype into a multi-surface retrieval system with real browser embeddings, structural code navigation, async MCP indexing, hardened model loading, and clearer export workflows.

Real semantic search in the browser
The browser path now uses Burn-powered mdbr-leaf-ir embeddings instead of earlier Arctic-era assumptions and placeholder flows. The shipped runtime model is a browser-friendly INT8 Q8S artifact with WebGPU acceleration and CPU fallback.
Model delivery got hardened
Model and tokenizer fetches now enforce allowed origins, same-origin redirect checks, SHA-256 verification, size limits, retry/backoff, and IndexedDB caching. This turned semantic search from a demo feature into a shippable one.
CLI, MCP, and web grew real v2 retrieval surfaces
search now spans lexical, semantic, and hybrid strategies. On top of that, symbols, lookup, refs, and richer stats make it possible to move from fuzzy retrieval into exact code navigation.
Async MCP indexing jobs
llmx_index no longer has to block while indexing large repositories. MCP now returns a job_id immediately and exposes job_status polling through llmx_manage (a sketch follows this overview).
Export format split became intentional
The project now distinguishes between a searchable bundle that includes index.json and a compact agent bundle built around llm.md, manifest.llm.tsv, and compact chunk files.
Web UI crossed out of prototype territory
The browser UI now exposes first-class Search, Symbols, Lookup, Refs, and Stats views, supports index reload, avoids re-ingesting nested .llmx-* bundles, and no longer hard-caps ingest at 2000 files.
Dynamic search and ingest safety improved materially
Path safety checks, dangerous-root rejection, .gitignore-aware walking, byte caps, timeout protection, selective updates, and broader file-type coverage all landed during the v2 line.
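
The shape of the async job flow, sketched with serde_json; the tool schemas here are illustrative, and only job_id, job_status, and the index_id carrier field are documented in these notes.

use serde_json::json;

fn main() {
    // llmx_index returns immediately with a job handle instead of blocking.
    let index_response = json!({ "job_id": "job-42", "status": "running" });

    // Callers then poll through llmx_manage; per the v2.1.1 notes, the job ID
    // travels in the index_id field.
    let poll_request = json!({
        "tool": "llmx_manage",
        "arguments": { "action": "job_status", "index_id": "job-42" }
    });
    let poll_response = json!({ "job_id": "job-42", "status": "done" });

    println!("{index_response}\n{poll_request}\n{poll_response}");
}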

Version 2.0 also includes a quieter but important internal shift: structural symbol tables and edge indexes now back caller/callee/import/type-reference traversal, which makes llmx much more useful for actual code understanding than plain chunk retrieval alone.
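
Conceptually, such an edge index can be as simple as adjacency lists keyed by symbol. The types below are hypothetical, a sketch of the idea rather than llmx's actual data structures:

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum EdgeKind { Call, Import, TypeRef }

#[derive(Default)]
struct EdgeIndex {
    // symbol -> outgoing edges (callees, imports, referenced types)
    out: HashMap<String, Vec<(EdgeKind, String)>>,
    // symbol -> incoming edges (callers, importers, referencing sites)
    inn: HashMap<String, Vec<(EdgeKind, String)>>,
}

impl EdgeIndex {
    fn add(&mut self, from: &str, kind: EdgeKind, to: &str) {
        self.out.entry(from.into()).or_default().push((kind, to.into()));
        self.inn.entry(to.into()).or_default().push((kind, from.into()));
    }

    /// Answers a "refs --direction callers" style query: who calls `symbol`?
    fn callers(&self, symbol: &str) -> Vec<&str> {
        self.inn
            .get(symbol)
            .into_iter()
            .flatten()
            .filter(|(kind, _)| *kind == EdgeKind::Call)
            .map(|(_, from)| from.as_str())
            .collect()
    }
}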

Highlights

Model and search stack

  • Browser embeddings now center on mdbr-leaf-ir with 768-dimensional output.
  • WASM/browser builds use a quantized INT8 Q8S Burn artifact.
  • Native builds carry both f32 and q8 model artifacts.
  • Hybrid retrieval now combines BM25 and vector results with configurable fusion.
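
These notes don't name the fusion scheme, so as one hedged illustration, here is reciprocal rank fusion (RRF), a common way to merge a BM25 ranking with a vector ranking; the function name and the damping constant k are assumptions, not llmx's API.

use std::collections::HashMap;

// Each ranking contributes 1 / (k + rank) per document; summing across
// rankings rewards documents that score well under both lexical and
// semantic retrieval. k = 60.0 is a common default.
fn rrf_fuse(bm25: &[&str], vector: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in [bm25, vector] {
        for (rank, doc) in ranking.iter().enumerate() {
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}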

Retrieval surfaces

  • CLI gained first-class symbols, lookup, refs, and richer stats output.
  • MCP now exposes v2-oriented structural tools and async indexing jobs.
  • Web now mirrors much more of the retrieval model instead of acting like a thin search demo.

Export and reload behavior

  • Searchable ZIP export now includes index.json for reloadable local search.
  • Compact export remains available for token-efficient agent consumption.
  • Browser import/export behavior is clearer and less error-prone.

Security and privacy

  • Your code remains local during indexing and search.
  • Public model assets are fetched separately and cached locally.
  • Runtime fetch hardening now includes integrity and origin checks rather than treating model download as a blind fetch.

Validation

  • cargo test -p llmx-mcp --features cli --test cli_tests -- --nocapture
  • cargo test -p llmx-mcp --features mcp --test mcp_tests --no-run
  • cargo test -p ingestor-wasm --lib --no-run
  • node --check /home/zack/dev/llmx/web/app.js
  • node --check /home/zack/dev/llmx/web/worker.js
  • node --check /home/zack/dev/llmx/web/index-insights.js

v0.1.0: Arctic-Embed-S INT8 Model

18 Jan 05:09
b5043da

Arctic-Embed-S INT8 Quantized Model

Snowflake Arctic-Embed-S optimized for in-browser semantic search with WebAssembly + WebGPU.

📦 Model File

  • File: arctic-embed-s.bin
  • Size: 32 MB (31.67 MiB)
  • Format: Burn binary (INT8 quantized)
  • SHA256: 503896ea39a1e93b3134742b383a4c4ed42349fc9390ece39eeae5461f616505

🔍 Model Specifications

Property              Value
Base Model            Snowflake/snowflake-arctic-embed-s
Architecture          BERT (12 layers, 384 hidden dim, 12 attention heads)
Embedding Dimension   384
Vocabulary Size       30,522 tokens
Max Sequence Length   512 tokens
Quantization          INT8 Q8S (per-tensor, signed 8-bit)
Original Size         127 MB (FP32 safetensors)
Compression Ratio     4:1 (74% size reduction)
Quality               MSE ≤ 0.001 vs FP32 (validated)

🚀 Usage

In LLMX Project

Set the environment variable before building:

export LLMX_EMBEDDING_MODEL_URL="https://github.com/johnzfitch/llmx/releases/download/v0.1.0/arctic-embed-s.bin"
cd web && npm run build

The browser will download and cache the model automatically on first load (IndexedDB).

Direct Integration

// Fetch and verify (fetch_from_cdn stands in for your own async download helper)
use sha2::{Digest, Sha256};
use burn::record::{BinBytesRecorder, FullPrecisionSettings, Recorder};

let bytes = fetch_from_cdn("https://github.com/johnzfitch/llmx/releases/download/v0.1.0/arctic-embed-s.bin").await?;

// Verify integrity against the published SHA-256 before loading
let mut hasher = Sha256::new();
hasher.update(&bytes);
assert_eq!(
    format!("{:x}", hasher.finalize()),
    "503896ea39a1e93b3134742b383a4c4ed42349fc9390ece39eeae5461f616505"
);

// Load the verified bytes as a Burn record
let recorder = BinBytesRecorder::<FullPrecisionSettings, Vec<u8>>::default();
let model = BertModel::new(&device).load_record(
    recorder.load(bytes.to_vec(), &device)?,
);

🔐 Security

  • Integrity: Always verify SHA256 hash before use
  • Source: Quantized from official Snowflake HuggingFace checkpoint
  • Reproducibility: rebuildable from the source safetensors, though quantization is not bit-deterministic, so the SHA256 above identifies the canonical binary (see Reproduction below)

📊 Performance

Backend      Inference Time      Memory
WebGPU       ~50-100ms/query     ~300MB VRAM
CPU (WASM)   ~200-400ms/query    ~150MB RAM

Benchmarked on: M1 Mac, single query, 512-token sequence

🧪 Validation

The model passes a comprehensive test suite:

  • ✅ Quantization MSE test (threshold: 0.1, actual: <0.001)
  • ✅ Backend portability (NdArray vs WGPU MSE ≤ 0.001)
  • ✅ Attention head reshape validation
  • ✅ Batch inference correctness

📜 License & Attribution

Base Model: Apache 2.0 (Snowflake AI)
Quantized Weights: Same license, derivative work

Citation:

@misc{arctic-embed-2024,
  title={Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models},
  author={Snowflake AI Research},
  year={2024},
  url={https://huggingface.co/Snowflake/snowflake-arctic-embed-s}
}

🔧 Reproduction

To rebuild this binary from source:

git clone https://github.com/johnzfitch/llmx
cd llmx/ingestor-wasm
cargo build --release
# Binary generated at: models/arctic-embed-s.bin

Quantization settings (build.rs):

  • Calibration: MinMax
  • Value: QuantValue::Q8S
  • Level: QuantLevel::Tensor (per-tensor scales)
  • Param: QuantParam::F32 (float scales)

Note: Due to non-deterministic quantization, rebuilds will produce different SHA256 hashes. The committed binary is the canonical version for this release.

📞 Support

  • Issues: GitHub Issues
  • Model Questions: Reference upstream Snowflake model
  • Integration Help: See USAGE.md

Generated by: LLMX Phase 7 build system
Build Commit: 4b1bfd5
Compiler: rustc + Burn 0.20