Semantic file search CLI — built for AI agents driving a shell.
MFS stands for both Memory File Search and Milvus File Search.
It's a POSIX-style CLI (ls / tree / cat / grep / search) that gives an AI
agent semantic access to any folder — and gives a human the same toolkit at the
terminal. Files are the source of truth; Milvus is the derived index underneath.
The "memory" part is not just branding. Modern agent memory is often discussed as semantic, episodic, procedural, and working memory; in practice, the durable layers are usually files: Markdown notes, JSONL transcripts, SKILL documents, runbooks, PDFs, DOCX files, and code. MFS makes those memory files searchable with Milvus. See the docs homepage for the full story.
MFS is a middle layer: below it, Milvus / Zilliz Cloud is abstracted away; above it, any agent application that has folders full of files — memory logs, skill definitions, session transcripts, source code — can plug in without touching a vector database.
┌────────────────────────────────────────────────────────────────────┐
│ Agent Applications │
│ ───────────────────────────────────────────────────────── │
│ memory systems skill managers codebase copilots │
│ (daily .md logs) (trees of SKILL.md) (repo-aware chat) │
│ session replayers knowledge bases …your next agent app │
│ (session .jsonl) (docs, PDFs) │
└────────────────────────────────┬───────────────────────────────────┘
│ invokes CLI / Skill
▼
┌────────────────────────────────────────────────────────────────────┐
│ MFS ← you are here │
│ ───────────────────── │
│ 📟 CLI mfs add · search · grep · ls · tree · cat │
│ 🧠 Skill skills/mfs (reusable agent workflow instructions) │
│ │
│ Hybrid retrieval · density presets · JSON Hit envelope · │
│ tree-sitter AST · pipe-aware · ~/.mfs/ state only │
└────────────────────────────────┬───────────────────────────────────┘
│ wraps / abstracts
▼
┌────────────────────────────────────────────────────────────────────┐
│ Milvus (Lite · Self-hosted · Zilliz Cloud) │
│ ──────────────────────────────────────────── │
│ dense vectors · BM25 sparse · RRF fusion · metadata filters │
└────────────────────────────────────────────────────────────────────┘
Because agents already speak shell. An LLM agent can plan
mfs tree --peek → mfs cat --skim → mfs search "..." with zero integration
code — the same way a human developer would. No client library to version, no
service to keep alive, no schema to import; just a binary on $PATH and a
--json flag when the caller is a machine.
MFS ships the two things an agent-first tool actually needs:
- 📟 a CLI — for any agent that can run shell commands (Claude Code, Codex CLI, OpenCode, your own)
- 🧠 a companion Skill —
skills/mfs, a reusable agent skill that teaches when to search, when to browse, when to verify with line ranges, and when to use native shell tools instead
Closing the loop: MFS is the tool agents use to build agent apps. The same CLI that powers a memory system or a skill manager is also what you hand to your own agent while you're building it.
- 🤖 Shell-native, agent-first. The commands an agent already knows (
ls,cat,grep) — now semantically aware. Every command has a--jsonmode with a unifiedHitenvelope so an agent can parse without regex. - 🔎 Hybrid retrieval. Dense vectors for meaning, BM25 for exact tokens, RRF for fusion. Short and long queries both work.
- 📏 Progressive browsing.
--peek/--skim/--deeponls/tree/catshare one density model — orient an agent in a new repo without burning its context window. - 🧩 Multi-format indexing. Markdown, source code (tree-sitter AST across 15
languages), PDF (via pymupdf4llm), and DOCX (via python-docx). JSON, JSONL,
CSV, HTML and friends stay grep-able and readable via
mfs cat. - 🔀 Smart grep routing. Indexed files go through Milvus BM25; everything else
falls back to the system
grep— you don't think about which is which. - 🚫 No LLM in the hot path. Chunking, summarization heuristics, embedding — all run without calling any LLM. LLM / VLM enrichment is strictly opt-in.
- 🧼 Zero intrusion. All state lives in
~/.mfs/. Your project directory gets nothing added to it.
See docs/skill.md to install the companion MFS agent skill for Codex, Claude Code, or another shell-based agent.
We tested MFS in end-to-end agent runs on large code and documentation corpora. The strongest result came from using both MFS search and MFS browse: search finds better candidates, and browse helps the agent compare them without reading whole files.
See evaluation for the full setup, metrics, examples, and machine-readable artifacts.
Install from PyPI. The package name is mfs-cli; the command is mfs:
uv tool install mfs-cli
mfs --helpFor one-off use without installing a persistent tool:
uvx --from mfs-cli mfs --helpTo use optional providers, install the matching extra:
uv tool install "mfs-cli[onnx]" # local bge-m3 ONNX INT8
uv tool install "mfs-cli[google]" # Google Gemini embeddings
uv tool install "mfs-cli[llm-anthropic]" # Anthropic summaries/descriptionsFor development from source:
git clone https://github.com/zilliztech/mfs.git
cd mfs
uv sync # base install (OpenAI embedding ready)
uv run mfs --helpMFS is managed with uv and pyproject.toml.
Source install extras — other embedding / LLM providers
# Embedding providers
uv sync --extra onnx # local bge-m3 ONNX INT8, no API key
uv sync --extra google # Google Gemini embeddings
uv sync --extra voyage # Voyage AI
uv sync --extra jina # Jina
uv sync --extra ollama # local Ollama models
uv sync --extra mistral # Mistral
uv sync --extra local # sentence-transformers (GPU)
# LLM / VLM providers (for --summarize / --describe)
uv sync --extra llm-anthropic # Claude
uv sync --extra llm-google # Gemini
uv sync --extra llm-ollama # local Ollama
uv sync --extra llm-mistral # Mistral
# Everything at once
uv sync --extra allEnvironment variables (only the ones you actually use):
export OPENAI_API_KEY="sk-..." # default provider
export GOOGLE_API_KEY="..." # or GEMINI_API_KEY
export ANTHROPIC_API_KEY="..."
export VOYAGE_API_KEY="..."
export JINA_API_KEY="..."
export MISTRAL_API_KEY="..."# 1. Index the current directory (incremental — re-runs are cheap)
$ mfs add .
Processing 184 files under /repo
Indexed: 184 files scanned, 184 touched, 0 deleted, 2341 chunks queued.
Worker running in background. Run `mfs status` to check progress.
# 2. Semantic search — pass a positional <path> scope (POSIX-style, like grep)
# or --all to search every indexed folder.
# Header is [N] <path> score=…, with line numbers in the left gutter of
# each body line (ripgrep-style).
$ mfs search "how do we handle token expiration" .
[1] src/auth/token.py score=0.890
142 def refresh_token(user_id: str, refresh_jwt: str) -> Token:
143 """Exchange a refresh token for a new access token.
144
145 Raises TokenExpiredError if the refresh token is past its TTL —
146 the caller should redirect to login.
147 """
148 ...
[2] docs/auth.md score=0.710
24 ## Token expiration
25
26 Access tokens live 15 minutes; refresh tokens live 14 days.
# 3. Exact-match — Milvus BM25 for indexed files, system grep for the rest
$ mfs grep "ERR_TOKEN_EXPIRED" .
src/auth/token.py
167 raise TokenExpiredError("ERR_TOKEN_EXPIRED")
# 4. Orient yourself in an unfamiliar folder — cheap peek
$ mfs tree --peek -L 2 ./docs/
# 5. Check indexing status
$ mfs statusMFS gives an agent two complementary command families:
- 🔎 Search — flat retrieval over the whole corpus (dense + BM25 + RRF)
- 📖 Browse — walk along the natural hierarchy of files and folders (headings, symbols, directory trees), paying only a few tokens to see what's there
Search finds candidates in a sea of text. Browse lets the agent look around between those candidates without reading whole files. Two legs — neither works as well on its own.
Flat, corpus-wide retrieval. mfs search is hybrid (dense + BM25 + RRF);
mfs grep is exact-match (Milvus BM25 for indexed files, falls back to the
system grep for everything else).
Both commands take a positional <path> scope — POSIX-style, like
grep pattern path. Pass --all to search every indexed folder; without a
<path> or --all, the command errors rather than silently defaulting to the
whole index.
mfs search "how do we handle token expiration" . # hybrid, scope = cwd
mfs search "oauth flow" ./src/ --mode semantic # dense only
mfs search "ERR_TOKEN" ./src/ --mode keyword # BM25 only
mfs search "auth" --all --top-k 20 # across all indexed folders
mfs search "auth" ./src/ --json # scoped + JSON for parsing
git log --oneline | mfs search "fix auth" # temporary dense search over stdin
mfs grep "ERR_TOKEN_EXPIRED" . # Milvus BM25 / system grep
mfs grep -C 5 "OAuth" ./docs/ # context lines
mfs grep "def.*token" ./src/ # regexFiles and directories come with natural structure — Markdown has headings, source code has classes and functions, a directory has children and summaries. MFS exposes that structure at three density presets, with the same mental model on every file type:
| Preset | Cost | What an agent sees | Answers |
|---|---|---|---|
--peek |
cheapest | structure only — headings, symbols, file names | what is this thing? |
--skim |
medium | structure + a short paragraph per node | what is each section about? |
--deep |
highest | full expansion down the outline | I'm about to edit this; show me |
# directories
mfs tree --peek -L 2 ./docs/ # skeleton, two levels deep
mfs ls --skim ./docs/ # every file with a one-paragraph summary
mfs tree --deep ./docs/ # full expansion
# single files
mfs cat --peek ./docs/auth.md # heading-only skeleton
mfs cat --skim ./docs/auth.md # headings + first paragraph of each
mfs cat --deep ./docs/auth.md # detailed expansion
mfs cat -n 40:60 ./docs/auth.md # drill in to a specific line range
mfs cat --skim ./data/events.jsonl # compact rows for JSONL / JSON / CSVAll three presets are driven by the same three knobs — -W <chars> (per-node
width), -H <n> (how many top-level nodes), -D <n> (depth). Custom budgets
work anywhere: mfs cat -W 80 -H 5 -D 2 auth.md.
PDF and DOCX files are converted to Markdown before indexing or browsing.
JSON, JSONL and CSV are intentionally not embedded by default; mfs grep
still searches them, and mfs cat --peek/--skim/--deep gives compact
structured views.
The point of browse: an agent should not have to choose between "read the
whole file" (expensive) and "stare at a single search chunk" (no surrounding
context). Spending a few hundred tokens on --peek of a whole directory is
enough to know what lives there — and cheap enough to catch things search
might have missed.
Search is flat; browse is hierarchical. A typical agent pass alternates them:
# 1. orient — peek the whole repo into a few hundred tokens
mfs tree --peek -L 2 .
# 2. locate — flat hybrid search (scope = cwd, or --all for everything)
mfs search "how is session state stored" . --top-k 5
# 3. contextualize — skim the candidate file so the hit has surroundings
mfs cat --skim ./src/session/store.py
# 4. drill in — read the exact lines before editing
mfs cat -n 80:140 ./src/session/store.pyBrowse doubles as a cheap safety net for search: a --peek over a
neighbouring directory often surfaces a relevant file that didn't match the
search query's wording.
Every command (search, grep, ls, tree, cat) accepts --json and emits
the same Hit envelope — {source, lines, content, score, metadata}.
metadata.kind tells the caller which command produced the hit, so one parser
handles all five.
$ mfs search "oauth flow" . --json
[
{
"source": "/repo/src/auth/oauth.py",
"lines": [42, 98],
"content": "class OAuthClient:\n ...",
"score": 0.87,
"metadata": {
"kind": "search",
"content_type": "code",
"is_dir": false,
"chunk_index": 3,
"language": "python",
"symbol_name": "OAuthClient"
}
}
]Both are opt-in and off by default — MFS's chunking and embedding pipeline never calls an LLM unless you ask it to. Flip them on when vague queries miss the right files, or when you want image assets to be searchable.
Text files (Markdown, code, PDFs/DOCX converted to Markdown) can carry an
auto-generated LLM summary that's embedded alongside the body chunks in
the same collection. The summary participates in the same hybrid retrieval,
so a vague query like "how does the new onboarding flow work" can hit the
summary even when no single body chunk matches the wording. When a summary
wins, the result's header picks up a [summary] marker so the caller can
tell it apart from body chunks.
mfs add ./docs/ --summarize # auto-generate via the configured [llm]Summaries are stale-tracked: when a summarized file is re-indexed after
edits, its summary is marked stale but kept around until you regenerate it.
mfs status --needs-summary lists what still needs a fresh pass.
There is no direct image embedding (no CLIP-style multimodal encoder).
Instead the path is image → VLM text description → text embedder, so the
image shows up as a normal search hit with content_type: vlm_description in
the JSON envelope. Works for PNG / JPG / WEBP / GIF / BMP.
mfs add ./assets/ --describe # auto-generate via a VLM-capable provider| Role | Providers implemented |
|---|---|
| Text summaries | openai, anthropic, google, ollama, mistral |
| VLM image descriptions | openai (gpt-4o / gpt-4o-mini / gpt-4-turbo), anthropic, google |
Install with uv sync --extra llm-<name> and configure in ~/.mfs/config.toml:
[llm]
provider = "openai"
model = "gpt-4o-mini" # must be a vision model if you use --describePointing --describe at a text-only provider (ollama / mistral) exits with an
error rather than silently skipping the image.
┌──────────────────────────────────────────────────────────────┐
│ mfs <cmd> │
│ add · search · grep · ls · tree · cat · status · remove │
└────────────────┬───────────────────────────┬─────────────────┘
│ │
┌────────┴────────┐ ┌────────┴────────┐
│ Ingest │ │ Retrieve │
│ │ │ │
│ scan │ │ hybrid search │
│ ↓ │ │ (dense + BM25 │
│ chunk │ │ + RRF) │
│ (tree-sitter, │ │ ↓ │
│ markdown, │ │ density render │
│ pymupdf4llm) │ │ (peek/skim/ │
│ ↓ │ │ deep) │
│ embed │ │ │
└────────┬────────┘ └────────┬────────┘
│ │
▼ ▲
┌────────────────────────────────────────────────┐
│ Milvus (Lite · Self-hosted · Zilliz Cloud) │
│ dense vectors · BM25 sparse · metadata │
└────────────────────────────────────────────────┘
▲
│ derived index
│
┌────────────────────────────────────────────────┐
│ Your files (source of truth; read-only) │
│ state → ~/.mfs/ only; project dir untouched │
└────────────────────────────────────────────────┘
Incremental sync is automatic: each mfs add . hashes files and only re-embeds
what changed. The Milvus collection is a rebuildable cache — delete ~/.mfs/
and a fresh mfs add . reconstructs everything from the files on disk.
Config lives at ~/.mfs/config.toml. Nothing is required — defaults pick OpenAI
embeddings and Milvus Lite. Minimal example:
[embedding]
provider = "openai" # openai | onnx | google | voyage | jina | mistral | ollama | local
model = "text-embedding-3-small"
[llm]
provider = "openai"
model = "gpt-4o-mini"
[milvus]
# uri = "~/.mfs/milvus.db" # default: Milvus Lite, embedded
# uri = "http://localhost:19530" # self-hosted Milvus
# uri = "https://xxx.zillizcloud.com" # Zilliz Cloud
# token = "..."Use mfs config show to inspect effective config and mfs config set <key> <value> to edit it from the CLI.
Converted PDF/DOCX Markdown is cached under ~/.mfs/converted/. The cache is
bounded by [cache].max_size_mb and evicts the least recently used converted
files when it grows past the configured cap.
Milvus backends
| Backend | URI | Notes |
|---|---|---|
| Milvus Lite | ~/.mfs/milvus.db |
Default. Zero config. Single writer. |
| Self-hosted Milvus | http://localhost:19530 |
Concurrent writers, full BM25. |
| Zilliz Cloud | https://*.zillizcloud.com + token |
Managed. Full BM25. |
Embedding providers
| Provider | Example model | Dim |
|---|---|---|
| openai | text-embedding-3-small |
1536 |
| onnx | gpahal/bge-m3-onnx-int8 |
1024 |
gemini-embedding-001 |
768 | |
| voyage | voyage-3-lite |
512 |
| jina | jina-embeddings-v3 |
1024 |
| ollama | bge-m3, nomic-embed-text, … |
varies |
MFS reads .gitignore automatically and picks up a .mfsignore at the project
root (same syntax) — use either to exclude paths from indexing.
uv sync # install dev dependencies
uv run pytest tests/ -v # run the test suite
uv run ruff check src/ tests/ # lintThe codebase lives under src/mfs/:
cli.py— Click entry pointingest/— scanner, chunker (incl. tree-sitter AST), PDF/DOCX converter, workerembedder/— embedding providers (OpenAI, ONNX, Gemini, Voyage, Jina, Ollama, …)llm/— LLM / VLM providers for opt-in enrichmentsearch/— search, grep, summary, density presetsoutput/— display, pipe handshake, JSON schemastore.py— Milvus collection wrapper
- Rewrite low-level file processing modules in Rust for faster scanning, parsing, and format handling.
- Support more file formats and more multimodal content types.
- Run broader evaluations across more agent workflows, corpora, and real-world scenarios.
MFS is shaped by several related projects and communities:
- VKFS, built by a partner team, explores a similar Unix-like interface for agent access to vector-backed knowledge.
- claude-context and memsearch, earlier Zilliz projects, provided practical lessons from community feedback on code search, memory search, synchronization, and agent-facing architecture.
Apache License 2.0. See LICENSE.


