vemora

Repository-local memory system for LLM-assisted development.

Builds a structured, versioned index of your codebase — code chunks, symbols, dependency graph, and LLM-generated summaries — and enables semantic or keyword search over it. The result is a RAG (Retrieval-Augmented Generation) layer that lets you give an LLM only the code it actually needs, instead of entire files.

Why

When working on a large codebase with Claude Code or similar LLM tools, you face two problems:

Context cost — dropping 50 files into the context wastes tokens on irrelevant code
Discovery — you don't always know which files are relevant to a given task

vemora solves both by pre-indexing the repo and making it queryable.

Architecture in three layers

.vemora/          ← versioned in git, shared across the team
  config.json
  metadata.json
  index/
    files.json       ← file hashes for incremental indexing
    chunks.json      ← code chunks (function/class/window slices)
    symbols.json     ← extracted symbol map
    deps.json        ← intra-project dependency graph
    callgraph.json   ← function-level call relationships
    todos.json       ← TODO/FIXME/HACK/XXX annotations extracted from source
  summaries/
    file-summaries.json   ← LLM-generated 2-3 line description per file
    project-summary.json  ← LLM-generated ~500 word project overview
  knowledge/
    entries.json     ← human/LLM-authored notes: decisions, gotchas, patterns

~/.vemora-cache/<projectId>/    ← local to each developer, NOT in git
  embeddings.json                  ← metadata (model, dimensions, chunk mapping)
  embeddings.bin                   ← binary buffer of vectors (Float32Array)
  embeddings.hnsw.json             ← serialized HNSW index for ultra-fast search

The index, summaries, and knowledge entries are committed to git so teammates share them. Embeddings are generated locally by each developer from the shared index.
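Splitting the cache into a metadata file and a flat Float32Array buffer suggests fixed-stride lookups. The sketch below shows one way such a buffer could be read. The back-to-back layout and the names `vectorAt` and `cosineSimilarity` are assumptions for illustration, not vemora's actual code.

```typescript
// Assumed layout: vectors stored back to back, `dimensions` floats each.
function vectorAt(buffer: ArrayBufferLike, dimensions: number, index: number): Float32Array {
  const stride = dimensions * Float32Array.BYTES_PER_ELEMENT;
  const count = Math.floor(buffer.byteLength / stride);
  if (index < 0 || index >= count) {
    throw new RangeError(`vector ${index} is out of range (0..${count - 1})`);
  }
  // Returns a view over the shared buffer, not a copy.
  return new Float32Array(buffer, index * stride, dimensions);
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}
```

Because each lookup is a view, reading one vector does not copy the rest of the buffer.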

Installation

# Inside the vemora/ directory
pnpm install
pnpm build

# Link globally (optional)
pnpm link

Or run directly with `node vemora/dist/cli.js` from the project root.

## Installing the alpha version from npm

To install the alpha version:

pnpm install vemora@alpha


Or globally:

pnpm install -g vemora@alpha


You can also use npm:

npm install vemora@alpha
npm run build
npm link


## The Core Workflow

### 1. Setup (first time only)

```bash
vemora init                  # create .vemora/ and config.json
vemora index --no-embed      # build index without embeddings (fast)
vemora index                 # or: build index + generate embeddings
vemora summarize             # optional: generate LLM descriptions per file
vemora init-agent            # generate instruction files for AI agents
```

### 2. Query during development

# Search for relevant code
vemora query "how does IMAP reconnect work?"

# Full context block ready to paste into any LLM
vemora context --query "email retry logic" > context.md

# One-shot answer from the configured LLM
vemora ask "why does the sync queue stall?"

# Save a finding for future sessions
vemora remember "EmailService.send queues if SMTP is offline — see OutboxRepository"

### 3. Keep the index fresh

vemora index --watch         # incremental re-index on file save
vemora index --no-embed      # after code changes, update structure only

Commands

`vemora init`

Creates the .vemora/ folder structure and adds .vemora-cache/ to .gitignore.

Options:
  --root <dir>   project root (default: cwd)

`vemora index`

Scans the repo, parses symbols, builds the dependency graph, extracts TODO/FIXME/HACK/XXX annotations, and generates embeddings. Incremental — only re-processes files whose SHA-256 hash has changed.

Options:
  --root <dir>   project root (default: cwd)
  --force        re-index all files, ignoring hashes
  --no-embed     skip embedding generation (index structure only)
  -w, --watch    watch for changes and re-index automatically
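The incremental behaviour described above can be pictured as a comparison between stored and freshly computed hashes. This is a minimal sketch, assuming files.json reduces to a path-to-hash map. `HashMap`, `IndexPlan`, and `planReindex` are hypothetical names.

```typescript
// path -> hex SHA-256 of the file content (assumed shape of the stored hashes)
type HashMap = Record<string, string>;

interface IndexPlan {
  changed: string[]; // new or modified files that need re-processing
  removed: string[]; // files that disappeared since the last run
}

function planReindex(stored: HashMap, current: HashMap, force = false): IndexPlan {
  const changed = Object.keys(current)
    .filter((path) => force || stored[path] !== current[path])
    .sort();
  const removed = Object.keys(stored)
    .filter((path) => !(path in current))
    .sort();
  return { changed, removed };
}
```

With `force` set, every current file counts as changed, which mirrors what --force is documented to do.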

`vemora query "<question>"`

Searches the index using vector similarity (or keyword fallback). Results use a three-tier display that compresses output by relevance rank.

Options:
  --root <dir>        project root (default: cwd)
  -k, --top-k <n>     number of results (default: 10)
  -c, --show-code     show full code for all results (overrides tier system)
  --keyword           force keyword/BM25 search (no API call needed)
  --format <fmt>      output format: terminal (default) | json | markdown | terse
  --rerank            re-score results with a cross-encoder model
  --hybrid            use hybrid search (vector + BM25)
  --alpha <n>         hybrid weight for vector search (0-1, default 0.7)
  --budget <n>        max tokens to include across results
  --mmr               apply Maximal Marginal Relevance to diversify results
  --merge             merge adjacent chunks from the same file
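To make the --alpha weight concrete, here is one common way to blend vector and BM25 scores. The README only states that alpha is the vector weight. The min-max normalisation and the names `Scored`, `normalize`, and `hybridRank` are assumptions for this sketch.

```typescript
interface Scored {
  id: string;
  score: number;
}

// Min-max normalisation so cosine and BM25 scores share a 0..1 scale.
function normalize(results: Scored[]): Record<string, number> {
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const out: Record<string, number> = {};
  for (const r of results) {
    out[r.id] = max === min ? 1 : (r.score - min) / (max - min);
  }
  return out;
}

// alpha weights the vector score; (1 - alpha) weights the keyword score.
function hybridRank(vector: Scored[], keyword: Scored[], alpha = 0.7): Scored[] {
  const v = normalize(vector);
  const k = normalize(keyword);
  const ids = Object.keys({ ...v, ...k });
  return ids
    .map((id) => ({ id, score: alpha * (v[id] ?? 0) + (1 - alpha) * (k[id] ?? 0) }))
    .sort((a, b) => b.score - a.score);
}
```

A chunk found by only one method still gets a score, because the missing side contributes 0.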

Output formats

| Format | Use case |
| --- | --- |
| `terminal` | Default coloured output for interactive use |
| `json` | Machine-readable, for piping to scripts |
| `markdown` | Paste-ready Markdown with code blocks |
| `terse` | One line per result; recommended for small/local models |

Terse format example:

src/core/email/services/email.service.ts:45 | EmailService.send (method) | 0.912 | async send(email: Email): Promise<void>
src/infrastructure/protocols/smtp/smtp.service.ts:12 | SmtpService.connect (method) | 0.841 | async connect(config: SmtpConfig): Promise<void>
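If a script needs to consume terse output, the lines above can be split into fields. The field layout here is inferred only from the two example lines, so treat the parser as a sketch. `TerseResult` and `parseTerseLine` are hypothetical names.

```typescript
interface TerseResult {
  file: string;
  line: number;
  symbol: string;
  kind: string;
  score: number;
  signature: string;
}

function parseTerseLine(text: string): TerseResult | null {
  const parts = text.split(" | ");
  if (parts.length < 4) return null;
  // The signature may itself contain " | " (union types), so rejoin the tail.
  const [where, symbolPart, scorePart, ...rest] = parts;
  const whereMatch = /^(.*):(\d+)$/.exec(where);
  const symbolMatch = /^(.*) \((\w+)\)$/.exec(symbolPart);
  const score = Number(scorePart);
  if (!whereMatch || !symbolMatch || Number.isNaN(score)) return null;
  return {
    file: whereMatch[1],
    line: Number(whereMatch[2]),
    symbol: symbolMatch[1],
    kind: symbolMatch[2],
    score,
    signature: rest.join(" | "),
  };
}
```

For anything beyond quick scripting, --format json is the safer choice, since it is the format documented as machine-readable.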

Output tiers (terminal/markdown)

| Rank | Tier | Content shown |
| --- | --- | --- |
| 1–3 | high | Full code block (capped at 30 lines) |
| 4–7 | med | Declaration signature only |
| 8+ | low | File path + symbol + score + AI summary |
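The tier rules above amount to a small lookup by rank. A sketch with hypothetical names (`tierForRank`, `capLines`), assuming ranks start at 1:

```typescript
type Tier = "high" | "med" | "low";

// Rank boundaries follow the documented tiers: 1-3, 4-7, 8 and above.
function tierForRank(rank: number): Tier {
  if (rank <= 3) return "high";
  if (rank <= 7) return "med";
  return "low";
}

// Truncates a code block to the documented 30-line cap for the high tier.
function capLines(code: string, maxLines = 30): string {
  const lines = code.split("\n");
  return lines.length <= maxLines ? code : lines.slice(0, maxLines).join("\n");
}
```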

`vemora context`

Generates an optimized LLM context block combining project overview, a specific file, and relevant code chunks. Designed to be piped to a file or clipboard.

Options:
  --root <dir>          project root (default: cwd)
  -q, --query <text>    natural-language query to find relevant code
  -f, --file <path>     include a specific file in full with its dependency graph
  -k, --top-k <n>       number of search results to include (default: 5)
  --keyword             use keyword search instead of semantic search
  --show-code           show full code without line cap
  --format <fmt>        output format: markdown (default) | plain | terse
  --rerank              re-score results with a cross-encoder model
  --hybrid              use hybrid search (vector + BM25)
  --budget <n>          max tokens to include across retrieved chunks
  --structured          emit a structured block (Entry Point / Dependencies / Types / Patterns)

At least one of --query or --file is required.

When --file is used, the context block also includes:

Recent git commits that touched the file (last 5, via git log --follow)
TODO/FIXME/HACK/XXX annotations present in the file (from the index)
Test files linked to the file — convention-based (.test.ts, __tests__/) and import-based discovery
Symbol callers — for each symbol defined in the file, which other project symbols call it
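The symbol-callers item can be read as an inversion of the call graph. The sketch below assumes the graph is a caller-to-callees map. The real shape of callgraph.json may differ, and `CallGraph` and `callersOf` are hypothetical names.

```typescript
// Assumed shape: caller symbol -> symbols it calls.
type CallGraph = Record<string, string[]>;

// For each requested symbol, lists the other symbols that call it.
function callersOf(graph: CallGraph, symbols: string[]): Map<string, string[]> {
  const callers = new Map<string, string[]>();
  for (const symbol of symbols) callers.set(symbol, []);
  for (const caller of Object.keys(graph)) {
    for (const callee of graph[caller]) {
      const list = callers.get(callee);
      // Skip self-calls and duplicates.
      if (list && caller !== callee && !list.includes(caller)) list.push(caller);
    }
  }
  return callers;
}
```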

`vemora ask "<question>"`

One-shot Q&A: retrieves relevant context and calls the configured LLM to answer directly. No interactive loop.

Options:
  --root <dir>        project root (default: cwd)
  -k, --top-k <n>     chunks to retrieve (default: 5)
  --keyword           use keyword search (no embeddings needed)
  --hybrid            use hybrid vector+BM25 search
  --budget <n>        max context tokens to send to LLM (default: 6000)
  --show-context      print the retrieved context before the answer

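Requires the summarization block in config.json to be configured, since it selects the LLM provider and model. Useful with local models (Ollama): a single command handles both retrieval and answering, so the agent does not need to orchestrate multiple commands.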

vemora ask "how does the IMAP reconnect logic work?" --root .
vemora ask "what does EmailService.send do?" --root . --keyword
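A token budget such as --budget can be applied as a greedy cut over ranked chunks. This sketch assumes a rough estimate of four characters per token and stops at the first chunk that does not fit. Both choices are assumptions, and `Chunk`, `estimateTokens`, and `applyBudget` are hypothetical names.

```typescript
interface Chunk {
  id: string;
  text: string;
}

// Rough heuristic (an assumption): about four characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keeps chunks in rank order until the next one would exceed the budget.
function applyBudget(chunks: Chunk[], budget = 6000): Chunk[] {
  const kept: Chunk[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > budget) break;
    kept.push(chunk);
    used += cost;
  }
  return kept;
}
```

Stopping at the first overflow keeps the result a strict prefix of the ranking. Skipping the oversized chunk and continuing would fit more text, at the cost of including lower-ranked chunks ahead of a better one.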

`vemora remember "<text>"`

Saves a persistent knowledge entry to .vemora/knowledge/entries.json. The entry is committed to git and included automatically in future context and ask results when relevant.

Options:
  --root <dir>            project root (default: cwd)
  --category <cat>        decision | pattern | gotcha | glossary (default: decision)
  --files <paths>         comma-separated related file paths
  --symbols <names>       comma-separated related symbol names
  --confidence <level>    high | medium | low (default: medium)

vemora remember "EmailService.send queues if SMTP offline — see OutboxRepository" \
  --category gotcha \
  --files src/core/email/services/email.service.ts \
  --symbols EmailService.send

`vemora knowledge`

Manages saved knowledge entries.

vemora knowledge list --root .          # list all entries grouped by category
vemora knowledge forget <id> --root .   # remove an entry by ID (prefix match)
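Prefix matching on IDs, as used by forget, needs a rule for ambiguous prefixes. The sketch below rejects them. That rule is an assumption about reasonable behaviour, not documented vemora behaviour, and `resolveByPrefix` is a hypothetical name.

```typescript
// Resolves a full entry ID from a prefix, refusing missing or ambiguous matches.
function resolveByPrefix(ids: string[], prefix: string): string {
  const matches = ids.filter((id) => id.startsWith(prefix));
  if (matches.length === 0) {
    throw new Error(`no entry matches "${prefix}"`);
  }
  if (matches.length > 1) {
    throw new Error(`"${prefix}" is ambiguous: ${matches.join(", ")}`);
  }
  return matches[0];
}
```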

`vemora init-agent`

Generates AI agent instruction files from the existing index. Supports Claude Code, GitHub Copilot, Cursor, and Windsurf.

Options:
  --root <dir>            project root (default: cwd)
  --agents <list>         comma-separated: claude,copilot,cursor,windsurf (default: all)
  --force                 overwrite existing files that have no vemora markers

| Agent | Output file |
| --- | --- |
| `claude` | `CLAUDE.md` |
| `copilot` | `.github/copilot-instructions.md` |
| `cursor` | `.cursor/rules/vemora.mdc` (with `alwaysApply: true`) |
| `windsurf` | `.windsurfrules` |

Each file includes a two-layer instruction set: abstract guidelines (for large cloud models) and an explicit quick-reference table (for small/local models).

Re-running init-agent only updates the auto-generated block between the vemora markers. Custom content outside the markers is preserved.

`vemora init-claude`

Thin wrapper for init-agent --agents claude. Kept for backward compatibility.

`vemora summarize`

Generates LLM-powered summaries for every indexed file and a high-level project overview. Incremental — only re-generates summaries for files whose content has changed.

Options:
  --root <dir>       project root (default: cwd)
  --force            re-generate all summaries
  --model <name>     override LLM model (default: gpt-4o-mini)
  --files-only       only generate per-file summaries
  --project-only     (re)generate project overview from existing file summaries

`vemora status`

Prints index stats, embedding cache info, knowledge store summary (with staleness warnings), and a count of TODO/FIXME/HACK/XXX annotations by type.
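The annotation count reported by status can be approximated with a line scan. This sketch matches the four markers anywhere on a line rather than only inside comments, which is a simplification. `extractAnnotations` and `countByType` are hypothetical names.

```typescript
type AnnotationType = "TODO" | "FIXME" | "HACK" | "XXX";

interface Annotation {
  type: AnnotationType;
  line: number; // 1-based line number
  text: string;
}

function extractAnnotations(source: string): Annotation[] {
  const pattern = /\b(TODO|FIXME|HACK|XXX)\b[:\s]*(.*)$/;
  const found: Annotation[] = [];
  source.split("\n").forEach((lineText, i) => {
    const match = pattern.exec(lineText);
    if (match) {
      found.push({ type: match[1] as AnnotationType, line: i + 1, text: match[2].trim() });
    }
  });
  return found;
}

function countByType(annotations: Annotation[]): Record<AnnotationType, number> {
  const counts: Record<AnnotationType, number> = { TODO: 0, FIXME: 0, HACK: 0, XXX: 0 };
  for (const annotation of annotations) counts[annotation.type] += 1;
  return counts;
}
```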

`vemora deps <file>`

Shows the full dependency context for a file: what it imports and what imports it.

Options:
  --root <dir>      project root (default: cwd)
  -d, --depth <n>   transitive depth for outgoing imports (default: 1)
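The --depth option can be understood as a bounded breadth-first walk over outgoing imports. This sketch assumes the dependency graph is a file-to-imports map. The real shape of deps.json may differ, and `DepGraph` and `importsWithinDepth` are hypothetical names.

```typescript
// Assumed shape: file path -> files it imports.
type DepGraph = Record<string, string[]>;

// Collects imports reachable from `file` in at most `depth` hops.
function importsWithinDepth(graph: DepGraph, file: string, depth = 1): string[] {
  const seen = new Set<string>([file]);
  const found: string[] = [];
  let frontier = [file];
  for (let level = 0; level < depth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const current of frontier) {
      for (const dep of graph[current] ?? []) {
        if (seen.has(dep)) continue; // guards against import cycles
        seen.add(dep);
        found.push(dep);
        next.push(dep);
      }
    }
    frontier = next;
  }
  return found;
}
```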

`vemora overview`

Prints the project overview to stdout.

vemora overview --root . > OVERVIEW.md

`vemora chat`

Interactive chat session with the codebase. Supports OpenAI, Anthropic, and Ollama.

vemora chat
vemora chat --provider anthropic --model claude-3-5-sonnet-20240620
vemora chat --provider ollama --model qwen2.5-coder:14b

`vemora report`

Shows a usage statistics report: commands breakdown, search method distribution, token savings from each optimization step (semantic dedup, session filter, budget cap), and most frequent query terms.

Options:
  --root <dir>   project root (default: cwd)
  --days <n>     limit report to events from the last N days
  -v, --verbose  show per-query breakdown (last 20 queries)
  --clear        clear all recorded usage data

Usage is tracked automatically on every query, context, and ask invocation. Data is stored locally at ~/.vemora-cache/<projectId>/usage.log.json (never committed to git).

vemora report --root .            # full report
vemora report --root . --days 7   # last week only
vemora report --root . --verbose  # + per-query log
vemora report --root . --clear    # reset usage history

Session flags (`--session`, `--fresh`)

Both query and context support session memory: chunks already seen in the current session are skipped to avoid re-sending redundant context to the LLM.

--session   skip chunks already seen in this session (auto-expires after 30 min idle)
--fresh     reset session memory before this query

vemora query "email retry logic" --root . --session
vemora context --root . --query "sync engine" --session --fresh
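Session memory as described above comes down to a set of seen chunk IDs with an idle timeout. The following sketch assumes the session is a plain object carrying a last-used timestamp. `Session` and `filterSeen` are hypothetical names, and how vemora persists the session is not shown.

```typescript
interface Session {
  seen: string[];
  lastUsed: number; // epoch milliseconds
}

const IDLE_LIMIT_MS = 30 * 60 * 1000; // the documented 30 min idle expiry

// Drops chunk IDs already seen in a live session and records the new ones.
function filterSeen(
  previous: Session | null,
  chunkIds: string[],
  now: number,
  fresh = false,
): { unseen: string[]; session: Session } {
  const expired = previous !== null && now - previous.lastUsed > IDLE_LIMIT_MS;
  const seen = new Set<string>(fresh || expired || previous === null ? [] : previous.seen);
  const unseen = chunkIds.filter((id) => !seen.has(id));
  unseen.forEach((id) => seen.add(id));
  return { unseen, session: { seen: Array.from(seen), lastUsed: now } };
}
```

Passing `fresh` as true plays the role of --fresh: the previous memory is discarded before filtering.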

`vemora bench <query>`

Compares token consumption between minimal and full context modes.

Configuration

Edit .vemora/config.json after init:

{
  "projectId": "b88eb8199f78331e",
  "projectName": "my-app",
  "version": "1.0.0",
  "include": ["**/*.ts", "**/*.tsx"],
  "exclude": ["**/node_modules/**", "**/dist/**"],
  "maxChunkLines": 80,
  "maxChunkChars": 3000,
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "dimensions": 1536
  },
  "summarization": {
    "provider": "openai",
    "model": "gpt-4o-mini"
  },
  "display": {
    "format": "terse"
  }
}

`display.format`

Sets the default output format for query, context, and ask. Use "terse" for small or local models with limited context windows. The default can be overridden per command with --format, for example --format markdown.

Embedding providers

| Provider | Config | Notes |
| --- | --- | --- |
| `openai` | `OPENAI_API_KEY` env or `apiKey` in config | Best quality. Requires `npm install openai`. |
| `ollama` | `baseUrl` (default: `http://localhost:11434`) | Local, no cost, no extra install. |
| `none` | (none) | Keyword search only, no embeddings. |

LLM providers

Used by ask, chat, and summarize. The embedding provider and LLM provider are configured independently.

| Provider | Config | Notes |
| --- | --- | --- |
| `openai` | `OPENAI_API_KEY` env or `apiKey` in config | Requires `npm install openai`. |
| `anthropic` | `ANTHROPIC_API_KEY` env or `apiKey` in config | Requires `npm install @anthropic-ai/sdk`. |
| `ollama` | `baseUrl` (default: `http://localhost:11434`) | Local, no cost, no extra install. |

Note: Anthropic does not offer an embedding API. If you use anthropic as your LLM provider, you still need to choose a separate embedding provider (openai or ollama).

Using local models (Ollama)

Fully offline workflow with no API keys required:

ollama pull nomic-embed-text      # 274 MB — embeddings
ollama pull qwen2.5-coder:14b     # ~9 GB — recommended for 16 GB RAM

{
  "embedding": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "baseUrl": "http://localhost:11434",
    "dimensions": 768
  },
  "summarization": {
    "provider": "ollama",
    "model": "qwen2.5-coder:14b",
    "baseUrl": "http://localhost:11434"
  },
  "display": { "format": "terse" }
}

The query and context commands do not call the LLM — they only use embeddings. The LLM is called only by ask, chat, and summarize.

What goes in git

✓ .vemora/config.json
✓ .vemora/metadata.json
✓ .vemora/index/files.json
✓ .vemora/index/chunks.json
✓ .vemora/index/symbols.json
✓ .vemora/index/deps.json
✓ .vemora/index/callgraph.json
✓ .vemora/index/todos.json
✓ .vemora/summaries/file-summaries.json
✓ .vemora/summaries/project-summary.json
✓ .vemora/knowledge/entries.json    ← shared knowledge store

✗ .vemora-cache/                    ← local embedding vectors (gitignored)

Incremental indexing

Chunk IDs are derived from sha256(filePath + content). If neither a chunk's file path nor its code changes, its chunk ID is stable across branches, so its embedding is reused without any API call.
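A content-addressed ID of this kind can be sketched with Node's built-in crypto module. The plain string concatenation and hex encoding are assumptions. Only the sha256(filePath + content) formula comes from the description above, and `chunkId` and `chunksNeedingEmbedding` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// ID derived from the file path and the chunk's content.
// Assumes plain string concatenation and a hex digest.
function chunkId(filePath: string, content: string): string {
  return createHash("sha256").update(filePath + content).digest("hex");
}

// Only chunks whose IDs are absent from the local cache need a new embedding.
function chunksNeedingEmbedding(ids: string[], cachedIds: string[]): string[] {
  const cached = new Set(cachedIds);
  return ids.filter((id) => !cached.has(id));
}
```

Switching branches changes the ID only for chunks whose path or content differs, so the rest of the local cache stays valid.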

Tech stack

TypeScript + Node.js (CommonJS, ES2022 target)
commander — CLI framework
fast-glob — repository scanning
tree-sitter (optional) — AST-based symbol extraction for TS/JS
openai SDK (optional) — embedding generation and OpenAI LLM provider; install with npm install openai
@anthropic-ai/sdk (optional) — Anthropic/Claude LLM provider; install with npm install @anthropic-ai/sdk
@xenova/transformers — local cross-encoder model for --rerank
hnsw — HNSW index for sub-millisecond vector search
chokidar — file watching for --watch mode
chalk + ora — terminal output
