CodexA — Developer Intelligence Engine
Semantic code search · AI-assisted understanding · Agent tooling protocol


CodexA is a lightweight developer intelligence engine designed to cooperate with AI coding assistants (GitHub Copilot, Cursor, Cline, etc.) and developer tooling. It indexes codebases locally, performs semantic search, and exposes a structured tool protocol that any AI agent can call over HTTP or CLI.

Features

| Area | What you get |
| --- | --- |
| Code Indexing | Scan repos, extract functions/classes, generate vector embeddings (sentence-transformers + FAISS), ONNX runtime option, parallel indexing, --watch live re-indexing, .codexaignore support, --add/--inspect per-file control, model-consistency guard, Ctrl+C partial-save |
| Rust Search Engine | Native codexa-core Rust crate via PyO3: HNSW approximate nearest-neighbour search, BM25 keyword index, tree-sitter AST chunker (10 languages), memory-mapped vector persistence, parallel file scanner, optional ONNX embedding inference, optional Tantivy full-text search |
| Multi-Mode Search | Semantic, keyword (BM25), regex, hybrid (RRF), and raw filesystem grep (ripgrep backend) with full -A/-B/-C/-w/-v/-c/-l/-L/--exclude/--no-ignore flags, --hybrid/--sem shorthands, --scores, --snippet-length, --no-snippet, JSONL streaming |
| RAG Pipeline | 4-stage Retrieval-Augmented Generation: Retrieve → Deduplicate → Re-rank → Assemble with token budget, cross-encoder re-ranking, source citations |
| Code Context | Rich context windows: imports, dependencies, AST-based call graphs, surrounding code |
| Repository Analysis | Language breakdown (codexa languages), module summaries, component detection |
| AI Agent Protocol | 13 built-in tools exposed via HTTP bridge, MCP server (13 tools with pagination/cursors), MCP-over-SSE (--mcp), codexa --serve shorthand, Claude Desktop auto-config (--claude-config), or CLI for any AI agent to invoke |
| Quality & Metrics | Complexity analysis, maintainability scoring, quality gates for CI |
| Multi-Repo Workspaces | Link multiple repos under one workspace for cross-repo search & refactoring |
| Interactive TUI | Terminal REPL with mode switching for interactive exploration |
| Streaming Responses | Token-by-token streaming for chat and investigation commands |
| Plugin System | 22 hooks for extending every layer, from indexing to tool invocation |
| VS Code Extension | 4-panel sidebar (Search, Symbols, Quality, Tools), 8 commands, CodeLens, context menus, status bar |
| Editor Plugins | Zed, JetBrains (IntelliJ/PyCharm), Neovim (telescope.nvim), Vim, Sublime Text, Emacs, Helix, Eclipse, all sharing the same MCP/bridge protocol |
| Cross-Language Intelligence | FFI pattern detection, polyglot dependency graphs, language-aware search boosting, universal multi-language call graph |
| Multi-Agent Sessions | Concurrent AI agent sessions with shared discovery, semantic diff (rename/move/signature/body detection), RAG code generation |
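
The hybrid search mode listed above names RRF (Reciprocal Rank Fusion). As a rough illustration of that general technique, and not CodexA's actual implementation, the sketch below fuses a semantic ranking and a keyword ranking. The function name, the constant k=60 (a commonly used default), and the sample file names are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one ranking (hypothetical helper)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every item it contains.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to stay deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for semantic and BM25 search output.
semantic = ["auth.py", "session.py", "db.py"]
keyword = ["session.py", "cache.py", "auth.py"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

Items that rank well in both lists rise to the top, which is why rank fusion is a common way to combine semantic and keyword results without calibrating their raw scores against each other.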

Quick Start

1. Install

pip install codexa

For semantic indexing and vector search, install the ML extras:

pip install "codexa[ml]"

Or install from source:

git clone https://github.com/M9nx/CodexA.git
cd CodexA
pip install -e ".[dev]"

Alternative installation methods:

# Docker
docker build -t codexa .
docker run --rm -v /path/to/project:/workspace codexa search "auth"

# Homebrew (macOS)
brew install --formula Formula/codexa.rb

2. Initialize a Project

Navigate to any project you want to analyze and run:

cd /path/to/your-project
codexa init

By default, CodexA detects your available RAM and picks an embedding model to match. To choose a model profile explicitly:

codexa init --profile fast       # mxbai-embed-xsmall — low RAM (<1 GB)
codexa init --profile balanced   # MiniLM — good balance (~2 GB)
codexa init --profile precise    # jina-code — best quality (~4 GB)

This creates a .codexa/ directory with configuration, index storage, and session data.

3. Index the Codebase

codexa index .

This parses all source files in the supported languages (Python, JS/TS, Java, Go, Rust, C#, Ruby, C++), extracts symbols, generates embeddings, and stores them in a local FAISS index. Semantic indexing requires codexa[ml].

If you need to keep secrets, generated files, or local config files out of the index, add patterns to .codexaignore at the project root or configure index.exclude_files in .codexa/config.json.

Typical .codexaignore example:

.env*
secrets/*.json
config/local-*.yml
vendor/*

The default embedding model is small, but the PyTorch backend still needs about 2 GB of available RAM. On lower-memory machines, prefer the ONNX backend.

4. Semantic Search

codexa search "jwt authentication"
codexa search "database connection pool" --json
codexa search "error handling" -k 5

5. Explore More

codexa explain MyClass              # Structural explanation of a symbol
codexa context parse_config         # Rich AI context window
codexa deps src/auth.py             # Import / dependency map
codexa summary                      # Full repo summary
codexa quality src/                 # Code quality analysis
codexa hotspots                     # High-risk code hotspots
codexa trace handle_request         # Execution trace of a symbol
codexa evolve                       # Self-improving development loop
codexa grep "TODO|FIXME"            # Raw filesystem grep (ripgrep or Python)
codexa benchmark                    # Performance benchmarking

Using CodexA with AI Agents (GitHub Copilot, etc.)

CodexA is designed to be called by AI coding assistants as an external tool. There are three integration modes: CLI tool mode, HTTP bridge server, and in-process Python API.

Option A — CLI Tool Mode (Recommended for Copilot Chat)

Any AI agent that can run shell commands can use CodexA directly:

# List available tools
codexa tool list --json

# Run a tool with arguments
codexa tool run semantic_search --arg query="authentication middleware" --json
codexa tool run explain_symbol --arg symbol_name="UserService" --json
codexa tool run get_call_graph --arg symbol_name="process_payment" --json
codexa tool run get_dependencies --arg file_path="src/auth.py" --json

# Get tool schema (so the agent knows what arguments to pass)
codexa tool schema semantic_search --json

The --json flag ensures machine-readable output. The --pipe flag suppresses colors and spinners for clean piping.
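
An agent framework written in Python could wrap these CLI calls roughly as follows. This is a sketch under assumptions: the helper names build_tool_command and run_tool are invented for the example, and it assumes that the command prints a single JSON document to stdout when --json is given.

```python
import json
import subprocess


def build_tool_command(tool_name, **arguments):
    """Build the argv list for `codexa tool run <tool> --arg k=v ... --json`."""
    command = ["codexa", "tool", "run", tool_name]
    for key, value in arguments.items():
        command += ["--arg", f"{key}={value}"]
    command.append("--json")
    return command


def run_tool(tool_name, cwd=".", **arguments):
    """Run a tool inside an indexed project and parse its output.

    Assumes stdout holds one JSON document; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    completed = subprocess.run(
        build_tool_command(tool_name, **arguments),
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# Building the command does not require codexa to be installed.
print(build_tool_command("semantic_search", query="authentication middleware"))
```

Passing the arguments as a list avoids shell quoting problems when a query contains spaces or quotes.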

Option B — HTTP Bridge Server (For MCP / Long-Running Agents)

Start the bridge server to expose all tools over HTTP:

codexa serve --port 24842

The server runs on http://127.0.0.1:24842 and exposes:

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /capabilities | Full capability manifest: version, tools, supported requests |
| GET | /health | Health check → {"status": "ok"} |
| GET | /tools/list | List all available tools with schemas |
| POST | /tools/invoke | Execute a tool by name with arguments |
| GET | /tools/stream | SSE stream: tool discovery + heartbeat |
| POST | /request | Dispatch any AgentRequest (12 request kinds) |

Example — invoke a tool via HTTP:

curl -X POST http://127.0.0.1:24842/tools/invoke \
  -H "Content-Type: application/json" \
  -d '{"tool_name": "semantic_search", "arguments": {"query": "error handling"}}'

Example — list capabilities:

curl http://127.0.0.1:24842/capabilities
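
The same /tools/invoke call can be made from Python with only the standard library. The sketch below mirrors the curl example above. The helper names are hypothetical, and it assumes the server answers with a JSON body, whose exact shape is not described here.

```python
import json
import urllib.request

BRIDGE_URL = "http://127.0.0.1:24842"  # default address shown above


def build_invoke_request(tool_name, arguments, base_url=BRIDGE_URL):
    """Build a POST request for /tools/invoke (hypothetical helper)."""
    payload = json.dumps({"tool_name": tool_name, "arguments": arguments})
    return urllib.request.Request(
        f"{base_url}/tools/invoke",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_tool(tool_name, arguments, timeout=30.0):
    """Send the request to a running bridge server and decode the reply.

    Assumes the response body is JSON.
    """
    request = build_invoke_request(tool_name, arguments)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network.
request = build_invoke_request("semantic_search", {"query": "error handling"})
print(request.get_method(), request.full_url)
```

Calling invoke_tool("semantic_search", {"query": "error handling"}) requires codexa serve to be running on the same port.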

Option C — Python API (In-Process)

from pathlib import Path
from semantic_code_intelligence.tools.executor import ToolExecutor
from semantic_code_intelligence.tools.protocol import ToolInvocation

executor = ToolExecutor(Path("/path/to/project"))
invocation = ToolInvocation(tool_name="semantic_search", arguments={"query": "auth"})
result = executor.execute(invocation)

print(result.success)           # True
print(result.result_payload)    # dict with search results
print(result.execution_time_ms) # timing in milliseconds

Setting Up with VS Code + GitHub Copilot

Step 1 — Install CodexA globally

# Clone the repo
git clone https://github.com/M9nx/CodexA.git

# Install it in editable mode (makes `codexa` available on your PATH while the venv is active)
cd CodexA
pip install -e ".[dev]"

# Verify
codexa --version    # → codexa, version 0.5.0

Step 2 — Initialize your target project

cd /path/to/your-project
codexa init --index  # Creates .codexa/ and indexes immediately
# Or separately:
codexa init          # Creates .codexa/ directory
codexa index .       # Index the entire codebase
codexa doctor        # Verify everything is healthy
codexa search "main" # Quick sanity check

Step 3 — Add Copilot Custom Instructions (System Prompt)

Create the file .github/copilot-instructions.md in your project root. GitHub Copilot Chat reads this file automatically and adds its contents to the context of chat requests in the repository, so it works much like a system prompt.

mkdir -p .github

Then create .github/copilot-instructions.md with this content:

# Copilot Custom Instructions

## CodexA Integration

This project uses **CodexA** — a local developer intelligence engine.
You have access to the `codexa` CLI for semantic code search, symbol
explanation, dependency analysis, and more.

### Available Commands

Before answering questions about this codebase, use CodexA to gather context:

- **Search the codebase:**
  ```bash
  codexa search "<natural language query>" --json
  ```

- **Explain a symbol (function/class/method):**
  ```bash
  codexa tool run explain_symbol --arg symbol_name="<name>" --json
  ```

- **Get the call graph of a function:**
  ```bash
  codexa tool run get_call_graph --arg symbol_name="<name>" --json
  ```

- **Get file dependencies/imports:**
  ```bash
  codexa tool run get_dependencies --arg file_path="<path>" --json
  ```

- **Find all references to a symbol:**
  ```bash
  codexa tool run find_references --arg symbol_name="<name>" --json
  ```

- **Get rich context for a symbol:**
  ```bash
  codexa tool run get_context --arg symbol_name="<name>" --json
  ```

- **Summarize the entire repo:**
  ```bash
  codexa tool run summarize_repo --json
  ```

- **Explain all symbols in a file:**
  ```bash
  codexa tool run explain_file --arg file_path="<path>" --json
  ```

### Rules

1. Always use `--json` flag for machine-readable output.
2. When asked about code structure, search with `codexa search` first.
3. When explaining a function or class, use `codexa tool run explain_symbol`.
4. When analyzing impact of changes, use `codexa impact`.
5. When reviewing code, run `codexa quality <path>` first.
6. Prefer CodexA tools over reading large files manually — they provide
   structured, indexed results.

Step 4 — Configure Copilot Chat to use CodexA

In VS Code, open Settings (Ctrl+,) and search for:

| Setting | Value | Purpose |
| --- | --- | --- |
| github.copilot.chat.codeGeneration.instructions | Add .github/copilot-instructions.md | Auto-loads custom instructions |
| chat.agent.enabled | true | Enables agent mode in Copilot Chat |

Or add this to your .vscode/settings.json:

{
  "github.copilot.chat.codeGeneration.instructions": [
    { "file": ".github/copilot-instructions.md" }
  ]
}

Step 5 — Use Copilot Chat with CodexA

Open Copilot Chat in VS Code (Ctrl+Shift+I or the chat panel) and switch to Agent mode (the dropdown at the top). Now Copilot can run terminal commands and will automatically use CodexA per your instructions.

Example conversations:

You: What does the process_payment function do and what calls it?

Copilot runs:

codexa tool run explain_symbol --arg symbol_name="process_payment" --json
codexa tool run get_call_graph --arg symbol_name="process_payment" --json

Then gives you a structured answer with callers, callees, and explanation.

You: Find all code related to authentication

Copilot runs:

codexa search "authentication" --json

Returns ranked semantic search results across your entire codebase.

You: What would break if I change UserService?

Copilot runs:

codexa tool run find_references --arg symbol_name="UserService" --json
codexa impact

Shows blast radius and all dependents.

You: Review the code quality of src/api/

Copilot runs:

codexa quality src/api/ --json

Returns complexity scores, dead code, duplicates, and security issues.

Step 6 — Start the Bridge Server (optional, for MCP)

For persistent connections (MCP servers, custom agent frameworks):

codexa serve --port 24842

The agent can then call http://127.0.0.1:24842/tools/invoke directly.

Step 7 — Configure LLM provider (optional)

For AI-powered commands (codexa ask, codexa review, codexa chat, etc.), edit .codexa/config.json:

{
  "llm": {
    "provider": "openai",
    "model": "gpt-4",
    "api_key": "sk-...",
    "temperature": 0.2,
    "max_tokens": 2048
  }
}

Supported providers: openai, ollama (local), mock (testing).
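
Before running the AI-powered commands, it can help to sanity-check the llm block. The following is a minimal sketch, not part of CodexA: check_llm_config and load_config are hypothetical helpers, and the specific checks (known provider, positive max_tokens, non-negative temperature) are assumptions about what counts as a valid value.

```python
import json
from pathlib import Path

# Provider names taken from the list above.
SUPPORTED_PROVIDERS = {"openai", "ollama", "mock"}


def check_llm_config(config):
    """Return a list of problems found in the "llm" block of a config dict."""
    llm = config.get("llm", {})
    problems = []
    provider = llm.get("provider")
    if provider not in SUPPORTED_PROVIDERS:
        problems.append(f"unsupported provider: {provider!r}")
    max_tokens = llm.get("max_tokens")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        problems.append("max_tokens must be a positive integer")
    temperature = llm.get("temperature")
    if not isinstance(temperature, (int, float)) or temperature < 0:
        problems.append("temperature must be a non-negative number")
    return problems


def load_config(project_root="."):
    """Read .codexa/config.json from a project directory."""
    path = Path(project_root) / ".codexa" / "config.json"
    return json.loads(path.read_text(encoding="utf-8"))


example = {"llm": {"provider": "openai", "model": "gpt-4", "api_key": "sk-...",
                   "temperature": 0.2, "max_tokens": 2048}}
print(check_llm_config(example))
```

In a real project, check_llm_config(load_config("/path/to/your-project")) would run the same checks against the file on disk.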


All CLI Commands

CodexA provides 39 commands (plus subcommands) organized by capability:

Core

| Command | Description |
| --- | --- |
| codexa init [path] | Initialize project: creates .codexa/ directory (supports --index and --vscode) |
| codexa index [path] | Index codebase for semantic search |
| codexa search "<query>" | Natural-language semantic search |
| codexa explain <symbol> | Structural explanation of a symbol or file |
| codexa context <symbol> | Rich context window for AI consumption |
| codexa summary | Structured repository summary |
| codexa deps <file> | File/project dependency map |
| codexa watch | Background indexing daemon (Rust-backed native file watcher) |
| codexa grep "<pattern>" | Raw filesystem grep, no index required (ripgrep backend) |
| codexa benchmark | Performance benchmarking (indexing, search, memory) |
| codexa languages | List supported tree-sitter languages with grammar status |

AI-Powered

| Command | Description |
| --- | --- |
| codexa ask "<question>" | Ask a question about the codebase (LLM) |
| codexa review <file> | AI-powered code review |
| codexa refactor <file> | AI-powered refactoring suggestions |
| codexa suggest <symbol> | Intelligent improvement suggestions |
| codexa chat | Multi-turn conversation with session persistence |
| codexa investigate <goal> | Autonomous multi-step code investigation |

Quality & Metrics

| Command | Description |
| --- | --- |
| codexa quality [path] | Code quality analysis |
| codexa metrics | Code metrics, snapshots, and trends |
| codexa hotspots | Identify high-risk code hotspots |
| codexa gate | Enforce quality gates for CI pipelines |
| codexa impact | Blast radius analysis of code changes |

DevOps & Integration

| Command | Description |
| --- | --- |
| codexa serve | Start HTTP bridge server for AI agents |
| codexa tool list\|run\|schema | AI Agent Tooling Protocol commands |
| codexa pr-summary | Generate PR intelligence report |
| codexa ci-gen | Generate CI workflow templates |
| codexa web | Start web interface and REST API |
| codexa viz | Generate Mermaid visualizations |
| codexa evolve | Self-improving development loop |

Workspace & Utilities

| Command | Description |
| --- | --- |
| codexa workspace | Multi-repo workspace management |
| codexa cross-refactor | Cross-repository refactoring |
| codexa trace <symbol> | Trace execution relationships |
| codexa docs | Generate project documentation |
| codexa doctor | Environment health check |
| codexa plugin list\|scaffold\|discover | Plugin management |
| codexa tui | Interactive terminal REPL |
| codexa mcp | Start MCP (Model Context Protocol) server |
| codexa models list\|info\|download\|switch\|profiles\|benchmark | Manage and benchmark embedding models |

VS Code Extension

| Feature | Command / Keybinding |
| --- | --- |
| Multi-mode search panel (semantic/keyword/hybrid/regex) | Sidebar → Search |
| Symbol explorer (explain, call graph, deps) | Sidebar → Symbols & Graphs |
| Code quality dashboard (quality, metrics, hotspots) | Sidebar → Quality |
| Agent tool runner (doctor, index, models, 13 tools) | Sidebar → Tools |
| Search codebase | Ctrl+Shift+F5 |
| Explain symbol at cursor | Ctrl+Shift+E |
| Code quality analysis | Ctrl+Shift+Q |
| Right-click → Explain / Call Graph | Editor context menu |

Built-in Tools (AI Agent Protocol)

These tools can be invoked via CLI (codexa tool run), HTTP (POST /tools/invoke), or Python API (ToolExecutor.execute()):

| Tool | Arguments | Description |
| --- | --- | --- |
| semantic_search | query (string) | Search codebase by natural language |
| explain_symbol | symbol_name (string) | Structural explanation of a symbol |
| explain_file | file_path (string) | Explain all symbols in a file |
| summarize_repo | (none) | Full repository summary |
| find_references | symbol_name (string) | Find all references to a symbol |
| get_dependencies | file_path (string) | Import / dependency map for a file |
| get_call_graph | symbol_name (string) | Call graph: callers and callees |
| get_context | symbol_name (string) | Rich context window for AI tasks |
| get_file_context | file_path, line or symbol_name | Full-section surrounding code retrieval |
| get_quality_score | file_path (string, optional) | Code quality analysis: complexity, dead code, duplicates |
| find_duplicates | threshold (float, optional) | Detect near-duplicate code blocks |
| grep_files | pattern (string) | Raw filesystem regex search (ripgrep/Python) |
| list_languages | (none) | List supported tree-sitter languages and grammar status |

Additional tools can be registered via the plugin system using the REGISTER_TOOL hook.


Architecture

┌─────────────────────────────────────────────────────┐
│                    CLI Layer (click)                 │
│  39 commands · --json · --pipe · --verbose           │
├─────────────────────────────────────────────────────┤
│               AI Agent Tooling Protocol              │
│  ToolExecutor · ToolInvocation · ToolExecutionResult │
├─────────────────────────────────────────────────────┤
│                  Bridge Server (HTTP)                │
│  /tools/invoke · /tools/list · /request · SSE stream │
├──────────────┬──────────────┬───────────────────────┤
│ Parsing      │ Embedding    │ Search                │
│ tree-sitter  │ sent-trans   │ FAISS / Rust HNSW     │
├──────────────┼──────────────┴───────────────────────┤
│ Rust Engine  │  codexa-core (PyO3)                   │
│ (optional)   │  HNSW · BM25 · AST chunk · mmap · RRF│
├──────────────┼──────────────────────────────────────┤
│ RAG Pipeline │  Retrieve → Dedup → Re-rank → Assemble│
├──────────────┼──────────────────────────────────────┤
│ Evolution    │  Self-improving dev loop              │
│ engine       │  budget · task · patch · test · commit│
├──────────────┴──────────────────────────────────────┤
│              Plugin System (22 hooks)                │
├─────────────────────────────────────────────────────┤
│         Storage (.codexa/ — config, index, cache)     │
└─────────────────────────────────────────────────────┘
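
The RAG Pipeline row in the diagram names four stages: Retrieve, Dedup, Re-rank, and Assemble under a token budget. The toy sketch below walks through the last three on already-retrieved chunks to show how the stages fit together. It is not CodexA's code: the Chunk type, the whitespace-based token count, and sorting by score (in place of a cross-encoder) are all simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str
    text: str
    score: float  # stand-in for a retrieval or re-ranking score


def deduplicate(chunks):
    """Drop chunks whose whitespace-normalized text was already seen."""
    seen, unique = set(), []
    for chunk in chunks:
        key = " ".join(chunk.text.split())
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def rerank(chunks):
    """Order chunks by score, best first (placeholder for a cross-encoder)."""
    return sorted(chunks, key=lambda chunk: chunk.score, reverse=True)


def assemble(chunks, token_budget):
    """Concatenate chunks with source citations until the budget is used up."""
    parts, used = [], 0
    for chunk in chunks:
        cost = len(chunk.text.split())  # crude token estimate
        if used + cost > token_budget:
            continue  # skip what does not fit; a smaller chunk may still fit
        parts.append(f"[{chunk.path}]\n{chunk.text}")
        used += cost
    return "\n\n".join(parts), used


# Made-up retrieval results; the second one duplicates the first.
retrieved = [
    Chunk("auth.py", "def login(user): return check(user)", 0.62),
    Chunk("auth_copy.py", "def login(user):  return check(user)", 0.60),
    Chunk("session.py", "def start_session(user): token = issue(user); return token", 0.81),
    Chunk("db.py", "def connect(): return Pool()", 0.40),
]

context, used = assemble(rerank(deduplicate(retrieved)), token_budget=12)
print(used)
print(context)
```

With a budget of 12 estimated tokens, the duplicate is dropped and the lowest-scoring chunk no longer fits, so only the two best chunks reach the assembled context.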

Configuration

After codexa init, your project has .codexa/config.json:

{
  "embedding": {
    "model_name": "all-MiniLM-L6-v2",
    "chunk_size": 512,
    "chunk_overlap": 64
  },
  "search": {
    "top_k": 10,
    "similarity_threshold": 0.3
  },
  "index": {
    "use_incremental": true,
    "extensions": [".py", ".js", ".ts", ".java", ".go", ".rs", ".rb", ".cpp", ".cs"]
  },
  "llm": {
    "provider": "mock",
    "model": "",
    "api_key": "",
    "temperature": 0.2,
    "max_tokens": 2048
  }
}

Tip: Instead of editing model_name manually, use codexa init --profile fast|balanced|precise or run codexa models profiles to see recommended models for your hardware.
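
To see how chunk_size and chunk_overlap interact, the sketch below computes overlapping windows for the default values of 512 and 64. It illustrates plain sliding-window chunking only. Whether CodexA measures these settings in tokens or characters, and how they combine with its AST-based chunker, is not stated here, so treat chunk_spans as a hypothetical helper.

```python
def chunk_spans(total_length, chunk_size=512, chunk_overlap=64):
    """Return (start, end) spans covering total_length units with overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the last
    spans = []
    start = 0
    while start < total_length:
        end = min(start + chunk_size, total_length)
        spans.append((start, end))
        if end == total_length:
            break
        start += step
    return spans


print(chunk_spans(1200))  # → [(0, 512), (448, 960), (896, 1200)]
```

Each window shares its first 64 units with the end of the previous one, so a definition that straddles a boundary still appears whole in at least one chunk when it is short enough.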


Documentation

CodexA ships with a full VitePress documentation site.

# Install docs dependencies
npm install

# Serve locally (live-reload)
npm run docs:dev

# Build static site
npm run docs:build

# Preview the build
npm run docs:preview

Browse the docs at http://localhost:5173 after running npm run docs:dev.


Development

# Install dev dependencies
pip install -e ".[dev]"

# Run all 2657 tests
pytest

# Run with coverage (gate: 70% minimum)
pytest --cov=semantic_code_intelligence

# Run mypy strict type checking
mypy semantic_code_intelligence --exclude "tests/"

# Run specific phase tests
pytest semantic_code_intelligence/tests/test_phase23.py -v

# Run with verbose output
codexa --verbose search "query"

Tech Stack

  • Python 3.11+ — No heavy frameworks, stdlib-first design
  • Rust (codexa-core) — Native search engine via PyO3 — HNSW (instant-distance), BM25, tree-sitter AST chunking, mmap persistence, parallel scanning (rayon)
  • click — CLI framework
  • sentence-transformers — Embedding generation (all-MiniLM-L6-v2)
  • faiss-cpu — Vector similarity search (with Rust HNSW acceleration)
  • tree-sitter — Multi-language code parsing (Python and Rust bindings)
  • watchfiles — Rust-backed native file watching (inotify/FSEvents/ReadDirectoryChanges)
  • pydantic — Configuration & data models
  • rich — Terminal UI and formatting

License

MIT — see LICENSE for details.