Merged
40 commits
9b06573
Add file modification time metadata to vector storage
EmiM Sep 2, 2025
6243bc6
Add metadata retrieval method to VectorStorageService
EmiM Sep 2, 2025
c8f36ef
Add initial project embedding method with mtime-based change detection
EmiM Sep 2, 2025
e01e3ed
Integrate initial project processing into project scan workflow
EmiM Sep 2, 2025
6283284
Use config ignore patterns and follow DRY principle in initial proces…
EmiM Sep 2, 2025
f6da3c6
Add deleted file detection and cleanup in initial processing
EmiM Sep 2, 2025
cbe25d3
Refactor ignore path logic into centralized utility function
EmiM Sep 2, 2025
16a3dd9
Add missing utils file
EmiM Sep 4, 2025
2cc7be8
Replace hardcoded dimensions with dynamic detection in VectorStorageS…
EmiM Sep 4, 2025
d36bf7f
Use instance embedding dimension in _ensure_namespace_exists
EmiM Sep 4, 2025
0bfa335
Use _ensure_namespace_exists in get_file_metadata for consistent name…
EmiM Sep 4, 2025
7627f77
Remove create_namespace method and fix _ensure_namespace_exists to re…
EmiM Sep 4, 2025
d084c76
Fix PosixPath JSON serialization error and store file_mtime as string
EmiM Sep 4, 2025
20d5642
Add comprehensive unit tests for VectorStorageService.get_file_metada…
EmiM Sep 5, 2025
4d8b0ee
Add get_embedding_dimensions method to VectorConfig
EmiM Sep 5, 2025
d84e026
Replace hardcoded dimensions in test_embedding_service.py
EmiM Sep 5, 2025
f7a99e9
Replace hardcoded dimensions in test_vector_storage_service.py
EmiM Sep 5, 2025
af05aa8
Remove redundant embedding_model='voyage-code-2' from test VectorConf…
EmiM Sep 5, 2025
1d208a2
Use shared constants in VoyageClient instead of local duplicates
EmiM Sep 5, 2025
6e30f89
Refactor file gathering logic and unify VectorDaemon test fixtures
EmiM Sep 8, 2025
4994f99
Add parallel file processing for initial embedding with semaphore-bas…
EmiM Sep 8, 2025
6625eb0
Add find_similar_code tool for vector-based code similarity search
EmiM Sep 8, 2025
ee16c43
Fix similarity_threshold and max_results usage with simplified Row ha…
EmiM Sep 8, 2025
f1302e9
Update return format with file_name, start_line, end_line at top leve…
EmiM Sep 8, 2025
a45d573
Add find_similar_code tool to MCP server and update documentation
EmiM Sep 9, 2025
a900dd9
Fix find_similar_code parameter names and add comprehensive tests wit…
EmiM Sep 9, 2025
a2ce444
Add comprehensive unit tests for VectorModeToolsService.find_similar_…
EmiM Sep 9, 2025
7ce30f8
Fix Row distance access: dist -> for TurboPuffer compatibility
EmiM Sep 9, 2025
cb008be
Improve query filters and enable batch metadata retrieval
EmiM Sep 10, 2025
628272c
Add batch embedding method to VoyageClient
EmiM Sep 11, 2025
3477aad
Add batch embedding method to EmbeddingService
EmiM Sep 11, 2025
b8dca15
Add batch upsert method to TurbopufferClient
EmiM Sep 11, 2025
112bd7e
Add batch storage method to VectorStorageService with efficient batch…
EmiM Sep 11, 2025
80277eb
Modify VectorDaemon to use true batch processing flow
EmiM Sep 11, 2025
ad1c914
Implement concurrent batch processing with semaphore control, add Voy…
EmiM Sep 11, 2025
c1a10f6
Add IndexMeta CRUD operations to DatabaseManager with comprehensive u…
EmiM Sep 12, 2025
226234f
Add IndexMeta-based initial vector embedding control
EmiM Sep 12, 2025
324957c
Fix Voyage API token limit exceeded error with automatic token-based …
EmiM Sep 12, 2025
ea6c458
Implement file-level token-aware batching with dual limit checks
EmiM Sep 12, 2025
bcf905d
Remove all _write_debug_log occurrences and fix related tests
EmiM Sep 12, 2025
10 changes: 6 additions & 4 deletions README.md
@@ -208,6 +208,7 @@ mcp-code-indexer --vector --http --port 8080

Vector Mode adds powerful new MCP tools:
- `vector_search` - Semantic code search across projects
- `find_similar_code` - Find code similar to a given snippet or file section
- `similarity_search` - Find similar code patterns
- `dependency_search` - Discover code relationships
- `vector_status` - Monitor indexing progress
@@ -268,7 +269,7 @@ mypy src/

## 🛠️ MCP Tools Available

-The server provides **11 powerful MCP tools** for intelligent codebase management. Whether you're an AI agent or human developer, these tools make navigating code effortless.
+The server provides **13 powerful MCP tools** for intelligent codebase management. Whether you're an AI agent or human developer, these tools make navigating code effortless.

### 🎯 Essential Tools (Start Here)
| Tool | Purpose | When to Use |
@@ -291,6 +292,7 @@ The server provides **11 powerful MCP tools** for intelligent codebase managemen
| **`get_word_frequency`** | Technical vocabulary analysis | Domain understanding |
| **`update_codebase_overview`** | Create project documentation | Architecture documentation |
| **`search_codebase_overview`** | Search in project overviews | Finding specific topics |
| **`find_similar_code`** | Find code similar to snippet/section | Code pattern discovery (Vector Mode) |

### 🏥 System Health
| Tool | Purpose | For |
@@ -299,7 +301,7 @@ The server provides **11 powerful MCP tools** for intelligent codebase managemen

💡 **Pro Tip**: Always start with `check_codebase_size` to get personalized recommendations for navigating your specific codebase.

-**📖 Complete API Documentation**: [View all 11 tools with examples →](docs/api-reference.md)
+**📖 Complete API Documentation**: [View all 13 tools with examples →](docs/api-reference.md)

## 🔗 Git Hook Integration

@@ -363,7 +365,7 @@ Comprehensive documentation organized by user journey and expertise level.
| Guide | Purpose | Time Investment |
|-------|---------|-----------------|
| **[Quick Start](#-quick-start)** | Install and run your first server | 2 minutes |
-| **[API Reference](docs/api-reference.md)** | Master all 12 MCP tools | 15 minutes |
+| **[API Reference](docs/api-reference.md)** | Master all 13 MCP tools | 15 minutes |
| **[HTTP API Reference](docs/http-api.md)** | REST API for web applications | 10 minutes |
| **[Q&A Interface](docs/qa-interface.md)** | AI-powered codebase analysis | 8 minutes |
| **[Git Hook Setup](docs/git-hook-setup.md)** | Automate your workflow | 5 minutes |
@@ -387,7 +389,7 @@ Comprehensive documentation organized by user journey and expertise level.
### 📋 Quick References
- **[Examples & Integrations](examples/)** - Ready-to-use configurations
- **[Troubleshooting](#🚨-troubleshooting)** - Common issues & solutions
-- **[API Tools Summary](#🛠️-mcp-tools-available)** - All 11 tools at a glance
+- **[API Tools Summary](#🛠️-mcp-tools-available)** - All 13 tools at a glance

**📚 Reading Paths:**
- **New to MCP Code Indexer?** Quick Start → API Reference → HTTP API → Q&A Interface
159 changes: 157 additions & 2 deletions docs/api-reference.md
@@ -4,10 +4,10 @@
**Last Updated:** 2025-01-15
**Verified Against:** src/mcp_code_indexer/server/mcp_server.py
**Test Sources:** tests/integration/test_mcp_tools.py, tests/unit/test_query_preprocessor.py
-**Implementation:** All 12 tools verified against actual server code
+**Implementation:** All 13 tools verified against actual server code
---

-Complete reference for all 12 MCP tools provided by the Code Indexer server. Whether you're building AI agents or integrating MCP tools directly, this guide shows you exactly how to use each tool effectively.
+Complete reference for all 13 MCP tools provided by the Code Indexer server. Whether you're building AI agents or integrating MCP tools directly, this guide shows you exactly how to use each tool effectively.

**🎯 New to MCP Code Indexer?** Start with the [Quick Start Guide](../README.md#-quick-start) to set up your server first.

@@ -27,6 +27,7 @@ Complete reference for all 12 MCP tools provided by the Code Indexer server. Whe
| [`search_codebase_overview`](#search_codebase_overview) | Search overviews | `projectName`, `folderPath`, `searchWord` |
| [`check_database_health`](#check_database_health) | System monitoring | None |
| [`enabled_vector_mode`](#enabled_vector_mode) | Configure vector search | `projectName`, `folderPath`, `enabled` |
| [`find_similar_code`](#find_similar_code) | Find similar code patterns | `projectName`, `folderPath`, code/file input |

⭐ **Start here** for new projects
📖 **[See Examples →](../examples/)**
@@ -52,6 +53,7 @@ Complete reference for all 12 MCP tools provided by the Code Indexer server. Whe
- [check_database_health](#check_database_health)
- [Configuration Management](#configuration-management)
- [enabled_vector_mode](#enabled_vector_mode)
- [find_similar_code](#find_similar_code)
- [Common Parameters](#common-parameters)
- [Error Handling](#error-handling)

@@ -941,6 +943,159 @@ try {
}
```

---

### find_similar_code

Find code similar to a given code snippet or file section using vector-based semantic search. The tool compares AI embeddings of the input against the project's indexed code, so it can match code with similar meaning even when names and formatting differ, which text-based matching would miss.

**⚠️ Vector Mode Required**: This tool only works when vector mode is enabled for the project.

#### Parameters

```typescript
interface FindSimilarCodeParams {
  projectName: string;            // The name of the project
  folderPath: string;             // Absolute path to the project folder on disk

  // Input source: code_snippet and file_path are mutually exclusive;
  // line_start and line_end select a section of file_path
  code_snippet?: string;          // Direct code snippet to search for similarities
  file_path?: string;             // Path to file containing code to analyze
  line_start?: number;            // Starting line number for file section (1-indexed)
  line_end?: number;              // Ending line number for file section (1-indexed)

  // Search configuration (optional)
  similarity_threshold?: number;  // Minimum similarity score (0.0-1.0)
  max_results?: number;           // Maximum number of results to return
}
```
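
The input rules above can be checked on the client before a call is made. The following is a minimal sketch, not the server's validation (which is not shown here): `FindSimilarCodeInput` and `checkFindSimilarInput` are hypothetical names, and the rule that `line_start`/`line_end` only accompany `file_path` is an assumption drawn from the parameter comments.

```typescript
// Hypothetical client-side check; the server's own validation may differ.
interface FindSimilarCodeInput {
  code_snippet?: string;
  file_path?: string;
  line_start?: number;
  line_end?: number;
}

// Returns a list of problems; an empty list means the input looks well-formed.
function checkFindSimilarInput(input: FindSimilarCodeInput): string[] {
  const problems: string[] = [];
  const hasSnippet =
    input.code_snippet !== undefined && input.code_snippet.trim() !== "";
  const hasFile = input.file_path !== undefined && input.file_path !== "";

  // Exactly one input source must be present.
  if (hasSnippet === hasFile) {
    problems.push("provide exactly one of code_snippet or file_path");
  }
  // Assumption: a line range is only meaningful for file input.
  if (
    hasSnippet &&
    (input.line_start !== undefined || input.line_end !== undefined)
  ) {
    problems.push("line_start/line_end only apply to file_path input");
  }
  if (input.line_start !== undefined && input.line_start < 1) {
    problems.push("line_start is 1-indexed and must be >= 1");
  }
  if (
    input.line_start !== undefined &&
    input.line_end !== undefined &&
    input.line_end < input.line_start
  ) {
    problems.push("line_end must not precede line_start");
  }
  return problems;
}
```

Returning a list instead of throwing lets a caller report every problem at once.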

#### Response

```typescript
interface FindSimilarCodeResponse {
  results: Array<{
    file_path: string;         // Path to file containing similar code
    code_section: string;      // The similar code section found
    similarity_score: number;  // Similarity score (0.0-1.0, higher is more similar)
    start_line: number;        // Starting line number of similar section
    end_line: number;          // Ending line number of similar section
    context: string;           // Additional context around the match
  }>;
  search_input: {
    type: "snippet" | "file_section";  // Type of input used
    content: string;                   // The code that was searched for
    source?: string;                   // Source file path (if using file_path input)
  };
  total_results: number;         // Total number of similar code sections found
  similarity_threshold: number;  // Similarity threshold used
}
```
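
When a search returns several matches, it can help to see which files they cluster in. The sketch below groups `results` by `file_path` and orders files by their best score. It is one possible client-side helper, not part of the tool: `SimilarCodeResult`, `FileSummary`, and `groupByFile` are hypothetical names, and only the response fields documented above are used.

```typescript
// Subset of the documented result fields that this helper needs.
interface SimilarCodeResult {
  file_path: string;
  similarity_score: number;
  start_line: number;
  end_line: number;
}

interface FileSummary {
  file: string;      // file_path shared by the grouped matches
  best: number;      // highest similarity_score among them
  ranges: string[];  // "start-end" line ranges, in the order received
}

// Group matches by file, then sort files by best score, highest first.
function groupByFile(results: SimilarCodeResult[]): FileSummary[] {
  const byFile: Record<string, SimilarCodeResult[]> = {};
  for (const r of results) {
    const existing = byFile[r.file_path];
    if (existing) {
      existing.push(r);
    } else {
      byFile[r.file_path] = [r];
    }
  }
  return Object.keys(byFile)
    .map((file) => {
      const matches = byFile[file] || [];
      return {
        file,
        best: Math.max(...matches.map((m) => m.similarity_score)),
        ranges: matches.map((m) => `${m.start_line}-${m.end_line}`),
      };
    })
    .sort((a, b) => b.best - a.best);
}
```

A file with many high-scoring ranges is usually a stronger refactoring candidate than one with a single match.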

#### Example Usage

##### Search by Code Snippet

```javascript
const result = await mcp.callTool("find_similar_code", {
  projectName: "my-web-app",
  folderPath: "/home/user/projects/my-web-app",
  // String.raw keeps the regex backslashes intact; in a plain template
  // literal, "\s" would be sent as "s"
  code_snippet: String.raw`
function validateEmail(email: string): boolean {
  const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return emailRegex.test(email);
}
`,
  similarity_threshold: 0.7,
  max_results: 5
});

// Response:
{
  "results": [
    {
      "file_path": "src/utils/validators.ts",
      "code_section": "function isValidEmail(emailAddress: string): boolean {\n const pattern = /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/;\n return pattern.test(emailAddress);\n}",
      "similarity_score": 0.92,
      "start_line": 15,
      "end_line": 18,
      "context": "// Email validation utilities"
    },
    {
      "file_path": "src/auth/validation.ts",
      "code_section": "const validateUserEmail = (email: string) => {\n return /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/.test(email);\n};",
      "similarity_score": 0.85,
      "start_line": 42,
      "end_line": 44,
      "context": "User input validation functions"
    }
  ],
  "search_input": {
    "type": "snippet",
    "content": "function validateEmail(email: string): boolean {\n const emailRegex = /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/;\n return emailRegex.test(email);\n}"
  },
  "total_results": 2,
  "similarity_threshold": 0.7
}
```

##### Search by File Section

```javascript
const result = await mcp.callTool("find_similar_code", {
  projectName: "large-api",
  folderPath: "/home/user/projects/large-api",
  file_path: "src/controllers/userController.ts",
  line_start: 25,
  line_end: 35,
  similarity_threshold: 0.6,
  max_results: 10
});

// Response:
{
  "results": [
    {
      "file_path": "src/controllers/productController.ts",
      "code_section": "async function createProduct(req: Request, res: Response) {\n try {\n const product = await productService.create(req.body);\n res.status(201).json(product);\n } catch (error) {\n res.status(400).json({ error: error.message });\n }\n}",
      "similarity_score": 0.78,
      "start_line": 18,
      "end_line": 26,
      "context": "Product CRUD operations"
    }
  ],
  "search_input": {
    "type": "file_section",
    "content": "async function createUser(req: Request, res: Response) {\n try {\n const user = await userService.create(req.body);\n res.status(201).json(user);\n } catch (error) {\n res.status(400).json({ error: error.message });\n }\n}",
    "source": "src/controllers/userController.ts"
  },
  "total_results": 1,
  "similarity_threshold": 0.6
}
```

#### 🎯 Use Cases

- **Code Duplication Detection**: Find similar functions or code patterns that could be refactored
- **Code Reuse Discovery**: Locate existing implementations similar to what you're building
- **Pattern Analysis**: Understand common patterns and approaches across your codebase
- **Refactoring Opportunities**: Identify code sections that follow similar patterns
- **Code Review**: Find similar implementations to ensure consistency
- **Learning**: Discover how similar problems were solved elsewhere in the codebase

#### ⚠️ Prerequisites

- **Vector Mode Enabled**: Project must have vector mode activated
- **API Keys Required**: The `VOYAGE_API_KEY` and `TURBOPUFFER_API_KEY` environment variables must be set
- **Project Indexed**: The project must be indexed with vector embeddings
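
A client or launcher script can check the API-key prerequisite before enabling vector mode. This is a small sketch under stated assumptions: `missingVectorModeKeys` is a hypothetical helper, and it treats an empty or whitespace-only value as missing, which may be stricter than the server's own check.

```typescript
// The two environment variables named in the prerequisites above.
const REQUIRED_VECTOR_KEYS = ["VOYAGE_API_KEY", "TURBOPUFFER_API_KEY"];

// Returns the names of required keys that are absent or blank in `env`.
function missingVectorModeKeys(
  env: Record<string, string | undefined>
): string[] {
  return REQUIRED_VECTOR_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In Node.js, `missingVectorModeKeys(process.env)` returns an empty array when both keys are set.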

#### 💡 Tips for Best Results

- **Meaningful Code Sections**: Use code sections with clear functionality (10-50 lines work well)
- **Adjust Similarity Threshold**: Start with 0.7, lower to 0.5 for broader matches
- **Use Representative Code**: Choose code that represents the pattern you're looking for
- **Consider Context**: Similar functionality may be implemented differently but serve the same purpose
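
The threshold advice above (start at 0.7, lower toward 0.5) can be expressed as a list of values to try in order. The helper below is an illustrative sketch: `thresholdSchedule` is a hypothetical name, and the 0.1 step size is an assumption, not a documented default.

```typescript
// Builds a descending list of similarity thresholds from `start` down to
// `floor`. Values are computed in hundredths to avoid floating-point drift
// (for example 0.7 - 0.1 not being exactly 0.6).
function thresholdSchedule(start = 0.7, floor = 0.5, step = 0.1): number[] {
  const toHundredths = (x: number): number => Math.round(x * 100);
  const stepH = toHundredths(step);
  if (stepH <= 0) {
    throw new Error("step must be at least 0.01");
  }
  const schedule: number[] = [];
  for (let t = toHundredths(start); t >= toHundredths(floor); t -= stepH) {
    schedule.push(t / 100);
  }
  return schedule;
}
```

A caller would loop over the returned values, passing each as `similarity_threshold` to `find_similar_code` and stopping at the first call that returns enough results.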

## Common Parameters

All tools require these standard parameters for project identification:
50 changes: 49 additions & 1 deletion docs/http-api.md
@@ -268,7 +268,7 @@ interface MCPResponse {

### Available Tools

-All 11 MCP tools are available via HTTP. See the [API Reference](api-reference.md) for complete tool documentation.
+All 13 MCP tools are available via HTTP. See the [API Reference](api-reference.md) for complete tool documentation.

| Tool Name | Purpose |
|-----------|---------|
@@ -283,6 +283,8 @@ All 11 MCP tools are available via HTTP. See the [API Reference](api-reference.m
| `update_codebase_overview` | Create project docs |
| `search_codebase_overview` | Search overviews |
| `check_database_health` | System monitoring |
| `enabled_vector_mode` | Configure vector search |
| `find_similar_code` | Find similar code patterns |

### Example Tool Calls

@@ -401,6 +403,52 @@ curl -X POST -H "Content-Type: application/json" \
}
```

#### Find Similar Code

```bash
curl -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "find_similar_code",
      "arguments": {
        "projectName": "my-app",
        "folderPath": "/home/user/my-app",
        "code_snippet": "function calculateTotal(items) {\n return items.reduce((sum, item) => sum + item.price, 0);\n}",
        "similarity_threshold": 0.7,
        "max_results": 5
      }
    }
  }' \
  http://localhost:7557/mcp
```

```json
{
  "jsonrpc": "2.0",
  "result": {
    "results": [
      {
        "file_path": "src/utils/calculations.ts",
        "code_section": "const sumPrices = (products) => {\n return products.reduce((total, product) => total + product.price, 0);\n};",
        "similarity_score": 0.92,
        "start_line": 15,
        "end_line": 17,
        "context": "Price calculation utilities"
      }
    ],
    "search_input": {
      "type": "snippet",
      "content": "function calculateTotal(items) {\n return items.reduce((sum, item) => sum + item.price, 0);\n}"
    },
    "total_results": 1,
    "similarity_threshold": 0.7
  }
}
```
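
When calling the endpoint from code, building the body as an object and serializing it avoids hand-escaping newlines and quotes in `code_snippet`. The sketch below is illustrative only: `FindSimilarArgs` and `buildFindSimilarRequest` are hypothetical names. The `id` field is also an addition: JSON-RPC 2.0 uses it to match a response to its request, the curl example above omits it, and whether this server requires it is an assumption.

```typescript
// Argument shape mirroring the documented find_similar_code parameters.
interface FindSimilarArgs {
  projectName: string;
  folderPath: string;
  code_snippet?: string;
  file_path?: string;
  line_start?: number;
  line_end?: number;
  similarity_threshold?: number;
  max_results?: number;
}

// Wraps the tool arguments in a JSON-RPC 2.0 "tools/call" request object.
function buildFindSimilarRequest(args: FindSimilarArgs, id: number) {
  return {
    jsonrpc: "2.0",
    id, // assumption: not present in the curl example above
    method: "tools/call",
    params: { name: "find_similar_code", arguments: args },
  };
}

// JSON.stringify escapes the newlines inside the snippet automatically.
const requestBody = JSON.stringify(
  buildFindSimilarRequest(
    {
      projectName: "my-app",
      folderPath: "/home/user/my-app",
      code_snippet:
        "function calculateTotal(items) {\n return items.reduce((sum, item) => sum + item.price, 0);\n}",
      similarity_threshold: 0.7,
      max_results: 5,
    },
    1
  )
);
```

The resulting `requestBody` string can be sent as the POST body with the same `Content-Type` and `Authorization` headers as the curl call.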

## Server-Sent Events

The HTTP API supports Server-Sent Events (SSE) for real-time streaming of tool responses and system events.