Commit df9b650

Merge pull request #5 from EmiM/feat/initial-embedding
Initial embedding + find_similar_code tool
2 parents 2db3d05 + bcf905d commit df9b650

29 files changed, +5083 -832 lines changed

README.md

Lines changed: 6 additions & 4 deletions
@@ -208,6 +208,7 @@ mcp-code-indexer --vector --http --port 8080

 Vector Mode adds powerful new MCP tools:
 - `vector_search` - Semantic code search across projects
+- `find_similar_code` - Find code similar to a given snippet or file section
 - `similarity_search` - Find similar code patterns
 - `dependency_search` - Discover code relationships
 - `vector_status` - Monitor indexing progress
@@ -268,7 +269,7 @@ mypy src/

 ## 🛠️ MCP Tools Available

-The server provides **11 powerful MCP tools** for intelligent codebase management. Whether you're an AI agent or human developer, these tools make navigating code effortless.
+The server provides **13 powerful MCP tools** for intelligent codebase management. Whether you're an AI agent or human developer, these tools make navigating code effortless.

 ### 🎯 Essential Tools (Start Here)
 | Tool | Purpose | When to Use |
@@ -291,6 +292,7 @@ The server provides **11 powerful MCP tools** for intelligent codebase managemen
 | **`get_word_frequency`** | Technical vocabulary analysis | Domain understanding |
 | **`update_codebase_overview`** | Create project documentation | Architecture documentation |
 | **`search_codebase_overview`** | Search in project overviews | Finding specific topics |
+| **`find_similar_code`** | Find code similar to snippet/section | Code pattern discovery (Vector Mode) |

 ### 🏥 System Health
 | Tool | Purpose | For |
@@ -299,7 +301,7 @@ The server provides **11 powerful MCP tools** for intelligent codebase managemen

 💡 **Pro Tip**: Always start with `check_codebase_size` to get personalized recommendations for navigating your specific codebase.

-**📖 Complete API Documentation**: [View all 11 tools with examples →](docs/api-reference.md)
+**📖 Complete API Documentation**: [View all 13 tools with examples →](docs/api-reference.md)

 ## 🔗 Git Hook Integration

@@ -363,7 +365,7 @@ Comprehensive documentation organized by user journey and expertise level.
 | Guide | Purpose | Time Investment |
 |-------|---------|-----------------|
 | **[Quick Start](#-quick-start)** | Install and run your first server | 2 minutes |
-| **[API Reference](docs/api-reference.md)** | Master all 12 MCP tools | 15 minutes |
+| **[API Reference](docs/api-reference.md)** | Master all 13 MCP tools | 15 minutes |
 | **[HTTP API Reference](docs/http-api.md)** | REST API for web applications | 10 minutes |
 | **[Q&A Interface](docs/qa-interface.md)** | AI-powered codebase analysis | 8 minutes |
 | **[Git Hook Setup](docs/git-hook-setup.md)** | Automate your workflow | 5 minutes |
@@ -387,7 +389,7 @@ Comprehensive documentation organized by user journey and expertise level.
 ### 📋 Quick References
 - **[Examples & Integrations](examples/)** - Ready-to-use configurations
 - **[Troubleshooting](#🚨-troubleshooting)** - Common issues & solutions
-- **[API Tools Summary](#🛠️-mcp-tools-available)** - All 11 tools at a glance
+- **[API Tools Summary](#🛠️-mcp-tools-available)** - All 13 tools at a glance

 **📚 Reading Paths:**
 - **New to MCP Code Indexer?** Quick Start → API Reference → HTTP API → Q&A Interface

docs/api-reference.md

Lines changed: 157 additions & 2 deletions
@@ -4,10 +4,10 @@
 **Last Updated:** 2025-01-15
 **Verified Against:** src/mcp_code_indexer/server/mcp_server.py
 **Test Sources:** tests/integration/test_mcp_tools.py, tests/unit/test_query_preprocessor.py
-**Implementation:** All 12 tools verified against actual server code
+**Implementation:** All 13 tools verified against actual server code
 ---

-Complete reference for all 12 MCP tools provided by the Code Indexer server. Whether you're building AI agents or integrating MCP tools directly, this guide shows you exactly how to use each tool effectively.
+Complete reference for all 13 MCP tools provided by the Code Indexer server. Whether you're building AI agents or integrating MCP tools directly, this guide shows you exactly how to use each tool effectively.

 **🎯 New to MCP Code Indexer?** Start with the [Quick Start Guide](../README.md#-quick-start) to set up your server first.

@@ -27,6 +27,7 @@ Complete reference for all 12 MCP tools provided by the Code Indexer server. Whe
 | [`search_codebase_overview`](#search_codebase_overview) | Search overviews | `projectName`, `folderPath`, `searchWord` |
 | [`check_database_health`](#check_database_health) | System monitoring | None |
 | [`enabled_vector_mode`](#enabled_vector_mode) | Configure vector search | `projectName`, `folderPath`, `enabled` |
+| [`find_similar_code`](#find_similar_code) | Find similar code patterns | `projectName`, `folderPath`, code/file input |

 **Start here** for new projects
 📖 **[See Examples →](../examples/)**
@@ -52,6 +53,7 @@ Complete reference for all 12 MCP tools provided by the Code Indexer server. Whe
 - [check_database_health](#check_database_health)
 - [Configuration Management](#configuration-management)
 - [enabled_vector_mode](#enabled_vector_mode)
+- [find_similar_code](#find_similar_code)
 - [Common Parameters](#common-parameters)
 - [Error Handling](#error-handling)

@@ -941,6 +943,159 @@ try {
 }
 ```

+---
+
+### find_similar_code
+
+Find code similar to a given code snippet or file section using vector-based semantic search. This tool uses AI embeddings to understand code context and meaning, providing more intelligent similarity detection than text-based matching.
+
+**⚠️ Vector Mode Required**: This tool only works when vector mode is enabled for the project.
+
+#### Parameters
+
+```typescript
+interface FindSimilarCodeParams {
+  projectName: string; // The name of the project
+  folderPath: string; // Absolute path to the project folder on disk
+
+  // Input source (mutually exclusive)
+  code_snippet?: string; // Direct code snippet to search for similarities
+  file_path?: string; // Path to file containing code to analyze
+  line_start?: number; // Starting line number for file section (1-indexed)
+  line_end?: number; // Ending line number for file section (1-indexed)
+
+  // Search configuration (optional)
+  similarity_threshold?: number; // Minimum similarity score (0.0-1.0)
+  max_results?: number; // Maximum number of results to return
+}
+```
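The parameter comments above describe `code_snippet` and `file_path` as mutually exclusive, with 1-indexed line bounds and a 0.0-1.0 threshold. A minimal client-side sketch of those constraints could look like the following. `validateFindSimilarCodeParams` is a hypothetical helper, not part of the commit, and the rules that line bounds belong only to `file_path` input and that `line_end` must not precede `line_start` are assumptions read from the comments rather than confirmed server behavior.

```typescript
// Hypothetical client-side check; mirrors the documented parameter shape.
interface FindSimilarCodeParams {
  projectName: string;
  folderPath: string;
  code_snippet?: string;
  file_path?: string;
  line_start?: number;
  line_end?: number;
  similarity_threshold?: number;
  max_results?: number;
}

// Returns a list of problems; an empty list means the arguments look consistent.
function validateFindSimilarCodeParams(p: FindSimilarCodeParams): string[] {
  const problems: string[] = [];
  const hasSnippet = p.code_snippet !== undefined;
  const hasFile = p.file_path !== undefined;

  // Documented: the two input sources are mutually exclusive.
  if (hasSnippet === hasFile) {
    problems.push("provide exactly one of code_snippet or file_path");
  }
  // Assumption: line bounds only make sense together with file_path.
  if (hasSnippet && (p.line_start !== undefined || p.line_end !== undefined)) {
    problems.push("line_start/line_end only apply to file_path input");
  }
  // Documented: line numbers are 1-indexed.
  if (p.line_start !== undefined && p.line_start < 1) {
    problems.push("line_start must be >= 1");
  }
  // Assumption: the section must not be reversed.
  if (
    p.line_start !== undefined &&
    p.line_end !== undefined &&
    p.line_end < p.line_start
  ) {
    problems.push("line_end must be >= line_start");
  }
  // Documented: similarity scores range from 0.0 to 1.0.
  if (
    p.similarity_threshold !== undefined &&
    (p.similarity_threshold < 0 || p.similarity_threshold > 1)
  ) {
    problems.push("similarity_threshold must be within 0.0-1.0");
  }
  return problems;
}
```

Running a check like this before calling the tool keeps malformed requests from reaching the server, whatever the server itself does with them.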
+
+#### Response
+
+```typescript
+interface FindSimilarCodeResponse {
+  results: Array<{
+    file_path: string; // Path to file containing similar code
+    code_section: string; // The similar code section found
+    similarity_score: number; // Similarity score (0.0-1.0, higher is more similar)
+    start_line: number; // Starting line number of similar section
+    end_line: number; // Ending line number of similar section
+    context: string; // Additional context around the match
+  }>;
+  search_input: {
+    type: "snippet" | "file_section"; // Type of input used
+    content: string; // The code that was searched for
+    source?: string; // Source file path (if using file_path input)
+  };
+  total_results: number; // Total number of similar code sections found
+  similarity_threshold: number; // Similarity threshold used
+}
+```
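To illustrate how a caller might consume the `results` array described above, here is a small sketch that turns matches into one-line summaries, best score first. `summarizeMatches` and `SimilarCodeMatch` are hypothetical names introduced for illustration; only the field names come from the documented response shape.

```typescript
// Field names follow the documented response; the helper itself is hypothetical.
interface SimilarCodeMatch {
  file_path: string;
  code_section: string;
  similarity_score: number;
  start_line: number;
  end_line: number;
  context: string;
}

// Keeps matches at or above minScore and formats them as "path:start-end (score)".
function summarizeMatches(results: SimilarCodeMatch[], minScore = 0): string[] {
  return results
    .filter((r) => r.similarity_score >= minScore) // filter() copies, so the input is not reordered
    .sort((a, b) => b.similarity_score - a.similarity_score)
    .map(
      (r) =>
        `${r.file_path}:${r.start_line}-${r.end_line} (${r.similarity_score.toFixed(2)})`
    );
}
```

A stricter `minScore` than the `similarity_threshold` sent to the server lets a client narrow results locally without issuing a second request.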
+
+#### Example Usage
+
+##### Search by Code Snippet
+
+```javascript
+const result = await mcp.callTool("find_similar_code", {
+  projectName: "my-web-app",
+  folderPath: "/home/user/projects/my-web-app",
+  code_snippet: `
+    function validateEmail(email: string): boolean {
+      const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
+      return emailRegex.test(email);
+    }
+  `,
+  similarity_threshold: 0.7,
+  max_results: 5
+});
+
+// Response:
+{
+  "results": [
+    {
+      "file_path": "src/utils/validators.ts",
+      "code_section": "function isValidEmail(emailAddress: string): boolean {\n const pattern = /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/;\n return pattern.test(emailAddress);\n}",
+      "similarity_score": 0.92,
+      "start_line": 15,
+      "end_line": 18,
+      "context": "// Email validation utilities"
+    },
+    {
+      "file_path": "src/auth/validation.ts",
+      "code_section": "const validateUserEmail = (email: string) => {\n return /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/.test(email);\n};",
+      "similarity_score": 0.85,
+      "start_line": 42,
+      "end_line": 44,
+      "context": "User input validation functions"
+    }
+  ],
+  "search_input": {
+    "type": "snippet",
+    "content": "function validateEmail(email: string): boolean {\n const emailRegex = /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/;\n return emailRegex.test(email);\n}"
+  },
+  "total_results": 2,
+  "similarity_threshold": 0.7
+}
+```
+
+##### Search by File Section
+
+```javascript
+const result = await mcp.callTool("find_similar_code", {
+  projectName: "large-api",
+  folderPath: "/home/user/projects/large-api",
+  file_path: "src/controllers/userController.ts",
+  line_start: 25,
+  line_end: 35,
+  similarity_threshold: 0.6,
+  max_results: 10
+});
+
+// Response:
+{
+  "results": [
+    {
+      "file_path": "src/controllers/productController.ts",
+      "code_section": "async function createProduct(req: Request, res: Response) {\n try {\n const product = await productService.create(req.body);\n res.status(201).json(product);\n } catch (error) {\n res.status(400).json({ error: error.message });\n }\n}",
+      "similarity_score": 0.78,
+      "start_line": 18,
+      "end_line": 26,
+      "context": "Product CRUD operations"
+    }
+  ],
+  "search_input": {
+    "type": "file_section",
+    "content": "async function createUser(req: Request, res: Response) {\n try {\n const user = await userService.create(req.body);\n res.status(201).json(user);\n } catch (error) {\n res.status(400).json({ error: error.message });\n }\n}",
+    "source": "src/controllers/userController.ts"
+  },
+  "total_results": 1,
+  "similarity_threshold": 0.6
+}
+```
+
+#### 🎯 Use Cases
+
+- **Code Duplication Detection**: Find similar functions or code patterns that could be refactored
+- **Code Reuse Discovery**: Locate existing implementations similar to what you're building
+- **Pattern Analysis**: Understand common patterns and approaches across your codebase
+- **Refactoring Opportunities**: Identify code sections that follow similar patterns
+- **Code Review**: Find similar implementations to ensure consistency
+- **Learning**: Discover how similar problems were solved elsewhere in the codebase
+
+#### ⚠️ Prerequisites
+
+- **Vector Mode Enabled**: Project must have vector mode activated
+- **API Keys Required**: VOYAGE_API_KEY and TURBOPUFFER_API_KEY environment variables
+- **Project Indexed**: The project must be indexed with vector embeddings
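The API-key prerequisite above lends itself to a quick preflight check. The sketch below reports which of the two documented environment variables are absent or blank. `missingVectorEnvVars` is a hypothetical helper, and taking the environment as a plain record (rather than reading `process.env` directly) is a choice made here only to keep the example self-contained.

```typescript
// The two variable names come from the prerequisites list; the helper is hypothetical.
const REQUIRED_VECTOR_ENV_VARS: string[] = ["VOYAGE_API_KEY", "TURBOPUFFER_API_KEY"];

// Returns the names of required variables that are unset or contain only whitespace.
function missingVectorEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VECTOR_ENV_VARS.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Node.js client this could be called as `missingVectorEnvVars(process.env)` before the first `find_similar_code` request, so a missing key surfaces as a clear message instead of a failed search.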
+
+#### 💡 Tips for Best Results
+
+- **Meaningful Code Sections**: Use code sections with clear functionality (10-50 lines work well)
+- **Adjust Similarity Threshold**: Start with 0.7, lower to 0.5 for broader matches
+- **Use Representative Code**: Choose code that represents the pattern you're looking for
+- **Consider Context**: Similar functionality may be implemented differently but serve the same purpose
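The threshold tip above (start at 0.7, lower to 0.5 for broader matches) can be expressed as a list of thresholds to try in order. The sketch below generates that list; `thresholdSchedule` is a hypothetical helper, and the 0.1 step size is an assumption, since the tip names only the start and end values.

```typescript
// Hypothetical helper: thresholds to try, from strictest to broadest.
// Defaults 0.7 and 0.5 come from the tip; the 0.1 step is an assumption.
function thresholdSchedule(start = 0.7, floor = 0.5, step = 0.1): number[] {
  if (step <= 0) throw new RangeError("step must be positive");
  if (floor > start) throw new RangeError("floor must not exceed start");

  // Work in integer hundredths so repeated subtraction cannot accumulate
  // floating-point error.
  const startH = Math.round(start * 100);
  const floorH = Math.round(floor * 100);
  const stepH = Math.max(1, Math.round(step * 100));

  const schedule: number[] = [];
  for (let t = startH; t > floorH; t -= stepH) {
    schedule.push(t / 100);
  }
  schedule.push(floorH / 100); // always end exactly on the floor
  return schedule;
}
```

A caller could loop over the schedule, passing each value as `similarity_threshold` and stopping at the first non-empty `results` array, which keeps the strictest threshold that still yields matches.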
+
 ## Common Parameters

 All tools require these standard parameters for project identification:

docs/http-api.md

Lines changed: 49 additions & 1 deletion
@@ -268,7 +268,7 @@ interface MCPResponse {

 ### Available Tools

-All 11 MCP tools are available via HTTP. See the [API Reference](api-reference.md) for complete tool documentation.
+All 13 MCP tools are available via HTTP. See the [API Reference](api-reference.md) for complete tool documentation.

 | Tool Name | Purpose |
 |-----------|---------|
@@ -283,6 +283,8 @@ All 11 MCP tools are available via HTTP. See the [API Reference](api-reference.m
 | `update_codebase_overview` | Create project docs |
 | `search_codebase_overview` | Search overviews |
 | `check_database_health` | System monitoring |
+| `enabled_vector_mode` | Configure vector search |
+| `find_similar_code` | Find similar code patterns |

 ### Example Tool Calls

@@ -401,6 +403,52 @@ curl -X POST -H "Content-Type: application/json" \
 }
 ```

+#### Find Similar Code
+
+```bash
+curl -X POST -H "Content-Type: application/json" \
+  -H "Authorization: Bearer your-token" \
+  -d '{
+    "jsonrpc": "2.0",
+    "method": "tools/call",
+    "params": {
+      "name": "find_similar_code",
+      "arguments": {
+        "projectName": "my-app",
+        "folderPath": "/home/user/my-app",
+        "code_snippet": "function calculateTotal(items) {\n return items.reduce((sum, item) => sum + item.price, 0);\n}",
+        "similarity_threshold": 0.7,
+        "max_results": 5
+      }
+    }
+  }' \
+  http://localhost:7557/mcp
+```
+
+```json
+{
+  "jsonrpc": "2.0",
+  "result": {
+    "results": [
+      {
+        "file_path": "src/utils/calculations.ts",
+        "code_section": "const sumPrices = (products) => {\n return products.reduce((total, product) => total + product.price, 0);\n};",
+        "similarity_score": 0.92,
+        "start_line": 15,
+        "end_line": 17,
+        "context": "Price calculation utilities"
+      }
+    ],
+    "search_input": {
+      "type": "snippet",
+      "content": "function calculateTotal(items) {\n return items.reduce((sum, item) => sum + item.price, 0);\n}"
+    },
+    "total_results": 1,
+    "similarity_threshold": 0.7
+  }
+}
+```
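Hand-escaping a code snippet inside a shell-quoted JSON body, as the curl example above does, is easy to get wrong. The sketch below builds the same `tools/call` payload as a typed object so that `JSON.stringify` handles the escaping. `buildFindSimilarCodeRequest` is a hypothetical helper. The optional `id` is included because JSON-RPC 2.0 treats a request without an `id` as a notification; the curl example omits it, so how this server handles either form is an assumption.

```typescript
// Argument names follow the documented find_similar_code parameters.
interface FindSimilarCodeArguments {
  projectName: string;
  folderPath: string;
  code_snippet?: string;
  file_path?: string;
  line_start?: number;
  line_end?: number;
  similarity_threshold?: number;
  max_results?: number;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id?: number | string; // omitted in the curl example; optional here
  method: "tools/call";
  params: { name: "find_similar_code"; arguments: FindSimilarCodeArguments };
}

// Hypothetical helper: assembles the JSON-RPC envelope around the tool arguments.
function buildFindSimilarCodeRequest(
  args: FindSimilarCodeArguments,
  id?: number | string
): ToolCallRequest {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    method: "tools/call",
    params: { name: "find_similar_code", arguments: args },
  };
  if (id !== undefined) {
    request.id = id;
  }
  return request;
}
```

The result of `JSON.stringify(buildFindSimilarCodeRequest(...))` can be sent as the POST body with the same `Content-Type` and `Authorization` headers shown in the curl call.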
+
 ## Server-Sent Events

 The HTTP API supports Server-Sent Events (SSE) for real-time streaming of tool responses and system events.
