27 changes: 18 additions & 9 deletions rag-local-milvus-ollama/README.md
@@ -10,7 +10,7 @@ This setup runs everything locally. No API keys required.
| [Ollama](https://ollama.com/) | LLM runtime |


-## Step-1: Clone this repo
+## Step-1: Get the code

```bash
# Substitute appropriate repo URL
@@ -20,9 +20,9 @@ cd allycat/rag-local-milvus-ollama

---

-## Step-2: Setup Python Dev Env
+## Step-2: Setup Env

-[Setup python dev env](setup.md)
+Follow the [Setup](setup.md) guide.


**And activate your python env**
@@ -40,7 +40,7 @@ conda activate allycat-1 # whatever the name of the env

---

-## Step-3: Setup `.env` file
+## Step-3: (Optional) Setup `.env` file

**This step is optional**. Allycat runs fine with default configuration options. You can customize them to fit your needs.

@@ -173,23 +173,30 @@ Go to URL: http://localhost:8090 and start chatting!

---

-## Packaging the app to deploy
+## Step-10: Turn the RAG into MCP
+
+[MCP server code](mcp_server.py)
+[MCP client code](mcp_client.py)
+
+Read more on [rag --> mcp](mcp.md)
+
+## Step-11: Packaging the app to deploy

We will create a Docker image of the app. It will package up the code + data.

Note: Be sure to run the docker command from the root of the project.

```bash
-docker build -t allycat .
+docker build -t allycat-local .
```

-## Run the AllyCat Docker
+## Step-12: Run the AllyCat Docker

Let's start the Docker container in 'deploy' mode

```bash
-docker run -it --rm -p 8090:8090 -p 8080:8080 allycat deploy
-# docker run -it --rm -p 8090:8090 -v allycat-vol1:/allycat/workspace sujee/allycat
+docker run -it --rm -p 8090:8090 -p 8080:8080 allycat-local deploy
+# docker run -it --rm -p 8090:8090 -v allycat-vol1:/allycat/workspace allycat-local
```

The `deploy` option starts the Ollama server and web UI.
@@ -201,6 +208,8 @@ docker run -it --rm -p 8090:8090 -p 8080:8080 allycat deploy

### Creating `requirements.txt` using uv

+When we update uv dependencies, run this command to regenerate `requirements.txt`:

```bash
uv export --frozen --no-hashes --no-emit-project --no-default-groups --output-file=requirements.txt
```
222 changes: 222 additions & 0 deletions rag-local-milvus-ollama/mcp.md
@@ -0,0 +1,222 @@
# RAG Local MCP Server Integration

A Model Context Protocol (MCP) server that provides access to the local RAG (Retrieval-Augmented Generation) system functionality using Milvus and Ollama.

## Overview

This MCP server exposes the local RAG query functionality through the MCP protocol, allowing Claude and other MCP clients to query your local vector database.

## Features

- **allycat_local_query tool**: Query the local RAG system with natural language questions
- Uses the same configuration as the local RAG system (from `my_config.py`)
- Connects to the local Milvus vector database
- Uses local Ollama models for embeddings and LLM
- Provides processing time and metadata in responses
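
To make the shape of the server concrete, here is a minimal sketch of how such a tool can be registered with the MCP Python SDK's `FastMCP` helper. This is an illustration, not the actual `mcp_server.py`; the placeholder body stands in for the real Milvus search and Ollama generation:

```python
# Illustrative sketch only -- the real mcp_server.py may be structured differently.
import time

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("allycat-local-rag")

@mcp.tool()
def allycat_local_query(query: str) -> str:
    """Query the local RAG system with a natural-language question."""
    start = time.time()
    # The real implementation embeds the query, searches the Milvus
    # collection, and generates an answer with the local Ollama LLM.
    answer = f"(placeholder answer for: {query!r})"
    return f"{answer}\n\nProcessing time: {time.time() - start:.2f} s"

if __name__ == "__main__":
    mcp.run(transport="stdio")  # serve over stdin/stdout for MCP clients
```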

## Prerequisites

1. **Python Environment**: Ensure you have Python 3.8+ installed
2. **Dependencies**: Install the required packages:
```bash
pip install -r requirements.txt
```
3. **Milvus Database**: Ensure your local Milvus database is populated with data
4. **Ollama**: Ensure Ollama is installed and running locally
5. **Models**: Ensure the required embedding and LLM models are downloaded
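
For example, models can be pulled with the Ollama CLI. The exact model names depend on your configuration; the two below are just illustrations:

```bash
# Example model pulls -- substitute the models configured in my_config.py
ollama pull nomic-embed-text   # embedding model (example)
ollama pull llama3.2           # LLM (example)
```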

## Configuration

The server uses the same configuration as the main system from `my_config.py`:
- Embedding model and settings
- Local vector database connection
- Ollama LLM model configuration
- Collection name and other parameters

Make sure your `.env` file is properly configured with the correct model names and paths.
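
As a purely hypothetical illustration (the actual variable names are whatever `my_config.py` reads; check that file for the real keys):

```bash
# Hypothetical example -- see my_config.py for the actual variable names
EMBEDDING_MODEL=nomic-embed-text
LLM_MODEL=llama3.2
MILVUS_DB_PATH=./workspace/milvus.db
COLLECTION_NAME=allycat_pages
```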

## VSCode Integration

### Method 1: Using VSCode Settings (Recommended)

1. Open VSCode Settings (Ctrl/Cmd + ,)
2. Search for "MCP" or navigate to Extensions → MCP
3. Add a new MCP server configuration with the following settings:

```json
{
"mcp.servers": {
"allycat-local-rag": {
"command": "python",
"args": ["${workspaceFolder}/rag-local-milvus-ollama/mcp_server.py"],
"cwd": "${workspaceFolder}/rag-local-milvus-ollama"
}
}
}
```

### Method 2: Using Workspace Settings

1. Open your workspace settings file (`.vscode/settings.json`)
2. Add the MCP server configuration:

```json
{
"mcp.servers": {
"allycat-local-rag": {
"command": "python",
"args": ["./rag-local-milvus-ollama/mcp_server.py"],
"cwd": "./rag-local-milvus-ollama"
}
}
}
```

### Method 3: Direct Command Line

You can also run the server directly from the terminal:

```bash
cd rag-local-milvus-ollama
python mcp_server.py
```

## Usage in VSCode

Once configured, the MCP server will be available in VSCode:

1. Open a chat window with an MCP-enabled extension (like Claude Chat)
2. The server should automatically connect
3. You can use the `allycat_local_query` tool to ask questions:

Example queries:
- "What is AI Alliance?"
- "What are the main focus areas of AI Alliance?"
- "What are some AI alliance projects?"
- "What are the upcoming events?"
- "How do I join the AI Alliance?"

## Available Tools

- **allycat_local_query**: Query the local RAG system
- Input: `query` (string) - The question to ask
- Output: Response from the RAG system with metadata and processing time
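
For reference, this tool's entry in a `tools/list` response would look roughly like the following (the exact description and schema text come from the server, so treat this as illustrative):

```json
{
  "name": "allycat_local_query",
  "description": "Query the local RAG system with a natural language question",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "The question to ask"}
    },
    "required": ["query"]
  }
}
```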

## Troubleshooting

### Common Issues

1. **Server not connecting**:
- Ensure all dependencies are installed
- Check that Milvus database is accessible
- Verify Ollama is running and models are downloaded

2. **Permission errors**:
- Ensure the Python script has execute permissions
- Check that the working directory is correct

3. **Model loading issues**:
- Verify model names in `my_config.py`
- Ensure models are downloaded via Ollama
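
Two quick sanity checks with standard Ollama commands can rule out the most common failures:

```bash
# Is the Ollama server up? (it listens on port 11434 by default)
curl http://localhost:11434/api/tags

# Which models are downloaded locally?
ollama list
```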

### Logs

The server outputs logs to help debug issues. Look for:
- ✅ Success messages for initialization
- ❌ Error messages for failures
- Processing time and metadata in responses

## Development

To test the server locally:

```bash
cd rag-local-milvus-ollama
python mcp_server.py
```

The server will start and listen for MCP connections on stdin/stdout.

## Testing the MCP Server

### Interactive Testing

Run the interactive test client that takes user input and sends it to a running MCP server:

First, start the MCP server in one terminal:
```bash
cd rag-local-milvus-ollama
python mcp_server.py
```

Then in another terminal, run the interactive client:
```bash
cd rag-local-milvus-ollama
python mcp_test.py
```

Or use pipe redirection for a single-terminal experience:
```bash
cd rag-local-milvus-ollama
python mcp_test.py | python mcp_server.py
```

This will start an interactive session where you can:
- Type questions to query the RAG system
- See properly formatted responses from the MCP server
- Type `/quit` to exit the session

The client handles:
- Initializing the MCP connection with proper JSON-RPC format
- Sending your queries to the `allycat_local_query` tool
- Graceful session management
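
A stripped-down version of such a client might look like the sketch below. It assumes the server accepts one JSON-RPC message per line on stdin (as in the manual-testing section); the real `mcp_test.py` may differ:

```python
# Minimal interactive client sketch -- mcp_test.py may work differently.
# Assumes one JSON-RPC message per line over the server's stdin/stdout.
import json
import subprocess

proc = subprocess.Popen(
    ["python", "mcp_server.py"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def send(request: dict) -> dict:
    """Write one JSON-RPC request line, read one response line."""
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

# MCP handshake: initialize request, then the initialized notification
send({"jsonrpc": "2.0", "id": 1, "method": "initialize",
      "params": {"capabilities": {}, "clientInfo": {"name": "test", "version": "1.0.0"}}})
proc.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
proc.stdin.flush()

msg_id = 2
while (query := input("question (/quit to exit)> ")) != "/quit":
    response = send({"jsonrpc": "2.0", "id": msg_id, "method": "tools/call",
                     "params": {"name": "allycat_local_query", "arguments": {"query": query}}})
    print(json.dumps(response, indent=2))
    msg_id += 1

proc.stdin.close()
proc.wait()
```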

### Manual JSON-RPC Testing

If you prefer manual testing with raw JSON requests:

1. **Start the server:**
```bash
cd rag-local-milvus-ollama
python mcp_server.py
```

2. **Send initialization request:**
```json
{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"capabilities": {}, "clientInfo": {"name": "test", "version": "1.0.0"}}}
```

3. **List available tools:**
```json
{"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
```

4. **Call the query tool:**
```json
{"jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {"name": "allycat_local_query", "arguments": {"query": "What is this website about?"}}}
```

**Note:** For manual testing, you'll need to send each JSON request as a single line followed by pressing Enter. The server responds with JSON-RPC formatted responses.

### Expected Responses

- **Initialization**: Should return server capabilities and confirmation
- **Tool listing**: Should show `allycat_local_query` tool with its schema
- **Tool call**: Should return the RAG query response with metadata and timing info
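
For example, a successful tool call reply follows the MCP JSON-RPC result shape, roughly like this (exact fields depend on the server and SDK version):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "This website is about ... (answer, metadata, processing time)"
      }
    ],
    "isError": false
  }
}
```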

Press Ctrl+C to exit the server when testing manually.

## Dependencies

The server requires the same dependencies as the main RAG system:
- llama-index with HuggingFace embeddings
- Milvus vector store
- LiteLLM for Ollama integration
- python-dotenv
- mcp-server package

Install with:
```bash
pip install -r requirements.txt
```