This document provides a deep technical dive into the TraceMind MCP Server architecture, implementation details, and deployment configuration.
- System Overview
- Project Structure
- Core Components
- MCP Protocol Implementation
- Gemini Integration
- Data Flow
- Deployment Architecture
- Development Workflow
- Performance Considerations
- Security
TraceMind MCP Server is a Gradio-based MCP (Model Context Protocol) server that provides AI-powered analysis tools for agent evaluation data. It serves as the backend intelligence layer for the TraceMind ecosystem.
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | Gradio | 6.x | Native MCP support with @gr.mcp.* decorators |
| AI Model | Google Gemini | 2.5 Flash Lite | AI-powered analysis and insights |
| Data Source | HuggingFace Datasets | Latest | Load evaluation datasets |
| Protocol | MCP | 1.0 | Model Context Protocol for tool exposure |
| Transport | SSE | - | Server-Sent Events for real-time communication |
| Deployment | Docker | - | HuggingFace Spaces containerized deployment |
| Language | Python | 3.10+ | Core implementation |
┌──────────────────────────────────────────────────────────────┐
│ MCP Clients (External) │
│ - Claude Desktop │
│ - VS Code (Continue, Cursor, Cline) │
│ - TraceMind-AI (Track 2) │
└────────────────┬─────────────────────────────────────────────┘
│
│ MCP Protocol
│ (SSE Transport)
↓
┌──────────────────────────────────────────────────────────────┐
│ TraceMind MCP Server (HuggingFace Spaces) │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Gradio App (app.py) │ │
│ │ - MCP Server Endpoint (mcp_server=True) │ │
│ │ - Testing UI (Gradio Blocks) │ │
│ │ - Configuration Management │ │
│ └─────────────┬────────────────────────────────────────┘ │
│ │ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ MCP Tools (mcp_tools.py) │ │
│ │ - 11 Tools (@gr.mcp.tool()) │ │
│ │ - 3 Resources (@gr.mcp.resource()) │ │
│ │ - 3 Prompts (@gr.mcp.prompt()) │ │
│ └─────────────┬────────────────────────────────────────┘ │
│ │ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Gemini Client (gemini_client.py) │ │
│ │ - API Authentication │ │
│ │ - Prompt Engineering │ │
│ │ - Response Parsing │ │
│ └─────────────┬────────────────────────────────────────┘ │
│ │ │
└────────────────┼──────────────────────────────────────────────┘
│
↓
┌────────────────┐
│ External APIs │
│ - Gemini API │
│ - HF Datasets │
└────────────────┘
TraceMind-mcp-server/
├── app.py # Main entry point, Gradio UI
├── mcp_tools.py # MCP tool implementations (11 tools + 3 resources + 3 prompts)
├── gemini_client.py # Google Gemini API client
├── requirements.txt # Python dependencies
├── Dockerfile # Container configuration
├── .env.example # Environment variable template
├── .gitignore # Git ignore rules
├── README.md # Project documentation
└── DOCUMENTATION.md # Complete API reference
Total: 9 files (7 excluding docs)
Lines of Code: ~3,500 lines (breakdown below)
| File | Lines | Purpose |
|---|---|---|
| `app.py` | ~1,200 | Gradio UI + MCP server setup + testing interface |
| `mcp_tools.py` | ~2,100 | All 17 MCP components (tools, resources, prompts) |
| `gemini_client.py` | ~200 | Gemini API integration |
| `requirements.txt` | ~20 | Dependencies |
| `Dockerfile` | ~30 | Deployment configuration |
Purpose: Entry point for HuggingFace Spaces deployment, provides both MCP server and testing UI.
Key Responsibilities:
- Initialize the Gradio app with `mcp_server=True`
- Create a testing interface for all MCP tools
- Handle configuration (API keys, settings)
- Manage client connections
Architecture:
# app.py structure
import gradio as gr
from gemini_client import GeminiClient
from mcp_tools import * # All tool implementations
# 1. Initialize Gemini client (with fallback)
default_gemini_client = GeminiClient()
# 2. Create Gradio UI for testing
def create_gradio_ui():
with gr.Blocks() as demo:
# Settings tab for API key configuration
# Tab for each MCP tool (11 tabs)
# Tab for testing resources
# Tab for testing prompts
# API documentation tab
return demo
# 3. Launch with MCP server enabled
if __name__ == "__main__":
demo = create_gradio_ui()
demo.launch(
mcp_server=True, # ← Enables MCP endpoint
share=False,
server_name="0.0.0.0",
server_port=7860
    )

MCP Enablement: `mcp_server=True` in `demo.launch()` automatically:
- Exposes the `/gradio_api/mcp/sse` endpoint
- Discovers all `@gr.mcp.tool()`, `@gr.mcp.resource()`, and `@gr.mcp.prompt()` decorated functions
- Generates MCP tool schemas from function signatures and docstrings
- Handles MCP protocol communication (SSE transport)
Testing Interface:
- Settings Tab: Configure Gemini API key and HF token (see the sketch after this list)
- Tool Tabs (11): One tab per tool for manual testing
- Input fields for all parameters
- Submit button
- Output display (Markdown or JSON)
- Resources Tab: Test resource URIs
- Prompts Tab: Test prompt templates
- API Documentation Tab: Generated from tool docstrings
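As an illustration of how the Settings tab might be wired (a sketch, not the shipped layout; storing keys in `os.environ` is one simple approach for a single-process app):

```python
import os
import gradio as gr

def save_keys(gemini_key: str, hf_token: str) -> str:
    # Process-local storage; on HF Spaces, prefer Spaces Secrets instead
    if gemini_key:
        os.environ["GEMINI_API_KEY"] = gemini_key
    if hf_token:
        os.environ["HF_TOKEN"] = hf_token
    return "✅ Settings saved"

with gr.Blocks() as demo:
    with gr.Tab("⚙️ Settings"):
        gemini_in = gr.Textbox(label="Gemini API Key", type="password")
        hf_in = gr.Textbox(label="HF Token", type="password")
        status = gr.Markdown()
        gr.Button("Save").click(save_keys, [gemini_in, hf_in], status)
```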
Purpose: Implements all 17 MCP components (11 tools + 3 resources + 3 prompts).
Structure:
# mcp_tools.py structure
import gradio as gr
from gemini_client import GeminiClient
from datasets import load_dataset
# ============ TOOLS (11) ============
@gr.mcp.tool()
async def analyze_leaderboard(...) -> str:
"""Tool docstring (becomes MCP description)"""
# 1. Load data from HuggingFace
# 2. Process/filter data
# 3. Call Gemini for AI analysis
# 4. Return formatted response
pass
@gr.mcp.tool()
async def debug_trace(...) -> str:
"""Debug traces with AI assistance"""
pass
# ... (9 more tools)
# ============ RESOURCES (3) ============
@gr.mcp.resource()
def get_leaderboard_data(uri: str) -> str:
"""URI: leaderboard://{repo}"""
# Parse URI
# Load dataset
# Return raw JSON
pass
@gr.mcp.resource()
def get_trace_data(uri: str) -> str:
"""URI: trace://{trace_id}/{repo}"""
pass
@gr.mcp.resource()
def get_cost_data(uri: str) -> str:
"""URI: cost://model/{model_name}"""
pass
# ============ PROMPTS (3) ============
@gr.mcp.prompt()
def analysis_prompt(analysis_type: str, ...) -> str:
"""Generate analysis prompt templates"""
pass
@gr.mcp.prompt()
def debug_prompt(debug_type: str, ...) -> str:
"""Generate debug prompt templates"""
pass
@gr.mcp.prompt()
def optimization_prompt(optimization_goal: str, ...) -> str:
"""Generate optimization prompt templates"""
    pass

Design Patterns:
- Decorator-Based Registration:

  @gr.mcp.tool()  # Gradio automatically registers as MCP tool
  async def tool_name(...) -> str:
      """Docstring becomes tool description in MCP schema"""
      pass

- Structured Docstrings:

  """
  Brief one-line description.

  Longer detailed description explaining purpose and behavior.

  Args:
      param1 (type): Description of param1
      param2 (type): Description of param2. Default: value

  Returns:
      type: Description of return value
  """

  Gradio parses this to generate the MCP tool schema automatically.

- Error Handling:

  try:
      # Tool implementation
      return result
  except Exception as e:
      return f"❌ **Error**: {str(e)}"

  All errors are returned as user-friendly strings.

- Async/Await: All tools are `async` for efficient I/O operations (API calls, dataset loading).
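Putting these patterns together, a minimal end-to-end implementation of one resource might look like the following (a sketch: the exact URI parsing and JSON shape are assumptions, not the shipped code):

```python
import json

import gradio as gr
from datasets import load_dataset


@gr.mcp.resource()
def get_leaderboard_data(uri: str) -> str:
    """
    Return raw leaderboard rows as JSON.

    Args:
        uri (str): Resource URI, e.g. "leaderboard://kshitijthakkar/smoltrace-leaderboard"

    Returns:
        str: JSON-encoded list of leaderboard rows
    """
    try:
        repo = uri.removeprefix("leaderboard://")  # "leaderboard://{repo}" -> "{repo}"
        ds = load_dataset(repo, split="train")
        return json.dumps(ds.to_list(), indent=2)
    except Exception as e:
        return f"❌ **Error**: {str(e)}"
```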
Purpose: Handles all interactions with Google Gemini 2.5 Flash Lite API.
Key Features:
- API authentication
- Prompt engineering for different analysis types
- Response parsing and formatting
- Error handling and retries
- Token optimization
Class Structure:
class GeminiClient:
    def __init__(self, api_key: Optional[str] = None, model_name: str = "gemini-2.5-flash-lite"):
        """Initialize with API key (falls back to GEMINI_API_KEY env var) and model"""
        self.api_key = api_key or os.getenv("GEMINI_API_KEY")
        genai.configure(api_key=self.api_key)
        self.model = genai.GenerativeModel(model_name)
self.generation_config = {
"temperature": 0.7,
"top_p": 0.95,
"max_output_tokens": 4096, # Optimized for HF Spaces
}
self.request_timeout = 30 # 30s timeout
async def analyze_with_context(
self,
data: Dict,
analysis_type: str,
specific_question: Optional[str] = None
) -> str:
"""
Core analysis method used by all AI-powered tools
Args:
data: Data to analyze (dict or JSON)
analysis_type: "leaderboard", "trace", "cost_estimate", "comparison", "results"
specific_question: Optional specific question
Returns:
Markdown-formatted analysis
"""
# 1. Build system prompt based on analysis_type
system_prompt = self._get_system_prompt(analysis_type)
# 2. Format data for context
data_str = json.dumps(data, indent=2)
# 3. Build user prompt
user_prompt = f"{system_prompt}\n\nData:\n{data_str}"
if specific_question:
user_prompt += f"\n\nSpecific Question: {specific_question}"
# 4. Call Gemini API
response = await self.model.generate_content_async(
user_prompt,
generation_config=self.generation_config,
request_options={"timeout": self.request_timeout}
)
# 5. Extract and return text
return response.text
def _get_system_prompt(self, analysis_type: str) -> str:
"""Get specialized system prompt for each analysis type"""
prompts = {
"leaderboard": """You are an expert AI agent performance analyst.
Analyze evaluation leaderboard data and provide:
- Top performers by key metrics
- Trade-off analysis (cost vs accuracy)
- Trend identification
- Actionable recommendations
Format: Markdown with clear sections.""",
"trace": """You are an expert at debugging AI agent executions.
Analyze OpenTelemetry trace data and:
- Answer specific questions about execution
- Identify performance bottlenecks
- Explain reasoning chain
- Provide optimization suggestions
Format: Clear, concise explanation.""",
"cost_estimate": """You are a cost optimization expert.
Analyze cost estimation data and provide:
- Detailed cost breakdown
- Hardware recommendations
- Cost optimization opportunities
- ROI analysis
Format: Structured breakdown with recommendations.""",
# ... more prompts for other analysis types
}
        return prompts.get(analysis_type, prompts["leaderboard"])

Optimization Strategies:
- Token Reduction: `max_output_tokens: 4096` (reduced from 8192) for faster responses
- Request Timeout: 30s timeout for HF Spaces compatibility
- Temperature: 0.7 for balanced creativity and consistency
- Model Selection: `gemini-2.5-flash-lite` for speed (can switch to `gemini-2.5-flash` for quality; see the sketch below)
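One way to make that model switch without code changes is an environment override (a sketch; the `GEMINI_MODEL` variable is an assumption, not part of the documented configuration):

```python
import os

# Hypothetical GEMINI_MODEL override; defaults to the fast model
model_name = os.getenv("GEMINI_MODEL", "gemini-2.5-flash-lite")
client = GeminiClient(api_key=os.getenv("GEMINI_API_KEY"), model_name=model_name)
```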
Gradio 6+ provides native MCP server capabilities through decorators and automatic schema generation.
1. Tool Registration:
@gr.mcp.tool() # ← This decorator tells Gradio to expose this as an MCP tool
async def my_tool(param1: str, param2: int = 10) -> str:
"""
Brief description (used in MCP tool schema).
Args:
param1 (str): Description of param1
param2 (int): Description of param2. Default: 10
Returns:
str: Description of return value
"""
return f"Result: {param1}, {param2}"What Gradio does automatically:
- Parses function signature to extract parameter names and types
- Parses docstring to extract descriptions
- Generates MCP tool schema:
{ "name": "my_tool", "description": "Brief description (used in MCP tool schema).", "inputSchema": { "type": "object", "properties": { "param1": { "type": "string", "description": "Description of param1" }, "param2": { "type": "integer", "default": 10, "description": "Description of param2. Default: 10" } }, "required": ["param1"] } }
2. Resource Registration:
@gr.mcp.resource()
def get_resource(uri: str) -> str:
"""
Resource description.
Args:
uri (str): Resource URI (e.g., "leaderboard://repo/name")
Returns:
str: JSON data
"""
# Parse URI
# Load data
# Return JSON string
    pass

3. Prompt Registration:
@gr.mcp.prompt()
def generate_prompt(prompt_type: str, context: str) -> str:
"""
Generate reusable prompt templates.
Args:
prompt_type (str): Type of prompt
context (str): Context for prompt generation
Returns:
str: Generated prompt text
"""
return f"Prompt template for {prompt_type} with {context}"When demo.launch(mcp_server=True) is called:
SSE Endpoint (Primary):
https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse
Streamable HTTP Endpoint (Alternative):
https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/
Claude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"tracemind": {
"url": "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
"transport": "sse"
}
}
}

Python MCP Client:
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    url = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
    async with sse_client(url) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List tools
            tools = await session.list_tools()

            # Call tool
            result = await session.call_tool("analyze_leaderboard", arguments={
                "metric_focus": "cost",
                "top_n": 5
            })

Environment Variable:
GEMINI_API_KEY=your_api_key_here

Initialization:
import google.generativeai as genai
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.5-flash-lite")

1. System Prompts by Analysis Type: Each analysis type (leaderboard, trace, cost, comparison, results) has a specialized system prompt that:
- Defines the AI's role and expertise
- Specifies output format (markdown, structured sections)
- Lists key insights to include
- Sets tone (professional, concise, actionable)
2. Context Injection:
user_prompt = f"""
{system_prompt}
Data to Analyze:
{json.dumps(data, indent=2)}
Specific Question: {question}
"""3. Output Formatting:
- All responses in Markdown
- Clear sections: Top Performers, Key Insights, Trade-offs, Recommendations
- Bullet points for readability
- Code blocks for technical details
Rate Limits (Gemini 2.5 Flash Lite free tier):
- 1,500 requests per day
- 1 request per second (see the throttling sketch below)
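A client-side throttle can keep bursts of tool calls under the 1 request/second ceiling. A minimal sketch (the `RateLimiter` helper is illustrative, not part of the shipped code):

```python
import asyncio
import time


class RateLimiter:
    """Allow at most one request per `interval` seconds (process-local)."""

    def __init__(self, interval: float = 1.0):
        self.interval = interval
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self._lock:
            delay = self.interval - (time.monotonic() - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()


gemini_limiter = RateLimiter(interval=1.0)

async def generate(model, prompt: str):
    await gemini_limiter.wait()  # Respect the 1 req/s free-tier limit
    return await model.generate_content_async(prompt)
```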
Error Handling Strategy:
try:
response = await model.generate_content_async(...)
return response.text
except google.api_core.exceptions.ResourceExhausted:
return "❌ **Rate limit exceeded**. Please try again in a few seconds."
except google.api_core.exceptions.DeadlineExceeded:
return "❌ **Request timeout**. The analysis is taking too long. Try with less data."
except Exception as e:
return f"❌ **Error**: {str(e)}"1. MCP Client (e.g., Claude Desktop, TraceMind-AI)
└─→ Calls: analyze_leaderboard(metric_focus="cost", top_n=5)
2. Gradio MCP Server (app.py)
└─→ Routes to: analyze_leaderboard() in mcp_tools.py
3. MCP Tool Function (mcp_tools.py)
├─→ Load data from HuggingFace Datasets
│ └─→ ds = load_dataset("kshitijthakkar/smoltrace-leaderboard")
│
├─→ Process/filter data
│ └─→ Filter by time range, sort by metric
│
├─→ Call Gemini Client
│ └─→ gemini_client.analyze_with_context(data, "leaderboard")
│
└─→ Return formatted response
4. Gemini Client (gemini_client.py)
├─→ Build system prompt
├─→ Format data as JSON
├─→ Call Gemini API
│ └─→ model.generate_content_async(prompt)
└─→ Return AI-generated analysis
5. Response Path (back through stack)
└─→ Gemini → gemini_client → mcp_tool → Gradio → MCP Client
6. MCP Client (displays result to user)
└─→ Shows markdown-formatted analysis
1. MCP Client
└─→ Accesses: leaderboard://kshitijthakkar/smoltrace-leaderboard
2. Gradio MCP Server
└─→ Routes to: get_leaderboard_data(uri)
3. Resource Function
├─→ Parse URI to extract repo name
├─→ Load dataset from HuggingFace
├─→ Convert to JSON
└─→ Return raw JSON string
4. MCP Client
└─→ Receives raw JSON data (no AI processing)
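From a client, this path is a resource read rather than a tool call. A sketch using the `mcp` Python SDK (assuming the SSE endpoint from the deployment section):

```python
from mcp import ClientSession
from mcp.client.sse import sse_client

async def fetch_leaderboard_raw() -> str:
    url = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
    async with sse_client(url) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Raw JSON comes back with no AI post-processing
            result = await session.read_resource(
                "leaderboard://kshitijthakkar/smoltrace-leaderboard"
            )
            return result.contents[0].text
```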
- Platform: HuggingFace Spaces
- SDK: Docker (for custom dependencies)
- Hardware: CPU Basic (free tier), sufficient for API calls and dataset loading
- URL: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
# Base image
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Copy requirements
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files
COPY app.py .
COPY mcp_tools.py .
COPY gemini_client.py .
# Expose port
EXPOSE 7860
# Set environment variables
ENV GRADIO_SERVER_NAME="0.0.0.0"
ENV GRADIO_SERVER_PORT="7860"
# Run application
CMD ["python", "app.py"]# Required
GEMINI_API_KEY=your_gemini_api_key_here
# Optional (for testing)
HF_TOKEN=your_huggingface_token_here

Current Setup (Free Tier):
- Hardware: CPU Basic
- Concurrent Users: ~10-20
- Request Latency: 2-5 seconds (AI analysis)
- Rate Limit: Gemini API (1,500 req/day)
If Scaling Needed:
- Upgrade Hardware: CPU Basic → CPU Upgrade (2x performance)
- Caching: Add Redis for caching frequent queries
- API Key Pool: Rotate multiple Gemini API keys to bypass rate limits
- Load Balancing: Deploy multiple Spaces instances with load balancer
# 1. Clone repository
git clone https://github.com/Mandark-droid/TraceMind-mcp-server.git
cd TraceMind-mcp-server
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys
# 5. Run locally
python app.py
# 6. Access
# - Gradio UI: http://localhost:7860
# - MCP Endpoint: http://localhost:7860/gradio_api/mcp/sse

Option 1: Gradio UI (Easiest):
1. Run app.py
2. Open http://localhost:7860
3. Navigate to tool tab (e.g., "📊 Analyze Leaderboard")
4. Fill in parameters
5. Click submit button
6. View results
Option 2: Python MCP Client:
from mcp import ClientSession
from mcp.client.sse import sse_client

async def test_tool():
    async with sse_client("http://localhost:7860/gradio_api/mcp/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("analyze_leaderboard", {
                "metric_focus": "cost",
                "top_n": 3
            })
            print(result.content[0].text)

import asyncio
asyncio.run(test_tool())

Step 1: Add function to mcp_tools.py:
@gr.mcp.tool()
async def new_tool_name(
param1: str,
param2: int = 10
) -> str:
"""
Brief description of what this tool does.
Detailed explanation of the tool's purpose and behavior.
Args:
param1 (str): Description of param1 with examples
param2 (int): Description of param2. Default: 10
Returns:
str: Description of what the function returns
"""
try:
# Implementation
result = f"Processed: {param1} with {param2}"
return result
except Exception as e:
return f"❌ **Error**: {str(e)}"Step 2: Add testing UI to app.py (optional):
with gr.Tab("🆕 New Tool"):
gr.Markdown("## New Tool Name")
param1_input = gr.Textbox(label="Param 1")
param2_input = gr.Number(label="Param 2", value=10)
submit_btn = gr.Button("Execute")
output = gr.Markdown()
submit_btn.click(
fn=new_tool_name,
inputs=[param1_input, param2_input],
outputs=output
    )

Step 3: Test:
python app.py
# Visit http://localhost:7860
# Test in the new tab

Step 4: Deploy:
git add mcp_tools.py app.py
git commit -m "feat: Add new_tool_name MCP tool"
git push origin main
# HF Spaces auto-deploys

Problem: Loading full datasets consumes excessive tokens in AI analysis.
Solutions:
- get_top_performers: Returns only top N models (90% token reduction)
- get_leaderboard_summary: Returns aggregated stats (99% token reduction)
- Data sampling: Limit rows when loading datasets (max_rows parameter)
Example:
# ❌ BAD: Loads 51 rows, ~50K tokens
full_data = load_dataset("kshitijthakkar/smoltrace-leaderboard")
# ✅ GOOD: Returns top 5, ~5K tokens (90% reduction)
top_5 = await get_top_performers(top_n=5)
# ✅ BETTER: Returns summary, ~500 tokens (99% reduction)
summary = await get_leaderboard_summary()

All tools are async for efficient I/O:
@gr.mcp.tool()
async def tool_name(...): # ← async
    ds = load_dataset(...)  # ← Blocking I/O; could be offloaded with asyncio.to_thread
result = await gemini_client.analyze(...) # ← async API call
    return result

Benefits:
- Non-blocking API calls
- Multiple concurrent requests
- Better resource utilization (see the concurrency sketch below)
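For example, independent analyses can be awaited concurrently rather than sequentially (a sketch; `debug_trace`'s arguments are hypothetical since its full signature is elided above):

```python
import asyncio

async def run_concurrent_analyses():
    # Both coroutines run concurrently; total latency ≈ the slower call
    leaderboard, trace = await asyncio.gather(
        analyze_leaderboard(metric_focus="cost", top_n=5),
        debug_trace(trace_id="example-trace"),  # hypothetical argument name
    )
    return leaderboard, trace
```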
- Current: No caching (stateless)
- Future: Add Redis for caching frequent queries
import redis
from functools import wraps
redis_client = redis.Redis(...)
def cache_result(ttl=300):
def decorator(func):
@wraps(func)
async def wrapper(*args, **kwargs):
            # Generate a deterministic cache key (args must be hashable)
            cache_key = f"{func.__name__}:{hash((args, tuple(sorted(kwargs.items()))))}"
# Check cache
cached = redis_client.get(cache_key)
if cached:
return cached.decode()
# Execute function
result = await func(*args, **kwargs)
# Store in cache
redis_client.setex(cache_key, ttl, result)
return result
return wrapper
return decorator
@gr.mcp.tool()
@cache_result(ttl=300) # 5-minute cache
async def analyze_leaderboard(...):
    pass

Storage:
- Development: `.env` file (gitignored)
- Production: HuggingFace Spaces Secrets (encrypted)
Access:
# gemini_client.py
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise ValueError("GEMINI_API_KEY not set")

Never:
- ❌ Hardcode API keys in source code
- ❌ Commit `.env` to git
- ❌ Expose keys in client-side JavaScript
- ❌ Log API keys in console/files
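In development, a common way to honor these rules is to load `.env` at startup (a sketch assuming the `python-dotenv` package, which is not confirmed in `requirements.txt`):

```python
from dotenv import load_dotenv

load_dotenv()  # Populates os.environ from .env locally; HF Spaces injects secrets directly
```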
Dataset Repository Validation:
# Only allow datasets whose name contains "smoltrace-"
if "smoltrace-" not in dataset_repo:
return "❌ Error: Dataset must contain 'smoltrace-' prefix for security"Parameter Validation:
# Constrain ranges
top_n = max(1, min(20, top_n)) # Clamp between 1-20
max_rows = max(10, min(500, max_rows))  # Clamp between 10 and 500

Gemini API:
- Free tier: 1,500 requests/day
- Handled by Google (automatic)
- Errors returned as user-friendly messages
HuggingFace Datasets:
- No rate limits for public datasets
- Private datasets require HF token (see the snippet below)
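For private datasets, the token can be threaded through from the environment (a sketch; `HF_TOKEN` is the variable named in the deployment section):

```python
import os
from datasets import load_dataset

# token=None is fine for public datasets; private ones need a valid HF token
ds = load_dataset(
    "kshitijthakkar/smoltrace-leaderboard",
    split="train",
    token=os.getenv("HF_TOKEN"),
)
```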
- README.md - Overview and quick start
- DOCUMENTATION.md - Complete API reference
- TraceMind-AI Architecture - Client-side architecture
Last Updated: November 21, 2025
Version: 1.0.0
Track: Building MCP (Enterprise)