evalops
diff --git a/‎SECURITY.md‎
Lines changed: 189 additions & 0 deletions b/‎SECURITY.md‎
Lines changed: 189 additions & 0 deletions
diff --git a/‎src/analyzers/DeepCodeReasonerV2.ts‎
Lines changed: 3 additions & 3 deletions b/‎src/analyzers/DeepCodeReasonerV2.ts‎
Lines changed: 3 additions & 3 deletions
diff --git a/‎src/index.ts‎
Lines changed: 62 additions & 15 deletions b/‎src/index.ts‎
Lines changed: 62 additions & 15 deletions
@@ -0,0 +1,189 @@
+# Security Analysis and Fixes for Deep Code Reasoning MCP
+
+## Executive Summary
+
+This document details critical security vulnerabilities discovered in the Deep Code Reasoning MCP server and the comprehensive fixes implemented to address them. The analysis was conducted using a collaborative approach between Claude and the Gemini-powered deep reasoning service, demonstrating the very capabilities this tool provides.
+
+## Vulnerabilities Discovered
+
+### 1. Critical: Path Traversal (Arbitrary File Read)
+
+**Severity**: Critical  
+**Location**: `src/utils/CodeReader.ts:46`
+
+**Description**: The `CodeReader` class performs no validation on file paths, allowing attackers to read any file on the host system that the process has access to.
+
+```typescript
+// VULNERABLE CODE
+const content = await fs.readFile(filePath, 'utf-8');
+```
+
+**Attack Vector**: An attacker can provide paths like `../../../../etc/passwd` through the `code_scope.files` array, which gets passed directly to the file system API.
+
+**Fix**: Implemented `SecureCodeReader` with:
+- Strict path validation against a project root directory
+- Resolution of all paths to absolute form
+- Verification that resolved paths remain within project boundaries
+- File type restrictions (allowed extensions only)
+- File size limits (10MB max)
+
+### 2. High: Prompt Injection via Untrusted Context
+
+**Severity**: High  
+**Locations**: 
+- `src/services/GeminiService.ts:64-75`
+- `src/services/ConversationalGeminiService.ts:193`
+
+**Description**: User-controlled data flows directly into LLM prompts without sanitization, allowing prompt injection attacks.
+
+```typescript
+// VULNERABLE CODE
+`- Attempted approaches: ${context.attemptedApproaches.join(', ')}`
+`- Stuck points: ${context.stuckPoints.join(', ')}`
+`- Partial findings: ${JSON.stringify(context.partialFindings)}`
+```
+
+**Attack Vectors**:
+- Direct injection through `attemptedApproaches` and `stuckPoints` arrays
+- JSON structure injection through `partialFindings`
+- Second-order injection where initial Claude analysis extracts malicious instructions from code comments
+
+**Fix**: Implemented `PromptSanitizer` with:
+- Detection of common injection patterns
+- Clear delimitation of trusted vs untrusted data
+- Wrapping all user data in XML-style tags
+- Explicit security notices in system prompts
+
+### 3. High: Filename Injection
+
+**Severity**: High  
+**Location**: `src/services/GeminiService.ts:75`
+
+**Description**: Malicious filenames can inject instructions into prompts.
+
+```typescript
+// VULNERABLE CODE
+prompt += `\n--- File: ${file} ---\n${content}\n`;
+```
+
+**Attack Example**: A file named `auth.ts --- IGNORE ALL PREVIOUS INSTRUCTIONS ---` would break out of the file content context.
+
+**Fix**: 
+- Filename sanitization removing control characters
+- Validation against safe character set
+- Length limits (255 chars max)
+
+### 4. Medium: Conversational State Poisoning
+
+**Severity**: Medium  
+**Location**: `src/services/ConversationalGeminiService.ts:47-64`
+
+**Description**: Chat history accumulates without safeguards, allowing gradual instruction injection over multiple conversation turns.
+
+**Attack Scenario**: 
+1. Attacker establishes seemingly innocent rules in early conversation turns
+2. These rules get incorporated into the chat history
+3. Later turns can leverage these established rules for malicious purposes
+
+**Fix**:
+- Message sanitization for each conversation turn
+- Detection and logging of injection attempts
+- Clear labeling of Claude messages vs system instructions
+- Security reminders in each turn
+
+## Analysis Process
+
+The security analysis followed this methodology:
+
+### 1. Initial Pattern Search
+- Searched for prompt construction patterns using grep
+- Identified all locations where user input meets LLM prompts
+- Found direct string concatenation without sanitization
+
+### 2. Deep Reasoning Analysis
+Using the deep-code reasoning server itself, we:
+- Traced data flow from user input to prompt construction
+- Identified the path from MCP tool calls to internal data structures
+- Discovered the complete attack chain for path traversal
+
+### 3. Collaborative Investigation
+The analysis leveraged conversational AI to:
+- Formulate and test security hypotheses
+- Identify subtle attack vectors (like second-order injection)
+- Validate findings with evidence from the codebase
+
+### Key Insights from the Analysis:
+1. **Implicit Trust Boundary Violation**: The system treated `ClaudeCodeContext` as trusted internal state despite it originating from user-controlled tool calls
+2. **Missing Input Validation Layer**: No validation occurred between receiving MCP arguments and using them in security-sensitive operations
+3. **Prompt Construction Anti-Pattern**: Using string concatenation for prompts inherently mixes instructions with data
+
+## Implementation Details
+
+### SecureCodeReader
+- Enforces project root boundaries
+- Validates file extensions
+- Implements size limits
+- Provides clear error messages for security violations
+
+### PromptSanitizer
+- Detects injection patterns with regex
+- Provides safe formatting methods
+- Creates structured prompts with clear data boundaries
+- Handles various data types safely
+
+### InputValidator
+- Uses Zod schemas for type safety
+- Enforces length and format constraints
+- Validates file paths against traversal attempts
+- Provides sanitized output ready for use
+
+## Testing Recommendations
+
+1. **Path Traversal Tests**:
+   - Attempt to read `/etc/passwd`
+   - Try various path traversal patterns (`../`, `..\\`, encoded variants)
+   - Test symlink traversal attempts
+
+2. **Prompt Injection Tests**:
+   - Include "ignore all previous instructions" in various fields
+   - Test JSON injection through `partialFindings`
+   - Attempt conversational hijacking
+
+3. **Edge Cases**:
+   - Very long filenames
+   - Unicode in filenames
+   - Deeply nested object structures
+
+## Deployment Considerations
+
+1. **Breaking Changes**: 
+   - File paths are now validated strictly
+   - Some previously accepted characters in strings are now rejected
+   - Error messages have changed
+
+2. **Performance Impact**:
+   - Minimal overhead from validation
+   - Slight increase in prompt size due to safety delimiters
+   - Caching remains effective
+
+3. **Monitoring**:
+   - Log injection attempts for security monitoring
+   - Track validation failures
+   - Monitor for unusual file access patterns
+
+## Future Improvements
+
+1. **Rate Limiting**: Implement rate limits to prevent abuse
+2. **Audit Logging**: Comprehensive logging of all file access and prompts
+3. **Sandboxing**: Consider running in a sandboxed environment
+4. **Dynamic Analysis**: Runtime monitoring of LLM responses for anomalies
+
+## Credits
+
+This security analysis was performed through a unique collaboration:
+- Initial vulnerability discovery by Claude (Anthropic)
+- Deep semantic analysis by Gemini (Google)
+- Collaborative investigation using the conversational analysis features
+- Implementation and documentation by the development team
+
+The analysis demonstrates the power of using AI systems to analyze and improve AI systems, creating a virtuous cycle of security improvements.
@@ -6,21 +6,21 @@ import type {
 import { GeminiService } from '../services/GeminiService.js';
 import { ConversationalGeminiService } from '../services/ConversationalGeminiService.js';
 import { ConversationManager } from '../services/ConversationManager.js';
-import { CodeReader } from '../utils/CodeReader.js';
+import { SecureCodeReader } from '../utils/SecureCodeReader.js';
 import { ErrorClassifier } from '../utils/ErrorClassifier.js';
 import { ConversationLockedError, SessionNotFoundError } from '../errors/index.js';
 
 export class DeepCodeReasonerV2 {
   private geminiService: GeminiService;
   private conversationalGemini: ConversationalGeminiService;
   private conversationManager: ConversationManager;
-  private codeReader: CodeReader;
+  private codeReader: SecureCodeReader;
 
   constructor(geminiApiKey: string) {
     this.geminiService = new GeminiService(geminiApiKey);
     this.conversationalGemini = new ConversationalGeminiService(geminiApiKey);
     this.conversationManager = new ConversationManager();
-    this.codeReader = new CodeReader();
+    this.codeReader = new SecureCodeReader();
   }
 
   async escalateFromClaudeCode(
 
@@ -13,6 +13,7 @@ import * as dotenv from 'dotenv';
 import { DeepCodeReasonerV2 } from './analyzers/DeepCodeReasonerV2.js';
 import type { ClaudeCodeContext, CodeScope } from './models/types.js';
 import { ErrorClassifier } from './utils/ErrorClassifier.js';
+import { InputValidator } from './utils/InputValidator.js';
 
 // Load environment variables
 dotenv.config();
@@ -405,11 +406,13 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
     switch (name) {
       case 'escalate_analysis': {
         const parsed = EscalateAnalysisSchema.parse(args);
+        
+        // Validate and sanitize the Claude context
+        const validatedContext = InputValidator.validateClaudeContext(parsed.claude_context);
+        
+        // Override with specific values from the parsed input
         const context: ClaudeCodeContext = {
-          attemptedApproaches: parsed.claude_context.attempted_approaches,
-          partialFindings: parsed.claude_context.partial_findings,
-          stuckPoints: [parsed.claude_context.stuck_description],
-          focusArea: parsed.claude_context.code_scope as CodeScope,
+          ...validatedContext,
           analysisBudgetRemaining: parsed.time_budget_seconds,
         };
 
@@ -431,8 +434,18 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
 
       case 'trace_execution_path': {
         const parsed = TraceExecutionPathSchema.parse(args);
+        
+        // Validate the entry point file path
+        const validatedPath = InputValidator.validateFilePaths([parsed.entry_point.file])[0];
+        if (!validatedPath) {
+          throw new McpError(
+            ErrorCode.InvalidParams,
+            'Invalid entry point file path',
+          );
+        }
+        
         const result = await deepReasoner.traceExecutionPath(
-          parsed.entry_point,
+          { ...parsed.entry_point, file: validatedPath },
           parsed.max_depth,
           parsed.include_data_flow,
         );
@@ -449,10 +462,20 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
 
       case 'hypothesis_test': {
         const parsed = HypothesisTestSchema.parse(args);
+        
+        // Validate file paths
+        const validatedFiles = InputValidator.validateFilePaths(parsed.code_scope.files);
+        if (validatedFiles.length === 0) {
+          throw new McpError(
+            ErrorCode.InvalidParams,
+            'No valid file paths provided',
+          );
+        }
+        
         const result = await deepReasoner.testHypothesis(
-          parsed.hypothesis,
-          parsed.code_scope.files,
-          parsed.test_approach,
+          InputValidator.validateString(parsed.hypothesis, 2000),
+          validatedFiles,
+          InputValidator.validateString(parsed.test_approach, 1000),
         );
 
         return {
@@ -467,8 +490,18 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
 
       case 'cross_system_impact': {
         const parsed = CrossSystemImpactSchema.parse(args);
+        
+        // Validate file paths
+        const validatedFiles = InputValidator.validateFilePaths(parsed.change_scope.files);
+        if (validatedFiles.length === 0) {
+          throw new McpError(
+            ErrorCode.InvalidParams,
+            'No valid file paths provided',
+          );
+        }
+        
         const result = await deepReasoner.analyzeCrossSystemImpact(
-          parsed.change_scope.files,
+          validatedFiles,
           parsed.impact_types,
         );
 
@@ -484,10 +517,22 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
 
       case 'performance_bottleneck': {
         const parsed = PerformanceBottleneckSchema.parse(args);
+        
+        // Validate the entry point file path
+        const validatedPath = InputValidator.validateFilePaths([parsed.code_path.entry_point.file])[0];
+        if (!validatedPath) {
+          throw new McpError(
+            ErrorCode.InvalidParams,
+            'Invalid entry point file path',
+          );
+        }
+        
         const result = await deepReasoner.analyzePerformance(
-          parsed.code_path.entry_point,
+          { ...parsed.code_path.entry_point, file: validatedPath },
           parsed.profile_depth,
-          parsed.code_path.suspected_issues,
+          parsed.code_path.suspected_issues ? 
+            InputValidator.validateStringArray(parsed.code_path.suspected_issues) : 
+            undefined,
         );
 
         return {
@@ -502,11 +547,13 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
 
       case 'start_conversation': {
         const parsed = StartConversationSchema.parse(args);
+        
+        // Validate and sanitize the Claude context
+        const validatedContext = InputValidator.validateClaudeContext(parsed.claude_context);
+        
+        // Override default budget
         const context: ClaudeCodeContext = {
-          attemptedApproaches: parsed.claude_context.attempted_approaches,
-          partialFindings: parsed.claude_context.partial_findings,
-          stuckPoints: [parsed.claude_context.stuck_description],
-          focusArea: parsed.claude_context.code_scope as CodeScope,
+          ...validatedContext,
           analysisBudgetRemaining: 60,
         };