# Konflux DevLake MCP Server - Cursor Rules

## Project Overview
This is a Model Context Protocol (MCP) server that provides tools for querying and analyzing data from the Konflux DevLake database. The server exposes tools for database operations, incident analysis, deployment tracking, and PR retest analysis.

## Code Style & Formatting

### Python Code Style
- **Formatter**: Use `black` with line length 100
  - Command: `black --line-length 100 .`
  - Always format code before committing
- **Linter**: Use `flake8` for code quality
  - Command: `flake8 .`
  - Fix all linting errors before committing
- **Type Hints**: Use type hints for all function parameters and return types
- **Docstrings**: Use Google-style docstrings for all classes and functions

### YAML Code Style
- **Linter**: Use `yamllint` for YAML file validation
  - Command: `yamllint .`
  - Configuration file: `.yamllint` (extends default, max line length 200)
  - Fix all linting errors before committing
  - Applies to: `.github/workflows/*.yaml`, `k8s/*.yaml`, and all other YAML files

### Code Formatting Rules
- Maximum line length: 100 characters
- Use double quotes for strings (black default)
- Use trailing commas in multi-line collections
- Use f-strings for string formatting (not `.format()` or `%`)
- Use `async`/`await` for all database operations and tool calls
- **NO EMOJIS**: Do not use any emojis in code, comments, docstrings, or any project files

## Architecture Patterns

### Tool Creation Pattern
All tools must:
1. Inherit from `BaseTool` in `tools/base/base_tool.py`
2. Implement `get_tools()` method returning a list of `Tool` objects
3. Implement `call_tool(name, arguments)` method for async tool execution
4. Use `log_tool_call()` for all tool invocations
5. Return TOON-encoded strings for token efficiency (not JSON)

### Tool Structure
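```python
from typing import Any, Dict, List

from mcp.types import Tool
from toon_format import encode as toon_encode  # assumed import; match the project's actual one

from tools.base.base_tool import BaseTool
from utils.logger import get_logger  # assumed location of get_logger


class MyTool(BaseTool):
    def __init__(self, db_connection):
        super().__init__(db_connection)
        self.logger = get_logger(f"{__name__}.MyTool")

    def get_tools(self) -> List[Tool]:
        schema = {"type": "object", "properties": {}}  # JSON Schema for the tool's arguments
        return [Tool(name="my_tool", description="...", inputSchema=schema)]

    async def call_tool(self, name: str, arguments: Dict[str, Any]) -> str:
        result = {"success": True, "data": []}  # placeholder: build from query results
        # Use toon_encode for output
        return toon_encode(result, {"delimiter": ",", "indent": 2, "lengthMarker": ""})
```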

### Database Query Patterns
- Always use parameterized queries or string formatting with validation
- Use `await self.db_connection.execute_query(query, limit)`
- Handle MySQL type conversions (Decimal/string to int/float)
- Always validate date fields before using in queries
- Use CTEs (WITH clauses) for complex queries with deduplication
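A minimal sketch of the type-conversion rule, assuming the driver returns `DECIMAL` aggregates as `decimal.Decimal` and some computed columns as strings. `to_number` is a hypothetical helper, not an existing project function:

```python
from decimal import Decimal, InvalidOperation
from typing import Any, Optional, Union


def to_number(value: Any) -> Optional[Union[int, float]]:
    """Convert a MySQL value (Decimal, str, int, float, None) to int or float.

    Hypothetical helper: whole numbers become int, everything else float.
    Returns None for NULL or unparseable input instead of raising.
    """
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return value
    try:
        number = Decimal(str(value).strip())
    except InvalidOperation:
        return None
    if not number.is_finite():
        return None  # NaN / Infinity are not meaningful metrics
    # Keep whole numbers as int, e.g. Decimal("3.00") -> 3
    return int(number) if number == number.to_integral_value() else float(number)
```

Keeping whole numbers as `int` means values such as `SUM(...)` results (which MySQL returns as `DECIMAL`) serialize without a trailing `.0`.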

### Error Handling
- Always wrap tool execution in try-except blocks
- Log errors with `self.logger.error()` including context
- Return structured error responses: `{"success": False, "error": str(e)}`
- Use `log_tool_call(..., success=False, error=...)` for failed calls
- Handle `ClosedResourceError` and `CancelledError` gracefully (don't log as errors)
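The error-handling rules above can be combined in one wrapper. This is a sketch only: `run_tool_safely` is a hypothetical name, it uses a module-level logger instead of `self.logger`, and it omits `log_tool_call` because that function's full signature is not shown here.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict

try:
    from anyio import ClosedResourceError  # anyio is one of the listed server dependencies
except ImportError:  # fallback so this sketch also runs without anyio

    class ClosedResourceError(Exception):
        pass


logger = logging.getLogger(__name__)


async def run_tool_safely(
    name: str, handler: Callable[[], Awaitable[Dict[str, Any]]]
) -> Dict[str, Any]:
    """Run a tool handler and turn failures into a structured error response.

    Hypothetical helper that illustrates the rules; it is not part of the codebase.
    """
    try:
        return await handler()
    except (ClosedResourceError, asyncio.CancelledError):
        # Client disconnect or task cancellation: not an application error.
        # Log quietly and re-raise so cancellation keeps propagating.
        logger.debug(f"Tool {name} interrupted by disconnect or cancellation")
        raise
    except Exception as e:
        logger.error(f"Tool {name} failed: {e}")
        return {"success": False, "error": str(e)}
```

Re-raising after the DEBUG log, rather than returning an error payload, is a design choice: swallowing `CancelledError` can prevent asyncio task cancellation from completing.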

### Logging
- Use `get_logger(__name__)` for module-level loggers
- Use appropriate log levels:
  - `DEBUG`: Detailed diagnostic information
  - `INFO`: General informational messages
  - `WARNING`: Warning messages
  - `ERROR`: Error conditions
- Never log sensitive information (passwords, tokens, etc.)
- Suppress noisy library errors (e.g., ClosedResourceError from MCP library)
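One way to suppress the noisy `ClosedResourceError` records is a `logging.Filter`. `SuppressClosedResource` is a hypothetical name, and matching on the exception's class name is an assumption chosen so the sketch does not need to import anyio:

```python
import logging


class SuppressClosedResource(logging.Filter):
    """Drop log records whose attached exception is a ClosedResourceError."""

    def filter(self, record: logging.LogRecord) -> bool:
        exc = record.exc_info[1] if record.exc_info else None
        # Returning False tells logging to discard the record
        return not (exc is not None and type(exc).__name__ == "ClosedResourceError")
```

Attach it to a handler (`handler.addFilter(SuppressClosedResource())`) rather than to a logger: filters on a logger are not consulted for records logged by its descendant loggers, so a filter on a parent logger would miss errors raised deep inside the MCP library.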

## File Organization

### Directory Structure
```
tools/
  base/           # Base tool classes
  devlake/        # DevLake-specific tools (incidents, deployments, PR retests)
  database_tools.py  # Database operation tools
  tools_manager.py   # Tool registry and routing

server/
  core/           # Core MCP server implementation
  factory/        # Server factory for different transports
  handlers/       # Request handlers
  transport/      # Transport implementations (HTTP, stdio)

utils/
  config.py       # Configuration management
  db.py           # Database connection and utilities
  logger.py       # Logging setup
  security.py     # Security validation
```

### Naming Conventions
- Files: `snake_case.py`
- Classes: `PascalCase`
- Functions/methods: `snake_case`
- Constants: `UPPER_SNAKE_CASE`
- Private methods: `_leading_underscore`

## Testing Requirements

### Unit Tests
- All tools must have unit tests in `tests/unit/`
- Test files: `test_<module_name>.py`
- Use `pytest` with `pytest-asyncio` for async tests
- Mock database connections in tests
- Test both success and error cases

### Running Tests
- Unit tests: `pytest tests/unit/ -v`
- All tests: `pytest`
- Use `run_tests.py` script for convenience

## Database Patterns

### DevLake Database
- Main database: `lake`
- Key tables:
  - `incidents`: Incident tracking
  - `cicd_deployments`, `cicd_deployment_commits`: Deployment data
  - `pull_requests`, `pull_request_comments`: PR data
  - `repos`: Repository information
  - `project_mapping`: Project-to-resource mapping

### Query Best Practices
- Always join with `project_mapping` when filtering by project
- Use `LEFT JOIN` for optional relationships
- Filter by project: `LEFT JOIN lake.project_mapping pm ON r.id = pm.row_id AND pm.table = 'repos'`
- Use `WHERE pm.project_name = 'Project Name'` for project filtering
- Always validate and sanitize user inputs before using in queries
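A sketch of a project-filtered query that keeps the project name out of the SQL text. `build_repo_query` is hypothetical, and the selected columns are assumptions; it returns SQL plus a parameter tuple in the `%s` style used by aiomysql's `cursor.execute(query, args)`, since the `execute_query(query, limit)` call shown earlier takes no bind parameters:

```python
from typing import Optional, Tuple

BASE_QUERY = """
SELECT r.id, r.name
FROM lake.repos r
LEFT JOIN lake.project_mapping pm
  ON r.id = pm.row_id AND pm.table = 'repos'
"""


def build_repo_query(project_name: Optional[str], limit: int = 50) -> Tuple[str, tuple]:
    """Return SQL and bound parameters for listing repos, optionally by project."""
    query = BASE_QUERY
    params: tuple = ()
    if project_name:
        # Passed as a bound parameter, never formatted into the SQL text
        query += "WHERE pm.project_name = %s\n"
        params = (project_name,)
    # int() validates the limit before it is formatted into the query
    query += f"LIMIT {int(limit)}"
    return query, params
```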

## Tool Output Format

### TOON Format (Token-Efficient)
- Use `toon_encode()` from `toon_format` library
- Configuration: `{"delimiter": ",", "indent": 2, "lengthMarker": ""}`
- Use TOON instead of JSON to reduce token costs (30-60% reduction)

### Response Structure
```python
{
    "success": True,  # False when the call failed
    "data": [...],  # result rows; failed calls return "error": str(e) instead
    "filters": {...},  # applied filters
    "query": "...",  # SQL query (optional)
}
```
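Small builders keep these keys consistent across tools. `success_response` and `error_response` are hypothetical helpers sketched from the structure above and the error shape in the Error Handling section:

```python
from typing import Any, Dict, List, Optional


def success_response(
    data: List[Dict[str, Any]],
    filters: Dict[str, Any],
    query: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a success payload with the documented keys."""
    response: Dict[str, Any] = {"success": True, "data": data, "filters": filters}
    if query is not None:
        response["query"] = query  # optional: include the SQL that produced the data
    return response


def error_response(error: Exception) -> Dict[str, Any]:
    """Build the structured error payload."""
    return {"success": False, "error": str(error)}
```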

## Security Requirements

### SQL Injection Prevention
- Always use parameterized queries or validated string formatting
- Use `SQLInjectionDetector` from `utils.security` for validation
- Never concatenate user input directly into SQL queries
- Validate table and database names before use
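Table and database names cannot be bound as query parameters, so they need an allowlist check before quoting. This sketch does not use `SQLInjectionDetector`, whose API is not described here; `quote_identifier` and `quote_qualified` are hypothetical helpers, and the pattern is deliberately narrower than what MySQL accepts:

```python
import re

# Conservative pattern for MySQL identifiers: ASCII letters, digits, "_" and "$",
# at most 64 characters. Anything else is rejected rather than escaped.
_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_$]{1,64}")


def quote_identifier(name: str) -> str:
    """Validate one table or database name and return it backtick-quoted."""
    # fullmatch (not match with "$") so a trailing newline cannot slip through;
    # purely numeric names are rejected as well
    if not _IDENTIFIER_RE.fullmatch(name) or name.isdigit():
        raise ValueError(f"Invalid identifier: {name!r}")
    return f"`{name}`"


def quote_qualified(name: str) -> str:
    """Quote a name such as lake.incidents part by part."""
    return ".".join(quote_identifier(part) for part in name.split("."))
```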

### Input Validation
- Validate all tool arguments
- Check date formats before using in queries
- Validate enum values (status, environment, etc.)
- Set reasonable limits (max rows, date ranges, etc.)
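A sketch of enum and row-limit validation. The allowed environment values and the 50/100 limits are illustrative assumptions loosely based on the defaults listed under Performance Considerations, not values taken from the code:

```python
from typing import AbstractSet, Optional

# Hypothetical allowed values; each tool should use its own list.
ALLOWED_ENVIRONMENTS = frozenset({"PRODUCTION", "STAGING", "TESTING"})
DEFAULT_LIMIT = 50
MAX_LIMIT = 100


def validate_choice(value: str, allowed: AbstractSet[str], field: str) -> str:
    """Return the normalized value or raise ValueError naming the field."""
    normalized = value.strip().upper()
    if normalized not in allowed:
        raise ValueError(f"Invalid {field}: {value!r}; expected one of {sorted(allowed)}")
    return normalized


def clamp_limit(limit: Optional[int]) -> int:
    """Fall back to DEFAULT_LIMIT for missing/non-positive values and cap at MAX_LIMIT."""
    if limit is None or limit <= 0:
        return DEFAULT_LIMIT
    return min(int(limit), MAX_LIMIT)
```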

## Documentation

### Code Documentation
- All public classes and functions must have docstrings
- Include parameter descriptions and return value descriptions
- Document complex algorithms and business logic
- Keep docstrings up-to-date with code changes

### Tool Descriptions
- Tool descriptions should be comprehensive and clear
- Include examples in descriptions when helpful
- Document all input parameters with types and constraints
- Explain what the tool does and when to use it

## Git Workflow

### Commit Messages
- Use descriptive commit messages
- Include context about what changed and why
- Reference issue numbers if applicable

### Pre-commit Checklist
1. Run `black --check --line-length 100 .`
2. Run `flake8 .`
3. Run `yamllint .` (for YAML files)
4. Run unit tests: `pytest tests/unit/ -v`
5. Verify all changes are properly formatted
6. Check that error handling is appropriate

## Common Patterns

### Date Handling
- Accept dates in formats: `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`
- Convert date-only strings to full datetime: `f"{date} 00:00:00"`
- Use `datetime.now() - timedelta(days=N)` for `days_back` calculations
- Always validate the `date_field` parameter against allowed values
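The date rules above, sketched in Python. `ALLOWED_DATE_FIELDS` holds hypothetical column names, and the function names are illustrative rather than taken from the codebase:

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical allowlist; each tool defines the date columns it accepts.
ALLOWED_DATE_FIELDS = frozenset({"created_date", "resolution_date"})
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")
_SQL_FORMAT = "%Y-%m-%d %H:%M:%S"


def normalize_date(value: str) -> str:
    """Accept YYYY-MM-DD or YYYY-MM-DD HH:MM:SS and return a full datetime string."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
        # Date-only input gets 00:00:00, matching f"{date} 00:00:00"
        return parsed.strftime(_SQL_FORMAT)
    raise ValueError(f"Invalid date: {value!r}; use YYYY-MM-DD or YYYY-MM-DD HH:MM:SS")


def days_back_start(days_back: int, now: Optional[datetime] = None) -> str:
    """Start of a days_back window, formatted for SQL comparison."""
    start = (now or datetime.now()) - timedelta(days=days_back)
    return start.strftime(_SQL_FORMAT)


def validate_date_field(date_field: str) -> str:
    """Reject any column name not on the allowlist before it reaches the SQL text."""
    if date_field not in ALLOWED_DATE_FIELDS:
        raise ValueError(f"Invalid date_field: {date_field!r}")
    return date_field
```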

### Bot Exclusion
- Filter bot comments using: `account_id != 'github:GithubAccount:1:0'`
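- Don't exclude NULL/empty account_id (might be legitimate); a bare `!=` comparison evaluates to NULL for NULL `account_id` values and silently drops those rows, so write the filter as `(account_id IS NULL OR account_id != 'github:GithubAccount:1:0')`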
- Document bot exclusion logic in tool descriptions
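The NULL pitfall can be checked with an in-memory SQLite table, which applies the same three-valued logic as MySQL to `!=`. The table layout is a simplified assumption, and SQLite uses `?` placeholders where aiomysql uses `%s`:

```python
import sqlite3

BOT_ACCOUNT = "github:GithubAccount:1:0"

# NULL-safe: keep rows with a NULL account_id, drop only the bot account.
NON_BOT_CONDITION = "(account_id IS NULL OR account_id != ?)"


def count_non_bot(conn: sqlite3.Connection, condition: str) -> int:
    # condition is a fixed string from this module, never user input
    sql = f"SELECT COUNT(*) FROM pull_request_comments WHERE {condition}"
    return conn.execute(sql, (BOT_ACCOUNT,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pull_request_comments (id INTEGER, account_id TEXT)")
conn.executemany(
    "INSERT INTO pull_request_comments VALUES (?, ?)",
    [(1, BOT_ACCOUNT), (2, "github:GithubAccount:1:42"), (3, None), (4, "")],
)
naive = count_non_bot(conn, "account_id != ?")  # 2: bot row and NULL row both dropped
null_safe = count_non_bot(conn, NON_BOT_CONDITION)  # 3: only the bot row dropped
```

In MySQL, `NOT (account_id <=> %s)` is an equivalent NULL-safe form, since `<=>` compares NULL as an ordinary value.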

### Project/Repository Filtering
- Use `project_mapping` table for project filtering
- Support partial repository name matching
- Use separate filter variables for main queries vs subqueries
- Example: `project_filter` for outer queries, `project_filter_subquery` for CTEs
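One plausible reason for separate variables is that the outer query and a CTE join `project_mapping` under different aliases. The sketch below assumes that; `project_filter_for`, `escape_like`, `repo_like_param`, and the `pm_sub` alias are hypothetical, and the real `project_filter` strings in the codebase may differ:

```python
def escape_like(value: str) -> str:
    """Escape LIKE wildcards (MySQL's default escape character is backslash)."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def project_filter_for(alias: str) -> str:
    """SQL fragment filtering a project_mapping alias by project name (%s placeholder)."""
    if not alias.isidentifier():
        raise ValueError(f"Invalid alias: {alias!r}")
    return f" AND {alias}.project_name = %s"


def repo_like_param(repo: str) -> str:
    """Bound parameter for a partial repository match such as r.name LIKE %s."""
    return f"%{escape_like(repo)}%"


# Each query level gets its own fragment because the aliases differ.
project_filter = project_filter_for("pm")
project_filter_subquery = project_filter_for("pm_sub")
```

Escaping matters for partial matching because an `_` in a repository name would otherwise act as a single-character wildcard in `LIKE`.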

## Error Messages
- Be descriptive but not verbose
- Include context (tool name, parameters)
- Don't expose internal implementation details
- Use user-friendly error messages

## Performance Considerations
- Use LIMIT clauses to prevent large result sets
- Default limits: 50-100 rows for most queries
- Use CTEs for complex queries to improve readability
- Consider query performance when joining multiple tables
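A deduplicating CTE of the kind mentioned under Database Query Patterns, run against SQLite so it is self-contained. The `deployment_attempts` table is hypothetical; `ROW_NUMBER()` requires MySQL 8.0+ (or SQLite 3.25+), and the bound `LIMIT` keeps the result set bounded:

```python
import sqlite3

# Keep only the latest attempt per commit, newest first.
DEDUP_QUERY = """
WITH ranked AS (
    SELECT commit_sha, result, finished_date,
           ROW_NUMBER() OVER (
               PARTITION BY commit_sha ORDER BY finished_date DESC
           ) AS rn
    FROM deployment_attempts
)
SELECT commit_sha, result, finished_date
FROM ranked
WHERE rn = 1
ORDER BY finished_date DESC
LIMIT ?
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deployment_attempts (commit_sha TEXT, result TEXT, finished_date TEXT)"
)
conn.executemany(
    "INSERT INTO deployment_attempts VALUES (?, ?, ?)",
    [
        ("abc", "FAILURE", "2024-01-01 10:00:00"),
        ("abc", "SUCCESS", "2024-01-01 12:00:00"),
        ("def", "SUCCESS", "2024-01-02 09:00:00"),
    ],
)
rows = conn.execute(DEDUP_QUERY, (50,)).fetchall()
```

The earlier `FAILURE` attempt for `abc` is dropped because only the row with `rn = 1` per commit survives.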

## Dependencies
- Core: `mcp`, `toon-format`, `aiomysql`
- Server: `uvicorn`, `starlette`, `anyio`
- Testing: `pytest`, `pytest-asyncio`
- Formatting: `black`, `flake8`, `yamllint`

## Notes
- This is a production MCP server - prioritize stability and error handling
- Token efficiency matters - use TOON format for large responses
- Database queries should be optimized for the DevLake schema
- Always consider backward compatibility when changing tool interfaces
- **NO EMOJIS**: Never use emojis in code, comments, docstrings, error messages, or any project files