Add mongodb-query skill with testing infrastructure MCP-425 #2
Changes from 6 commits
1ce8654
be8fdbe
bd1de61
0d90692
c6e55ad
255fb82
9340e23
4abbd10
a053d94
e1c1fe7
8af67b3
417ef7b
be5afdc
a775974
0176753
d9b8c0e
29250c9
bb7bf83
7f9ecd2
f245513
1f25a14
fa50265
cdda517
c99896e
34b3155
ba3fd32
6c47532
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,239 @@ | ||
| --- | ||
| name: mongodb-query-generator | ||
| description: Generate MongoDB queries (find) or aggregation pipelines using natural language, with collection schema context and sample documents. Use this skill whenever the user mentions MongoDB queries, wants to search/filter/aggregate data in MongoDB, asks "how do I query...", needs help with query syntax, wants to optimize a query, or discusses finding/filtering/grouping MongoDB documents - even if they don't explicitly say "generate a query". Also use for translating SQL-like requests to MongoDB syntax. Requires MongoDB MCP server. | ||
| allowed-tools: mcp__mongodb__*, Read, Bash | ||
| --- | ||
|
|
||
| # MongoDB Query Generator | ||
|
|
||
| You are an expert MongoDB query generator. When a user requests a MongoDB query or aggregation pipeline, follow these guidelines based on the Compass query generation patterns. | ||
|
|
||
| ## Query Generation Process | ||
|
|
||
| ### 1. Gather Context Using MCP Tools | ||
|
|
||
| **Required Information:** | ||
| - Database name and collection name (use `mcp__mongodb__list-databases` and `mcp__mongodb__list-collections` if not provided) | ||
| - User's natural language description of the query | ||
| - Current date context: ${currentDate} (for date-relative queries) | ||
|
|
||
| **Fetch in this order:** | ||
|
|
||
| 1. **Indexes** (for query optimization): | ||
| ``` | ||
| mcp__mongodb__collection-indexes({ database, collection }) | ||
| ``` | ||
|
|
||
| 2. **Schema** (for field validation): | ||
| ``` | ||
| mcp__mongodb__collection-schema({ database, collection, sampleSize: 50 }) | ||
| ``` | ||
| - Returns flattened schema with field names and types | ||
| - Includes nested document structures and array fields | ||
|
|
||
| 3. **Sample documents** (for understanding data patterns): | ||
| ``` | ||
| mcp__mongodb__find({ database, collection, limit: 4 }) | ||
| ``` | ||
| - Shows actual data values and formats | ||
| - Reveals common patterns (enums, ranges, etc.) | ||
|
|
||
| ### 2. Analyze Context and Validate Fields | ||
|
|
||
| Before generating a query, always validate field names against the schema you fetched. MongoDB won't error on nonexistent field names - it will simply return no results or behave unexpectedly, making bugs hard to diagnose. By checking the schema first, you catch these issues before the user tries to run the query. | ||
|
|
||
| Also review the available indexes to understand which query patterns will perform best. | ||
|
|
||
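To illustrate the field-validation step in section 2, here is a minimal Python sketch. It assumes the schema has already been reduced to a set of dot-notation field paths. The helper names `filter_fields` and `unknown_fields` are hypothetical and are not part of the MongoDB MCP server. The sketch does not handle positional array paths such as `items.0.name`.

```python
def filter_fields(query_filter):
    """Collect the field paths referenced by a find-style filter document."""
    fields = set()
    for key, value in query_filter.items():
        if key in ("$and", "$or", "$nor"):
            # Logical operators hold a list of sub-filters.
            for sub in value:
                fields |= filter_fields(sub)
        elif not key.startswith("$"):
            # Any other non-operator key is a field path such as "address.city".
            fields.add(key)
    return fields


def unknown_fields(query_filter, schema_paths):
    """Return referenced fields that are absent from the schema, sorted."""
    return sorted(filter_fields(query_filter) - set(schema_paths))


# Hypothetical schema and filter; "adress.city" is a deliberate typo.
schema_paths = {"status", "age", "address.city"}
query_filter = {
    "status": "active",
    "$or": [{"age": {"$gt": 25}}, {"adress.city": "Oslo"}],
}
print(unknown_fields(query_filter, schema_paths))
```

A non-empty result is a signal to stop and ask the user, since the server would accept the filter and silently match nothing.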
| ### 3. Choose Query Type: Find vs Aggregation | ||
|
|
||
| Prefer find queries over aggregation pipelines because find queries are simpler, faster, and easier for other developers to understand. Find queries also have better performance characteristics for simple filtering and sorting since they avoid the aggregation framework overhead. | ||
|
|
||
| **For Find Queries**, generate responses with these fields: | ||
| - `filter` - The query filter (required) | ||
| - `project` - Field projection (optional) | ||
| - `sort` - Sort specification (optional) | ||
| - `skip` - Number of documents to skip (optional) | ||
| - `limit` - Number of documents to return (optional) | ||
| - `collation` - Collation specification (optional) | ||
|
|
||
| **Use Find Query when:** | ||
| - Simple filtering on one or more fields | ||
| - Basic sorting and limiting | ||
| - Field projection only | ||
| - No data transformation needed | ||
|
|
||
| **For Aggregation Pipelines**, generate an array of stage objects. | ||
|
|
||
| **Use Aggregation Pipeline when the request requires:** | ||
| - Grouping or aggregation functions (sum, count, average, etc.) | ||
| - Multiple transformation stages | ||
| - Computed fields or data reshaping | ||
| - Joins with other collections ($lookup) | ||
| - Array unwinding or complex array operations | ||
| - Text search with scoring | ||
|
|
||
| ### 4. Format Your Response | ||
|
|
||
| Always output queries as **valid JSON strings**, not JavaScript objects. This format allows users to easily copy/paste the queries and is compatible with the MongoDB MCP server tools. | ||
|
|
||
| **Find Query Response:** | ||
| ```json | ||
| { | ||
| "query": { | ||
| "filter": "{ age: { $gte: 25 } }", | ||
| "project": "{ name: 1, age: 1, _id: 0 }", | ||
| "sort": "{ age: -1 }", | ||
| "limit": "10" | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| **Aggregation Pipeline Response:** | ||
| ```json | ||
| { | ||
| "aggregation": { | ||
| "pipeline": "[{ $match: { status: 'active' } }, { $group: { _id: '$category', total: { $sum: '$amount' } } }]" | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Note the stringified format: | ||
| - ✅ `"{ age: { $gte: 25 } }"` (string) | ||
| - ❌ `{ age: { $gte: 25 } }` (object) | ||
|
|
||
| For aggregation pipelines: | ||
| - ✅ `"[{ $match: { status: 'active' } }]"` (string) | ||
| - ❌ `[{ $match: { status: 'active' } }]` (array) | ||
|
|
||
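The stringified format above can be sketched in Python as follows. The outer envelope is JSON, while each query part is carried as an opaque string. The examples in this section use shell-style syntax inside those strings (unquoted keys, single quotes), so the sketch never tries to parse them. The function name `find_response` is hypothetical.

```python
import json


def find_response(filter_str, project=None, sort=None, limit=None):
    """Build the find response envelope; every query part stays a string."""
    query = {"filter": filter_str}
    if project is not None:
        query["project"] = project
    if sort is not None:
        query["sort"] = sort
    if limit is not None:
        # Numeric options are stringified too, matching the "limit": "10" example.
        query["limit"] = str(limit)
    return {"query": query}


response = find_response("{ age: { $gte: 25 } }", sort="{ age: -1 }", limit=10)
print(json.dumps(response, indent=2))
```

Serializing with `json.dumps` handles the quoting of the inner strings, so the result can be pasted as-is.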
| ## Best Practices | ||
|
|
||
| ### Query Quality | ||
| 1. **Use indexes efficiently** - Structure filters to leverage available indexes: | ||
| - Check collection indexes before generating the query | ||
| - Order filter fields to match index key order when possible | ||
|
Collaborator
that's not a thing - order of filter fields does not matter.
||
| - Use equality matches before range queries (matches index prefix behavior) | ||
|
Collaborator
in the query it does not matter
||
| - Avoid operators that prevent index usage: `$where`, `$text` without text index, `$ne`, `$nin` (use sparingly) | ||
|
Collaborator
`$where` should never be used, and `$expr` should be used only when necessary. I don't see a problem with inequality.
||
| - For compound indexes, use leftmost prefix when possible | ||
| - If no relevant index exists, mention this in your response (user may want to create one) | ||
| 2. **Project only needed fields** - Reduce data transfer with projections | ||
| 3. **Validate field names** against the schema before using them | ||
| 4. **Handle edge cases** - Consider null values, missing fields, type mismatches | ||
| 5. **Use appropriate operators** - Choose the right MongoDB operator for the task: | ||
| - `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte` for comparisons | ||
| - `$in`, `$nin` for membership tests | ||
|
Collaborator
that's not what
||
| - `$and`, `$or`, `$not`, `$nor` for logical operations | ||
| - `$regex` for text pattern matching | ||
|
Collaborator
where should we mention left-anchored patterns being preferred?
Contributor
Author
I think we can mention it here - good point
||
| - `$exists` for field existence checks | ||
| - `$type` for type validation | ||
|
|
||
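On the left-anchored point raised in the thread above: a case-sensitive `$regex` that starts with `^` followed by a literal prefix is the form MongoDB can serve most efficiently from an index on that field. A minimal Python sketch follows, with the hypothetical helper name `prefix_filter`. It assumes that Python's `re.escape` output is acceptable to the server's regex engine for common metacharacters.

```python
import re


def prefix_filter(field, prefix):
    """Build a case-sensitive, left-anchored $regex filter for a literal prefix."""
    # re.escape neutralises metacharacters such as "." or "(" in user input.
    return {field: {"$regex": "^" + re.escape(prefix)}}


# Hypothetical field name and prefix.
print(prefix_filter("sku", "A.1"))
```

Escaping the prefix also covers the "escape special characters" item under Error Prevention, since an unescaped `.` would match any character.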
| ### Aggregation Pipeline Quality | ||
| 1. **Filter early** - Use `$match` as early as possible to reduce documents | ||
| 2. **Project early** - Use `$project` to reduce field set before expensive operations | ||
|
Collaborator
That's an anti-pattern. `$project` should only be used at the end, to shape the documents returned to the client.
||
| 3. **Limit when possible** - Add `$limit` after `$sort` when appropriate | ||
| 4. **Use indexes** - Ensure `$match` and `$sort` stages can use indexes: | ||
| - Place `$match` stages at the beginning of the pipeline | ||
| - Initial `$match` and `$sort` stages can use indexes if they precede any stage that modifies documents | ||
| - Structure `$match` filters to align with available indexes | ||
| - Avoid `$project`, `$unwind`, or other transformations before `$match` when possible | ||
|
Collaborator
`$project` should be at the end.
||
| 5. **Optimize `$lookup`** - Consider denormalization for frequently joined data | ||
| 6. **Group efficiently** - Use accumulators appropriately: `$sum`, `$avg`, `$min`, `$max`, `$push`, `$addToSet` | ||
|
|
||
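The stage-ordering guidance can be sketched as a small check, shown here in Python. It follows the review feedback by keeping `$project` as the final stage. The helper name `leading_index_eligible_stages` is hypothetical. The check only inspects stage order; whether an index is actually used depends on the indexes that exist and on the query planner.

```python
INDEX_ELIGIBLE = {"$match", "$sort"}


def leading_index_eligible_stages(pipeline):
    """Return the names of the $match/$sort stages that open the pipeline."""
    names = []
    for stage in pipeline:
        # Each stage is a single-key document such as {"$match": {...}}.
        (name,) = stage.keys()
        if name not in INDEX_ELIGIBLE:
            break
        names.append(name)
    return names


# Hypothetical pipeline: filter, sort, limit, then shape the output last.
pipeline = [
    {"$match": {"status": "active"}},
    {"$sort": {"createdAt": -1}},
    {"$limit": 10},
    {"$project": {"_id": 0, "name": 1, "createdAt": 1}},
]
print(leading_index_eligible_stages(pipeline))
```

An empty result means the pipeline opens with a transforming stage, which is worth flagging before returning it to the user.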
| ### Error Prevention | ||
| 1. **Validate all field references** against the schema | ||
| 2. **Quote field names correctly** - Use dot notation for nested fields | ||
| 3. **Handle array fields properly** - Use `$elemMatch`, `$size`, `$all` as needed | ||
| 4. **Escape special characters** in regex patterns | ||
| 5. **Check data types** - Ensure operations match field types from schema | ||
| 6. **Geospatial coordinates** - MongoDB's GeoJSON format requires longitude first, then latitude (e.g., `[longitude, latitude]` or `{type: "Point", coordinates: [lng, lat]}`). This is opposite to how coordinates are often written in plain English, so double-check this when generating geo queries. | ||
|
|
||
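For the coordinate-order item above, one way to avoid swapping values is to accept latitude and longitude by name and emit the GeoJSON order in a single place. This is a minimal Python sketch; the helper names `geojson_point` and `near_filter` and the field name `location` are hypothetical. A `$near` query with `$geometry` also assumes a `2dsphere` index on the field.

```python
def geojson_point(latitude, longitude):
    """Return a GeoJSON Point; coordinates are ordered [longitude, latitude]."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return {"type": "Point", "coordinates": [longitude, latitude]}


def near_filter(field, latitude, longitude, max_distance_m):
    """Build a $near filter around a point given as latitude, longitude."""
    return {
        field: {
            "$near": {
                "$geometry": geojson_point(latitude, longitude),
                "$maxDistance": max_distance_m,
            }
        }
    }


print(near_filter("location", 59.91, 10.75, 500))
```

The range checks catch the most common swap, where a longitude beyond 90 degrees is passed as the latitude.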
| ## Schema Analysis | ||
|
|
||
| When provided with sample documents, analyze: | ||
| 1. **Field types** - String, Number, Boolean, Date, ObjectId, Array, Object | ||
| 2. **Field patterns** - Required vs optional fields (check multiple samples) | ||
| 3. **Nested structures** - Objects within objects, arrays of objects | ||
| 4. **Array elements** - Homogeneous vs heterogeneous arrays | ||
| 5. **Special types** - Dates, ObjectIds, Binary data, GeoJSON | ||
|
|
||
| ## Sample Document Usage | ||
|
|
||
| Use sample documents to: | ||
| - Understand actual data values and ranges | ||
| - Identify field naming conventions (camelCase, snake_case, etc.) | ||
| - Detect common patterns (e.g., status enums, category values) | ||
| - Estimate cardinality for grouping operations | ||
| - Validate that your query will work with real data | ||
|
|
||
| ## Common Pitfalls to Avoid | ||
|
|
||
| 1. **Using nonexistent field names** - Always validate against schema first. MongoDB won't error; it just returns no results. | ||
| 2. **Wrong coordinate order** - GeoJSON uses [longitude, latitude], not [latitude, longitude]. | ||
| 3. **Choosing aggregation when find suffices** - Aggregation adds overhead; use find for simple queries. | ||
|
Collaborator
this is not the case.
||
| 4. **Missing index awareness** - Structure queries to leverage indexes. If no index exists for key filters, mention this to the user. | ||
| 5. **Type mismatches** - Check schema to ensure operators match field types (e.g., don't use `$gt` on strings when comparing alphabetically). | ||
|
Collaborator
wait, why not? that's how it works with strings...
||
|
|
||
| ## Error Handling | ||
|
|
||
| If you cannot generate a query: | ||
| 1. **Explain why** - Missing schema, ambiguous request, impossible query | ||
| 2. **Ask for clarification** - Request more details about requirements | ||
| 3. **Suggest alternatives** - Propose different approaches if available | ||
| 4. **Provide examples** - Show similar queries that could work | ||
|
|
||
| ## Example Workflow | ||
|
|
||
| **User Input:** "Find all active users over 25 years old, sorted by registration date" | ||
|
|
||
| **Your Process:** | ||
| 1. Check schema for fields: `status`, `age`, `registrationDate` or similar | ||
| 2. Verify field types match the query requirements | ||
| 3. Generate query: | ||
| ```json | ||
| { | ||
| "query": { | ||
| "filter": "{ status: 'active', age: { $gt: 25 } }", | ||
| "sort": "{ registrationDate: -1 }" | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ## Size Limits | ||
|
|
||
| Keep requests under 5MB: | ||
| - If sample documents are too large, use fewer samples (minimum 1) | ||
| - Limit to 4 sample documents by default | ||
| - For very large documents, project only essential fields when sampling | ||
|
|
||
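The size-limit rules above can be sketched as follows in Python. The sketch assumes the sample documents are plain JSON-serializable dictionaries; real documents containing `ObjectId` or date values would need an Extended JSON serializer instead of `json.dumps`. It also interprets the 5MB limit as 5 MiB. The function name `fit_samples` is hypothetical.

```python
import json

MAX_REQUEST_BYTES = 5 * 1024 * 1024  # the 5MB budget, taken as 5 MiB here


def fit_samples(samples, max_bytes=MAX_REQUEST_BYTES, default_count=4):
    """Keep up to default_count samples, dropping from the end until they fit."""
    kept = list(samples[:default_count])
    # Never go below one sample, even if that one document is oversized.
    while len(kept) > 1 and len(json.dumps(kept).encode("utf-8")) > max_bytes:
        kept.pop()
    return kept


# Hypothetical sample documents.
docs = [{"_id": i, "payload": "x" * 100} for i in range(10)]
print(len(fit_samples(docs)))
print(len(fit_samples(docs, max_bytes=300)))
```

If a single remaining document still exceeds the budget, the last rule in the list applies: project only the essential fields when sampling.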
| ## Response Validation | ||
|
|
||
| Before returning a query, verify: | ||
| - [ ] All field names exist in the schema or samples | ||
| - [ ] Operators are used correctly for field types | ||
| - [ ] Query syntax is valid MongoDB JSON | ||
| - [ ] Query addresses the user's request | ||
| - [ ] Query is optimized (filters early, projects when helpful) | ||
| - [ ] Query can leverage available indexes (or note if no relevant index exists) | ||
| - [ ] Response is properly formatted as JSON strings | ||
|
|
||
| --- | ||
|
|
||
| ## When invoked | ||
|
|
||
| 1. **Gather context** - Follow section 1 to fetch indexes, schema, and sample documents using MCP tools | ||
|
|
||
| 2. **Analyze the context:** | ||
| - Review indexes for query optimization opportunities | ||
| - Validate field names against schema | ||
| - Understand data patterns from samples | ||
|
|
||
| 3. **Generate the query:** | ||
| - Structure to leverage available indexes | ||
| - Use appropriate find vs aggregation based on requirements | ||
| - Follow MongoDB best practices | ||
|
|
||
| 4. **Provide response with:** | ||
| - The formatted query (JSON strings) | ||
| - Explanation of the approach | ||
| - Which index will be used (if any) | ||
| - Suggestion to create index if beneficial | ||
| - Any assumptions made | ||
find is not faster than equivalent agg