Add mongodb-query skill with testing infrastructure MCP-425#2
betsybutton merged 27 commits into main
Conversation
Introduces a new skill for generating MongoDB queries and aggregation pipelines from natural language descriptions. Includes comprehensive testing workspace with fixtures, evaluation cases, and performance benchmarks. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Moved test infrastructure (evals/ and mongodb-query-workspace/) outside the skill directory to testing/mongodb-query/. Skills should only contain essential files that agents need; test fixtures, evaluation cases, and benchmarks are development/CI artifacts that don't belong in the skill package.
The skill now passes skill-validator check with:
- Clean structure (SKILL.md only)
- 2,259 tokens (within recommended limits)
- No validation errors or warnings
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated skill name from mongodb-query to mongodb-query-generator to better reflect its purpose. Updated all path references in testing documentation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
| ### 3. Choose Query Type: Find vs Aggregation | ||
| Prefer find queries over aggregation pipelines because find queries are simpler, faster, and easier for other developers to understand. Find queries also have better performance characteristics for simple filtering and sorting since they avoid the aggregation framework overhead. |
find is not faster than equivalent agg
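To make the reviewer's point concrete, here is a minimal sketch of how a find specification maps onto an equivalent aggregation pipeline. `findToPipeline` is a hypothetical helper written for illustration, not part of any MongoDB driver, and the `year`/`title` fields are assumed.

```javascript
// Hypothetical helper: express a find() specification as an equivalent
// aggregation pipeline. Stage order mirrors how find applies its options:
// filter, then sort, then skip/limit, then projection.
function findToPipeline({ filter = {}, sort, skip, limit, projection } = {}) {
  const pipeline = [{ $match: filter }];
  if (sort) pipeline.push({ $sort: sort });
  if (skip) pipeline.push({ $skip: skip });
  if (limit) pipeline.push({ $limit: limit });
  if (projection) pipeline.push({ $project: projection });
  return pipeline;
}

// Illustrative find specification (field names are assumptions).
const findSpec = {
  filter: { year: { $gte: 1990 } },
  sort: { title: 1 },
  limit: 10,
  projection: { _id: 0, title: 1, year: 1 },
};

const pipeline = findToPipeline(findSpec);
console.log(JSON.stringify(pipeline, null, 2));
```

Because the two forms describe the same work, the choice between them is better argued on readability than on speed.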
| ### Query Quality | ||
| 1. **Use indexes efficiently** - Structure filters to leverage available indexes: | ||
| - Check collection indexes before generating the query | ||
| - Order filter fields to match index key order when possible |
that's not a thing - order of filter fields does not matter.
| 1. **Use indexes efficiently** - Structure filters to leverage available indexes: | ||
| - Check collection indexes before generating the query | ||
| - Order filter fields to match index key order when possible | ||
| - Use equality matches before range queries (matches index prefix behavior) |
in the query it does not matter
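A small sketch of why filter key order is irrelevant: the same predicate written range-first and equality-first selects the same documents. The `matches` function is a toy stand-in for the server that supports only the operators used here, and the `status`/`year` fields are assumptions. The equality-before-range guideline applies to the index key pattern, shown at the end.

```javascript
// Toy local matcher covering only the operators used below; it stands in
// for the server so the two filters can be compared without a database.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    if (cond === null || typeof cond !== "object") return doc[field] === cond;
    return Object.entries(cond).every(([op, bound]) => {
      if (op === "$gte") return doc[field] >= bound;
      if (op === "$lt") return doc[field] < bound;
      throw new Error(`unsupported operator: ${op}`);
    });
  });
}

const docs = [
  { status: "open", year: 1995 },
  { status: "open", year: 2005 },
  { status: "closed", year: 1995 },
];

// Same predicate, written range-first and equality-first.
const rangeFirst = { year: { $gte: 1990, $lt: 2000 }, status: "open" };
const equalityFirst = { status: "open", year: { $gte: 1990, $lt: 2000 } };

const a = docs.filter((d) => matches(d, rangeFirst));
const b = docs.filter((d) => matches(d, equalityFirst));
console.log(JSON.stringify(a) === JSON.stringify(b)); // true

// Key order matters in the index definition, not in the filter:
// equality field first, range field after it.
const indexKeys = { status: 1, year: 1 };
```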
| - Check collection indexes before generating the query | ||
| - Order filter fields to match index key order when possible | ||
| - Use equality matches before range queries (matches index prefix behavior) | ||
| - Avoid operators that prevent index usage: `$where`, `$text` without text index, `$ne`, `$nin` (use sparingly) |
$where should never be used, $expr should be used only when necessary, I don't see a problem with inequality
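One way to apply this feedback mechanically is to scan a generated filter for the operators the reviewer singles out. The sketch below is an assumption about how such a check could look, not something the skill ships: it reports `$where` (never generate it) and `$expr` (only when necessary) and deliberately leaves `$ne`/`$nin` alone.

```javascript
// Walk a filter document and report every use of $where and $expr,
// together with the path at which it was found.
function findFlaggedOperators(filter, path = "") {
  const flagged = [];
  if (Array.isArray(filter)) {
    filter.forEach((item, i) => {
      flagged.push(...findFlaggedOperators(item, `${path}[${i}]`));
    });
  } else if (filter !== null && typeof filter === "object") {
    for (const [key, value] of Object.entries(filter)) {
      const here = path ? `${path}.${key}` : key;
      if (key === "$where" || key === "$expr") {
        flagged.push({ operator: key, path: here });
      }
      flagged.push(...findFlaggedOperators(value, here));
    }
  }
  return flagged;
}

// Illustrative filter; field names are assumptions.
const suspect = {
  status: { $ne: "closed" }, // inequality is fine and is not flagged
  $or: [{ $where: "this.a > this.b" }, { qty: 5 }],
};
console.log(findFlaggedOperators(suspect));
```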
| 4. **Handle edge cases** - Consider null values, missing fields, type mismatches | ||
| 5. **Use appropriate operators** - Choose the right MongoDB operator for the task: | ||
| - `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte` for comparisons | ||
| - `$in`, `$nin` for membership tests |
that's not what $in $nin do. They are just equality operators with multiple possible values.
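The reviewer's description can be spelled out as a rewrite: `$in` behaves like an OR of equality tests and `$nin` like an AND of `$ne` tests. The helpers below are hypothetical and assume plain values in the list (no regular expressions); the `rating` field is illustrative.

```javascript
// $in: the field equals at least one of the listed values.
function expandIn(field, values) {
  return { $or: values.map((v) => ({ [field]: { $eq: v } })) };
}

// $nin: the field equals none of the listed values.
function expandNin(field, values) {
  return { $and: values.map((v) => ({ [field]: { $ne: v } })) };
}

console.log(JSON.stringify(expandIn("rating", ["PG", "PG-13"])));
console.log(JSON.stringify(expandNin("rating", ["R", "NC-17"])));
```

The expanded forms are only for explanation; the compact `$in`/`$nin` spelling is what a generated query should normally use.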
| - `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte` for comparisons | ||
| - `$in`, `$nin` for membership tests | ||
| - `$and`, `$or`, `$not`, `$nor` for logical operations | ||
| - `$regex` for text pattern matching |
where should we mention left anchored being preferred?
I think we can mention it here - good point
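For reference, a left-anchored pattern is one that starts with `^`, which lets a case-sensitive `$regex` use an index on the field as a prefix scan. A minimal sketch, with `prefixFilter` and the `street` field as assumptions:

```javascript
// Escape regex metacharacters so user text is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-sensitive, left-anchored $regex filter for a prefix match.
function prefixFilter(field, prefix) {
  return { [field]: { $regex: `^${escapeRegex(prefix)}` } };
}

const filter = prefixFilter("street", "5th Ave.");
console.log(JSON.stringify(filter));

// Quick local check of what the pattern accepts.
const pattern = new RegExp(filter.street.$regex);
console.log(pattern.test("5th Ave. North")); // true
console.log(pattern.test("West 5th Ave.")); // false
```

An unanchored or case-insensitive pattern would still return correct results but generally cannot narrow the index scan the same way.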
| - Place `$match` stages at the beginning of the pipeline | ||
| - Initial `$match` and `$sort` stages can use indexes if they precede any stage that modifies documents | ||
| - Structure `$match` filters to align with available indexes | ||
| - Avoid `$project`, `$unwind`, or other transformations before `$match` when possible |
$project should be at the end.
| - Initial `$match` and `$sort` stages can use indexes if they precede any stage that modifies documents | ||
| - Structure `$match` filters to align with available indexes | ||
| - Avoid `$project`, `$unwind`, or other transformations before `$match` when possible | ||
| 5. **Optimize `$lookup`** - Consider denormalization for frequently joined data |
ensure indexes on fields in foreign collection being joined
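Taken together, the two review comments above suggest a pipeline shape like the following sketch: `$match` first, `$lookup` against a foreign field that should be indexed, and `$project` last. The collection and field names (`comments`, `movie_id`, `year`, `title`) are assumptions based on the fixture descriptions, and `suggestedIndex` is only a note for the user, not a driver call.

```javascript
// Pipeline intended to run on a movies collection.
const pipeline = [
  // 1. Filter first so an index on `year` can be used.
  { $match: { year: { $gte: 1990, $lt: 2000 } } },
  // 2. Join; each lookup probes comments.movie_id, so that field
  //    should be indexed in the foreign collection.
  {
    $lookup: {
      from: "comments",
      localField: "_id",
      foreignField: "movie_id",
      as: "comments",
    },
  },
  // 3. Shape the output at the very end.
  {
    $project: {
      _id: 0,
      title: 1,
      year: 1,
      commentCount: { $size: "$comments" },
    },
  },
];

// Index worth suggesting to the user if it does not exist yet.
const suggestedIndex = { collection: "comments", keys: { movie_id: 1 } };

const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```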
| 1. **Using nonexistent field names** - Always validate against schema first. MongoDB won't error; it just returns no results. | ||
| 2. **Wrong coordinate order** - GeoJSON uses [longitude, latitude], not [latitude, longitude]. | ||
| 3. **Choosing aggregation when find suffices** - Aggregation adds overhead; use find for simple queries. |
| 2. **Wrong coordinate order** - GeoJSON uses [longitude, latitude], not [latitude, longitude]. | ||
| 3. **Choosing aggregation when find suffices** - Aggregation adds overhead; use find for simple queries. | ||
| 4. **Missing index awareness** - Structure queries to leverage indexes. If no index exists for key filters, mention this to the user. | ||
| 5. **Type mismatches** - Check schema to ensure operators match field types (e.g., don't use `$gt` on strings when comparing alphabetically). |
wait, why not? that's how it works with strings...
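As the reviewer notes, range operators on strings are well defined: they compare lexicographically. The sketch below simulates that locally with JavaScript string comparison, which agrees with MongoDB's default binary string ordering for ASCII text; a collation on the collection or query would change the ordering. The `title` field and sample values are assumptions.

```javascript
// Filter selecting titles that sort after "M".
const filter = { title: { $gt: "M" } };

const titles = ["Alien", "M", "Matrix", "Zodiac"];

// Local simulation of the comparison for ASCII strings.
const matched = titles.filter((t) => t > filter.title.$gt);
console.log(matched); // [ 'Matrix', 'Zodiac' ]
```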
| } | ||
| } | ||
| ``` | ||
should there be two more steps here - suggest index if there isn't one and suggest adding limit if a collection is relatively large?
- Add guidance to avoid redundant operators in queries
- Recommend left-anchored regex patterns for index efficiency
- Update aggregation pipeline guidance (project at end, not early)
- Add steps to suggest indexes and limits in query workflow
- Refine query quality best practices and common pitfalls
- Update $lookup optimization to mention foreign collection indexes
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pull request overview
Adds a new mongodb-query-generator skill plus a self-contained testing workspace intended to validate query-generation behavior with MongoDB fixtures, eval metadata, and benchmark artifacts.
Changes:
- Introduces `skills/mongodb-query-generator/SKILL.md`, defining the MongoDB query generation workflow and response format.
- Adds a test workspace under `testing/mongodb-query-generator/mongodb-query-workspace/` with fixture datasets, a fixture loader, trigger eval prompts, and iteration-1 results/benchmarks.
- Adds an eval catalog at `testing/mongodb-query-generator/evals/evals.json` describing prompts and expected outcomes.
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| testing/mongodb-query-generator/mongodb-query-workspace/trigger-eval.json | Adds trigger/non-trigger query set for description triggering evaluation. |
| testing/mongodb-query-generator/mongodb-query-workspace/package.json | Defines a small Node workspace to run fixture loading (tsx) and hold deps. |
| testing/mongodb-query-generator/mongodb-query-workspace/optimization-output.log | Adds an optimization run log artifact (currently contains local env details). |
| testing/mongodb-query-generator/mongodb-query-workspace/load-fixtures.ts | Adds TypeScript fixture loader for MongoDB. |
| testing/mongodb-query-generator/mongodb-query-workspace/load-fixtures.js | Adds an alternate JS fixture loader (currently conflicts with ESM + uses eval). |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/test-results.md | Documents iteration-1 test outputs vs expected results. |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/simple-find/eval_metadata.json | Adds assertions metadata for “simple-find”. |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/relative-date-find-last-year/eval_metadata.json | Adds assertions metadata for “relative-date-find-last-year”. |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/geo-based-find/eval_metadata.json | Adds assertions metadata for “geo-based-find”. |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/find-with-non-english/eval_metadata.json | Adds assertions metadata for “find-with-non-english”. |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/find-with-filter-projection-sort-limit/eval_metadata.json | Adds assertions metadata for “find-with-filter-projection-sort-limit”. |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/find-translates-to-agg-mode-count/eval_metadata.json | Adds assertions metadata for “find-translates-to-agg-mode-count”. |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/benchmark.json | Adds structured benchmark summary for iteration-1. |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/agg-join-lookup/eval_metadata.json | Adds assertions metadata for “agg-join-lookup”. |
| testing/mongodb-query-generator/mongodb-query-workspace/iteration-1/agg-complex-regex-split/eval_metadata.json | Adds assertions metadata for “agg-complex-regex-split”. |
| testing/mongodb-query-generator/mongodb-query-workspace/fixtures/nyc.parking.ts | Adds NYC parking fixture dataset. |
| testing/mongodb-query-generator/mongodb-query-workspace/fixtures/netflix.movies.ts | Adds Netflix movies fixture dataset. |
| testing/mongodb-query-generator/mongodb-query-workspace/fixtures/netflix.comments.ts | Adds Netflix comments fixture dataset (incl. movie_id). |
| testing/mongodb-query-generator/mongodb-query-workspace/fixtures/berlin.cocktailbars.ts | Adds Berlin cocktail bars fixture dataset. |
| testing/mongodb-query-generator/mongodb-query-workspace/fixtures/airbnb.listingsAndReviews.ts | Adds Airbnb listings fixture dataset. |
| testing/mongodb-query-generator/mongodb-query-workspace/SUMMARY.md | Adds a narrative summary of the skill testing and results. |
| testing/mongodb-query-generator/mongodb-query-workspace/README.md | Documents how to install deps, load fixtures, and run the workspace. |
| testing/mongodb-query-generator/evals/evals.json | Adds an eval list of prompts/expected outputs for the skill. |
| skills/mongodb-query-generator/SKILL.md | Adds the skill definition and guidance for generating MongoDB queries/pipelines. |
| try { | ||
| const docs = eval('(' + jsonStr + ')'); | ||
| // Convert _id fields with $oid to ObjectId | ||
| return docs.map(doc => { |
parseFixture uses eval(...) on file contents. Even in a test workspace, this is an unnecessary code execution risk and will run arbitrary JS if a fixture file is modified. Prefer a safe parser/deserializer (e.g., bson EJSON) or remove this script entirely since load-fixtures.ts already loads fixtures without eval.
@copilot open a new pull request to apply changes based on this feedback
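A minimal sketch of the eval-free direction this feedback points at, assuming the fixture files contain strict JSON (the original `eval` may have been tolerating JavaScript object literals, in which case the fixtures would need converting first). This `parseFixture` only validates `$oid` wrappers; turning them into real `ObjectId` values is left to the `bson` package's `EJSON.parse`, which is not imported here so the example stays dependency-free.

```javascript
const OID_PATTERN = /^[0-9a-f]{24}$/i;

// Parse fixture text as data only. Unlike eval, JSON.parse never
// executes code: non-JSON input simply throws a SyntaxError.
function parseFixture(jsonStr) {
  const docs = JSON.parse(jsonStr, (key, value) => {
    const isWrapper =
      value !== null && typeof value === "object" && "$oid" in value;
    if (isWrapper) {
      const ok = typeof value.$oid === "string" && OID_PATTERN.test(value.$oid);
      if (!ok) throw new Error(`invalid $oid under key "${key}"`);
    }
    return value;
  });
  if (!Array.isArray(docs)) throw new Error("fixture must be a JSON array");
  return docs;
}

// Illustrative fixture text (values are made up).
const sample = '[{"_id":{"$oid":"5f1d7f3e8b0c4a2d9c6e1a11"},"name":"Bar"}]';
console.log(parseFixture(sample)[0].name); // Bar
```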
| 3. **Project only needed fields** - Reduce data transfer with projections | ||
| 4. **Validate field names** against the schema before using them | ||
| 5. **Handle edge cases** - Consider null values, missing fields, type mismatches | ||
| 5. **Use appropriate operators** - Choose the right MongoDB operator for the task: |
In the Best Practices list the numbering is duplicated (5. appears twice), which makes references ambiguous. Please renumber the list items so each point has a unique number.
| 5. **Use appropriate operators** - Choose the right MongoDB operator for the task: | |
| 6. **Use appropriate operators** - Choose the right MongoDB operator for the task: |
Tagging on to Copilot here, I was also about to flag this 😅
Ah, good catch!
…Y.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@betsybutton I've opened a new pull request, #18, to work on those changes. Once the pull request is ready, I'll request review from you.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ion-1/test-results.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ixtures.js Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated 7 comments.
| { | ||
| "name": "uses_aggregation_with_search", | ||
| "description": "Should use aggregation with $search stage for full text search, not regex" | ||
| }, | ||
| { | ||
| "name": "has_search_stage", | ||
| "description": "Pipeline includes $search stage" | ||
| }, |
This eval metadata asserts the solution should use an aggregation $search stage “not regex”, but the PR description calls out Atlas Search as a known limitation (no $search support). To keep the evaluation spec consistent with intended behavior, either relax these assertions to allow $regex/text index approaches, or update the skill/docs to actually require $search when available.
| **Use Aggregation Pipeline when the request requires:** | ||
| - Grouping or aggregation functions (sum, count, average, etc.) | ||
| - Multiple transformation stages | ||
| - Computed fields or data reshaping | ||
| - Joins with other collections ($lookup) | ||
| - Array unwinding or complex array operations | ||
| - Text search with scoring | ||
The skill spec says to use aggregation for “Text search with scoring”, which strongly implies Atlas Search $search, but the PR description lists Atlas Search as unsupported. To avoid misleading consumers, clarify the limitation in SKILL.md (and/or adjust this bullet) so it’s clear whether $search is expected or if $regex/$text are the supported approaches.
Need to reconcile this with the search skill
Also flagging we might want to clarify the scope here, related to this - i.e.: we're saying here "Text search with scoring" as a reason to use an aggregation pipeline, which puts it explicitly in scope for this skill. But the only text search guidance in the skill is line 114: "Do not use `$text` without a text index." There's no mention of `$search` or `$vectorSearch` at all. Presumably that's intentional because we don't want to step on the search skill, but not being clear about the scope gate here is a problem because:
- `$search` has completely different syntax, index requirements (Atlas Search indexes, not regular text indexes), and scoring mechanics than `$text`
- `$search` must be the first stage in a pipeline, which conflicts with the "place `$match` at the beginning" advice (line 139)
- An agent triggered by "search my documents for..." could produce a `$text`-based query when the user's deployment uses Atlas Search, or try to apply the skill's aggregation ordering rules to a `$search` pipeline where they don't hold
- `$vectorSearch` is even further from anything the skill covers
The fix would be a scope-gate noting that Atlas Search (`$search`, `$vectorSearch`) queries are not covered by this skill.
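The proposed scope gate is a documentation change, but the rule it encodes is simple enough to sketch as a check. The function below is hypothetical: it treats any pipeline containing an Atlas Search stage as out of scope, so the skill's "`$match` first" ordering advice is never applied to it.

```javascript
// Stages that belong to Atlas Search rather than to this skill.
const ATLAS_SEARCH_STAGES = new Set(["$search", "$searchMeta", "$vectorSearch"]);

// True when any stage in the pipeline is an Atlas Search stage.
function isOutOfScope(pipeline) {
  return pipeline.some((stage) =>
    Object.keys(stage).some((name) => ATLAS_SEARCH_STAGES.has(name))
  );
}

// Illustrative pipelines; index and field names are assumptions.
const searchPipeline = [
  { $search: { index: "default", text: { query: "heist", path: "plot" } } },
  { $limit: 5 },
];
const plainPipeline = [{ $match: { year: 1999 } }, { $limit: 5 }];

console.log(isOutOfScope(searchPipeline)); // true
console.log(isOutOfScope(plainPipeline)); // false
```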
Opting to make the scopes cleaner cut; will remove this line & update other parts accordingly!
Co-authored-by: betsybutton <36177962+betsybutton@users.noreply.github.com>
Removed "wants to optimize a query" trigger phrase to avoid overlap with mongodb-query-optimizer skill. Added explicit exclusion stating this skill does NOT analyze or optimize existing queries. Clarified triggers focus on query generation/creation rather than optimization.
This establishes clear boundaries:
- mongodb-query-generator: Generate/write/create new queries
- mongodb-query-optimizer: Analyze/optimize existing queries
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…-querying
Updated skill name throughout the codebase:
- Renamed directories: skills/ and testing/
- Updated frontmatter name in SKILL.md
- Updated skill heading to "MongoDB Natural Language Querying"
- Updated all references in README.md and SUMMARY.md
- Updated skill_name in evals.json
The new name better reflects that this skill converts natural language descriptions into MongoDB queries, distinguishing it from other query-related skills.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 22 out of 23 changed files in this pull request and generated 9 comments.
| "id": 13, | ||
| "name": "find-with-non-english", | ||
| "prompt": "¿Qué alojamiento tiene el precio más bajo? devolver el número en un campo llamado \"precio\" en español", | ||
| "expected_output": "Find query with sort by price ascending, limit 1, project price field renamed", |
expected_output for find-with-non-english describes a find query that “project[s] price field renamed”, but find projections can’t rename fields (renaming requires aggregation with $project/$addFields). Consider updating the expectation to allow/require an aggregation pipeline for the rename (consistent with the iteration-1 results doc).
| "expected_output": "Find query with sort by price ascending, limit 1, project price field renamed", | |
| "expected_output": "Aggregation pipeline with $sort by price ascending, $limit 1, and $project to rename price field to 'precio'", |
Copilot is wrong. As of 4.4 find projection and agg projection have the same capabilities.
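To illustrate the reviewer's correction: from MongoDB 4.4 onward a find projection accepts aggregation expressions, so a field can be emitted under a new name without switching to a pipeline. The sketch below renders such a query as a shell string; `toShell` is a hypothetical helper, and the collection and `price` field names are assumptions taken from the eval description.

```javascript
// Find specification that returns the cheapest listing with its price
// exposed under the name "precio".
const findSpec = {
  collection: "listingsAndReviews",
  filter: {},
  projection: { _id: 0, precio: "$price" }, // expression in a find projection
  sort: { price: 1 },
  limit: 1,
};

// Hypothetical helper: render the specification in shell syntax.
function toShell({ collection, filter, projection, sort, limit }) {
  return (
    `db.${collection}.find(${JSON.stringify(filter)}, ${JSON.stringify(projection)})` +
    `.sort(${JSON.stringify(sort)}).limit(${limit})`
  );
}

console.log(toShell(findSpec));
```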
| "id": 14, | ||
| "name": "find-with-regex-string-ops", | ||
| "prompt": "Write a query that does the following: find all of the parking incidents that occurred on any ave. Return all of the plate ids involved with their summons number and vehicle make and body type. Put the vehicle make and body type into lower case. No _id, sorted by the summons number lowest first.", | ||
| "expected_output": "Find query with regex for 'ave', projection with lowercase operations, sort by summons_number", |
expected_output for find-with-regex-string-ops mentions “Put the vehicle make and body type into lower case” while still calling it a find query. Lowercasing/transformations require aggregation operators like $toLower in a $project stage, so this expectation should be updated to describe an aggregation pipeline (or drop the lowercasing requirement).
| "expected_output": "Find query with regex for 'ave', projection with lowercase operations, sort by summons_number", | |
| "expected_output": "Aggregation pipeline with $match using regex for 'ave', $project applying lowercase to vehicle make and body type and excluding _id, and $sort by summons_number ascending", |
dacharyc left a comment
Hey @betsybutton - overall this Skill is in good shape; just a handful of comments related to some scope-gate concerns, cross-platform support, and a couple of places where I'm wondering if we can be more specific.
I think the scope gating and allowed tools suggestion is the only required change; the rest is on the subjective side.
| # MongoDB Natural Language Querying | ||
| You are an expert MongoDB query generator. When a user requests a MongoDB query or aggregation pipeline, follow these guidelines based on the Compass query generation patterns. |
We should probably scope-gate what types of aggregation pipeline activities we expect an agent to do with this skill, so an agent who needs to do some other type of aggregation operation we're not covering here, like outputting to another collection, can opt out of this skill early.
| --- | ||
| ## When invoked |
Just wondering if we have found this section to be needed. It seems to repeat some of what we set up in the step-by-step instructions above. If we're trying to say here: "apply these steps in order" we might put that as guidance at the top of the skill instead of add it retroactively and repeat info here.
- Clarified JSON response format: outer structure is JSON, query strings use MongoDB shell syntax
- Fixed index guidance: generate correct query first, then check index coverage (not the other way around)
- Updated $exists redundancy check to include inequality operators ($gt, $lt, etc.)
- Added note to exclude _id when not needed in projections
- Corrected $in/$nin description: they match against a list of values (OR'ed $eq/$ne), not array membership
- Added note that $regex is case sensitive
- Added preference for {$ne: null} over {$exists: false}
- Updated aggregation pipeline guidance to check indexes after generating $match filters
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
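The `$exists` redundancy item in this commit can be illustrated with a small sketch. A comparison against a non-null value already requires the field to be present, so an accompanying `$exists: true` adds nothing. `dropRedundantExists` is a hypothetical helper, not code from the skill; it leaves null comparisons alone because `$eq: null` also matches missing fields.

```javascript
// Operators that, with a non-null operand, only match documents in which
// the field is present.
const IMPLIES_FIELD_EXISTS = ["$eq", "$gt", "$gte", "$lt", "$lte"];

// Remove `$exists: true` from a single field condition when another
// operator in the same condition already implies it.
function dropRedundantExists(condition) {
  const implied = IMPLIES_FIELD_EXISTS.some(
    (op) => op in condition && condition[op] !== null
  );
  if (condition.$exists === true && implied) {
    const { $exists, ...rest } = condition;
    return rest;
  }
  return condition;
}

console.log(JSON.stringify(dropRedundantExists({ $exists: true, $gt: 5 })));
console.log(JSON.stringify(dropRedundantExists({ $exists: true, $eq: null })));
```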
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 24 out of 27 changed files in this pull request and generated 2 comments.
| // Drop existing collection | ||
| try { | ||
| await collection.drop(); | ||
| console.log(` Dropped existing ${namespace}`); | ||
| } catch (e) { | ||
| // Collection might not exist | ||
| } |
| "expected_skill": null, | ||
| "should_not_trigger": ["mongodb-natural-language-querying", "search-and-ai"], | ||
| "reasoning": "Query optimization should trigger mongodb-query-optimizer skill, not these two", |
Pull request overview
Copilot reviewed 24 out of 25 changed files in this pull request and generated 5 comments.
| "clear_mongodb_natural_language_querying": 5, | ||
| "clear_search_ai": 5, | ||
| "ambiguous_cases": 10, | ||
| "edge_cases": 4, |
| "expected_skill": "mongodb-natural-language-querying", | ||
| "should_not_trigger": "search-and-ai", |
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
dacharyc left a comment
Thanks for addressing my feedback, @betsybutton ! Overall I think we're in good shape here. Just a few small comments that we may want to investigate based on changes since my initial review, but I don't consider any of this blocking.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
dacharyc left a comment
Looks good to me! Clear to merge from my side whenever you're ready. ✅
Introduces a new skill for generating MongoDB queries and aggregation pipelines from natural language descriptions. Includes comprehensive testing workspace with fixtures, evaluation cases, and performance benchmarks.
MongoDB Agent Skill Submission
Skill Information
Skill Name: mongodb-natural-language-querying
Skill Directory: skills/mongodb-natural-language-querying
Use Case
This skill addresses the challenge of generating MongoDB queries and
aggregation pipelines from natural language descriptions. MongoDB users
often need to:
language
joins
The skill automates this translation by leveraging collection schemas,
sample documents, and index information to generate accurate, optimized
queries.
Value Proposition
This skill is valuable for MongoDB users because it:
syntax or aggregation pipeline stage ordering
data to generate queries that match the real data structure
query patterns
queries, text search, regex operations, and $lookup joins
(validated with Spanish)
before generating queries
Tested across 8 diverse scenarios with a 93.75% success rate.
Special Considerations
Prerequisites:
.mcp.json configuration with mongodb-mcp-server entry
MCP Server Configuration Example:
{
  "mcpServers": {
    "mongodb": {
      "command": "npx",
      "args": ["-y", "mongodb-mcp-server@latest"],
      "env": {
        "MDB_MCP_CONNECTION_STRING": "your-connection-string-here"
      }
    }
  }
}
Limitations:
text indexes instead
Validation Prompts
but only return the title and year fields, sorted by title
coordinates [13.4050, 52.5200]
netflix.movies collection
netflix.movies (Spanish: Find all movies from the 1990s)
with their comment counts and top 3 most recent comments
Author Self-Validation
SME Review
SME: @asya999
Additional Context
Testing Infrastructure:
mongodb-natural-language-querying/mongodb-query-workspace/
Notable Test Scenarios:
The one partially passing test (text search) is a known limitation where the skill suggests text indexes instead of $search operator, which is acceptable behavior.