Adds a product docs toolset to the dbt MCP #620
I'm not sure where the failure is coming from :(
b-per
left a comment
Thanks Mirna. I added a first set of comments but there might be another round once this is addressed.
The current CI failures are because:
- we need a changie entry (created with `changie new`)
- we need to run `task check` to format the code
b-per
left a comment
Thanks Mirna for addressing the first set of comments 🙌
I added a few extra.
Also, if possible, could you ask your LLM to clean the commit history? It might be even easier to review if the PR is moved to a few logical commits.
688c5bf to
b6312d3
great, thanks so much @b-per and @DevonFulcher! I've addressed all the comments and it's ready for your re-review whenever you have a second. One thing that I'm not sure about is the full-text fallback mechanism. I've noticed that when a
@mirnawong1 is it possible to remove this fallback? I don't think we want to feed our entire docs site into the LLM.
Pull request overview
Adds a new “Product Docs” toolset to the dbt MCP server so agents can search and fetch public documentation from docs.getdbt.com via two dedicated MCP tools.
Changes:
- Introduces `search_product_docs` (metadata search with full-text fallback) and `get_product_doc_pages` (parallel Markdown fetch for up to 10 pages).
- Adds a cached `ProductDocsClient` plus typed dataclass responses for both tools.
- Wires the toolset into config/registration and adds unit + integration test coverage.
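The parallel fetch with a 10-page cap listed above could look roughly like the sketch below. It is not the PR's code: `PageResult`, `fetch_pages`, and the injected `fetch_one` callable are hypothetical names, and truncating the input to the first 10 entries (rather than rejecting longer lists) is an assumption.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Optional

MAX_PAGES = 10  # cap stated in the PR; truncation behavior is an assumption


@dataclass
class PageResult:
    """Hypothetical per-page result: either markdown or an error message."""

    url: str
    markdown: Optional[str] = None
    error: Optional[str] = None


async def fetch_pages(
    urls: list[str],
    fetch_one: Callable[[str], Awaitable[str]],
) -> list[PageResult]:
    """Fetch pages concurrently; one failing page does not fail the batch."""
    capped = urls[:MAX_PAGES]
    # return_exceptions=True hands back exceptions as values instead of raising,
    # which is what allows partial failures to be reported per page.
    outcomes = await asyncio.gather(
        *(fetch_one(url) for url in capped), return_exceptions=True
    )
    results = []
    for url, outcome in zip(capped, outcomes):
        if isinstance(outcome, BaseException):
            results.append(PageResult(url=url, error=str(outcome)))
        else:
            results.append(PageResult(url=url, markdown=outcome))
    return results


async def demo() -> list[PageResult]:
    # Stand-in for a real HTTP fetch so the sketch runs offline.
    async def fake_fetch(url: str) -> str:
        if url.endswith("/missing"):
            raise RuntimeError("404 Not Found")
        return f"# {url}"

    return await fetch_pages(["/docs/a", "/docs/missing"], fake_fetch)
```

Running `asyncio.run(demo())` returns one successful result and one error result, which mirrors the "partial failures" case the unit tests are said to cover.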
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/unit/tools/test_product_docs.py | Unit tests for URL normalization, parsing, ranking, tool behavior, and registration toggles. |
| tests/integration/product_docs/test_product_docs.py | Live integration tests against docs.getdbt.com for client + MCP tools. |
| src/dbt_mcp/tools/toolsets.py | Adds PRODUCT_DOCS toolset and maps tools into it. |
| src/dbt_mcp/tools/tool_names.py | Adds SEARCH_PRODUCT_DOCS and GET_PRODUCT_DOC_PAGES tool names. |
| src/dbt_mcp/tools/human_descriptions.py | Adds human-readable descriptions for the two new tools. |
| src/dbt_mcp/prompts/product_docs/search_product_docs.md | Prompt guidance for using the search tool and fallback behavior. |
| src/dbt_mcp/prompts/product_docs/get_product_doc_pages.md | Prompt guidance for presenting fetched docs content to users. |
| src/dbt_mcp/product_docs/types.py | Defines typed dataclass response models for product docs tools. |
| src/dbt_mcp/product_docs/tools.py | Implements and registers the two MCP tools, including parallel fetch and error handling. |
| src/dbt_mcp/product_docs/client.py | Adds HTTP fetch + TTL caching + search ranking + full-text search support. |
| src/dbt_mcp/product_docs/__init__.py | Introduces the product_docs package. |
| src/dbt_mcp/mcp/server.py | Registers the product docs toolset in server creation. |
| src/dbt_mcp/config/settings.py | Adds DISABLE_PRODUCT_DOCS and DBT_MCP_ENABLE_PRODUCT_DOCS settings. |
| src/dbt_mcp/config/config.py | Wires product docs into enable/disable toolset mapping. |
| docs/diagram.d2 | Updates architecture diagram to include Product Docs toolset. |
| README.md | Documents the new Product Docs tools in the public tool list. |
Introduce a new `product_docs` toolset that lets AI agents query the public dbt documentation in real time via two MCP tools:
- `search_product_docs`: keyword search against llms.txt with automatic full-text fallback via llms-full.txt
- `get_product_doc_pages`: fetch one or more pages as Markdown (up to 10 in parallel)
Includes TTL-based in-memory caching, relevance-ranked search with documented scoring weights, abbreviation expansion, and unit tests.
Made-with: Cursor
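To make the relevance-ranked keyword search in this commit concrete, here is a rough sketch of scoring entries parsed from an llms.txt-style index. The constant names echo those mentioned in the PR description, but the numeric weights, `SCORE_KEYWORD_IN_DESCRIPTION`, the `IndexEntry` type, the regex, and the `parse_index`, `score_entry`, and `search` helpers are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical weights: the PR names constants like these but the values are made up.
SCORE_EXACT_TITLE_MATCH = 10
SCORE_KEYWORD_IN_TITLE = 3
SCORE_KEYWORD_IN_DESCRIPTION = 1

# Assumed link-line shape: "- [Title](url): optional description"
LINK_LINE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<description>.*))?$"
)


@dataclass(frozen=True)
class IndexEntry:
    title: str
    url: str
    description: str


def parse_index(text: str) -> list[IndexEntry]:
    """Collect link lines from the index, ignoring headings and prose."""
    entries = []
    for line in text.splitlines():
        match = LINK_LINE.match(line.strip())
        if match:
            entries.append(
                IndexEntry(match["title"], match["url"], match["description"] or "")
            )
    return entries


def score_entry(entry: IndexEntry, query: str) -> int:
    """Sum the weights for an exact title match and per-keyword hits."""
    normalized = query.lower().strip()
    title = entry.title.lower()
    description = entry.description.lower()
    score = SCORE_EXACT_TITLE_MATCH if title == normalized else 0
    for keyword in normalized.split():
        if keyword in title:
            score += SCORE_KEYWORD_IN_TITLE
        if keyword in description:
            score += SCORE_KEYWORD_IN_DESCRIPTION
    return score


def search(entries: list[IndexEntry], query: str, limit: int = 5) -> list[IndexEntry]:
    """Return the highest-scoring entries, dropping those with no match."""
    scored = [(score_entry(entry, query), entry) for entry in entries]
    ranked = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [entry for _, entry in ranked[:limit]]
```

Keeping the weights as named module-level constants makes the ranking easy to tune and to document, which seems to be the point of the "documented scoring weights" in the commit message.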
Instead of fetching the entire llms-full.txt corpus when the llms.txt index returns few results, re-run the metadata search with expanded keywords (abbreviations and synonyms). This avoids loading the full docs site while still improving recall for short/abbreviated queries. Made-with: Cursor
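A minimal sketch of the expanded-keyword retry described in this commit follows. The `EXPANSIONS` table, the `MIN_RESULTS` threshold of 3 (borrowed from the "fewer than 3 results" wording in the PR description), and every function name are assumptions, not the PR's actual values.

```python
import re

# Hypothetical expansion table; the PR does not list the real abbreviations.
EXPANSIONS = {
    "sl": ["semantic layer"],
    "ci": ["continuous integration"],
}
MIN_RESULTS = 3  # assumed threshold for "few results"


def expand_keywords(query: str) -> list[str]:
    """Return the query keywords plus the words of any known expansion."""
    keywords = query.lower().split()
    expanded = list(keywords)
    for keyword in keywords:
        for phrase in EXPANSIONS.get(keyword, []):
            for word in phrase.split():
                if word not in expanded:
                    expanded.append(word)
    return expanded


def metadata_search(titles: list[str], keywords: list[str]) -> list[str]:
    """Match on whole title words so short abbreviations do not hit substrings."""
    matches = []
    for title in titles:
        words = set(re.findall(r"\w+", title.lower()))
        if any(keyword in words for keyword in keywords):
            matches.append(title)
    return matches


def search_with_expansion(titles: list[str], query: str) -> list[str]:
    """Search once; if too few results come back, retry with expanded keywords."""
    results = metadata_search(titles, query.lower().split())
    if len(results) < MIN_RESULTS:
        results = metadata_search(titles, expand_keywords(query))
    return results
```

The retry only touches the already-loaded metadata index, so recall for short queries improves without fetching any additional content.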
93361f7 to
adb5f63
…URLs
- Validate that normalize_doc_url only produces docs.getdbt.com URLs, raising ValueError for external hosts (prevents SSRF).
- Share a single ProductDocsClient across tool calls so TTL-based caches (llms.txt index, page cache) actually persist.
- Normalize URLs in error responses to match the format used in success responses (display_url instead of raw input path).
Made-with: Cursor
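The host check in the first bullet could be sketched as below. Only the function name `normalize_doc_url` and the `ValueError` behavior come from the commit message. The rest is assumed: the body, the `DOCS_HOST` constant, the rejection of non-HTTPS schemes, and the dropping of query strings and fragments.

```python
from urllib.parse import urlsplit

DOCS_HOST = "docs.getdbt.com"


def normalize_doc_url(path_or_url: str) -> str:
    """Turn a docs path or URL into a canonical https://docs.getdbt.com URL.

    Raises ValueError for any other host, so the server never fetches
    an attacker-chosen address.
    """
    candidate = path_or_url.strip()
    if "://" not in candidate:
        # Treat scheme-less input as a path on the docs site.
        candidate = f"https://{DOCS_HOST}/{candidate.lstrip('/')}"
    parts = urlsplit(candidate)
    # hostname ignores userinfo and port, so "docs.getdbt.com@evil.example" fails here.
    if parts.scheme != "https" or parts.hostname != DOCS_HOST:
        raise ValueError(
            f"Only https://{DOCS_HOST} URLs are allowed, got: {path_or_url!r}"
        )
    # Rebuild from the constant host; query string and fragment are dropped (assumption).
    return f"https://{DOCS_HOST}{parts.path}"
```

Rebuilding the result from the constant host, rather than echoing the parsed input back, means the returned URL cannot point anywhere else even if a parsing edge case slips past the check.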
1288472 to
b50b7e7
b-per
left a comment
Adding a couple more comments now that I saw the current caching approach
Replace TTL-based caching (timestamps, locks, eviction) with a simple dict that lives for the lifetime of the MCP server process. Restart the server to refresh. Removes INDEX_CACHE_TTL_SECONDS, PAGE_CACHE_TTL_SECONDS, FULL_TEXT_CACHE_TTL_SECONDS, and associated asyncio locks. Made-with: Cursor
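A process-lifetime cache of the kind this commit describes can be very small. The sketch below is illustrative only: `PageCache`, its injected `fetch` callable, and `demo` are hypothetical names, not the PR's `ProductDocsClient`.

```python
import asyncio
from collections.abc import Awaitable, Callable


class PageCache:
    """Cache that lives as long as the process: no TTL, no locks, no eviction."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]]) -> None:
        self._fetch = fetch
        self._pages: dict[str, str] = {}

    async def get_page(self, url: str) -> str:
        # Fetch on first access only; later calls are served from the dict.
        if url not in self._pages:
            self._pages[url] = await self._fetch(url)
        return self._pages[url]


async def demo() -> int:
    """Request the same page twice and report how many fetches happened."""
    calls = 0

    # Stand-in for a real HTTP fetch so the sketch runs offline.
    async def fake_fetch(url: str) -> str:
        nonlocal calls
        calls += 1
        return f"# content of {url}"

    cache = PageCache(fake_fetch)
    await cache.get_page("/docs/introduction")
    await cache.get_page("/docs/introduction")
    return calls
```

Without a lock, two concurrent requests for the same uncached URL may both fetch it. That costs a duplicate request but stays correct, which is presumably the trade-off accepted in exchange for the simpler code. As the commit says, restarting the server is the only way to refresh.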
thanks @b-per! ready for re-review when you can
b-per
left a comment
Thanks Mirna. It's good to merge from me. Let's see if we get feedback from the MCP users about it and iterate from there.
Your commits are not signed though. Before any future contribution, could you check how to set git to sign your commits? https://github.com/dbt-labs/dbt-mcp/blob/main/CONTRIBUTING.md
Summary
This PR adds a `product_docs` toolset to the dbt MCP server, giving AI agents real-time access to the public dbt documentation at docs.getdbt.com. The toolset exposes two MCP tools with a clean separation of concerns: search for pages, then fetch their content.
Product docs tools are included by default when you run dbt-mcp; use `DISABLE_PRODUCT_DOCS=true` to turn them off, or `DBT_MCP_ENABLE_PRODUCT_DOCS=true` when using an allowlist (other `DBT_MCP_ENABLE_*` vars).
Flow
What changed
Two-tool surface (`src/dbt_mcp/product_docs/tools.py`)
Designed with a minimal tool count to reduce LLM confusion and MCP context overhead:
- `search_product_docs`: keyword search over the llms.txt index (returns metadata: titles, URLs, descriptions). Includes automatic full-text fallback via `llms-full.txt` when keyword search finds fewer than 3 results.
- `get_product_doc_pages`: fetch one or more pages by path or URL as Markdown (up to 10 in parallel). A single tool handles both single-page and multi-page fetching: pass a list of one path or many.
Typed responses (`src/dbt_mcp/product_docs/types.py`)
All tools return typed `@dataclass` responses (following the existing `semantic_layer/types.py` pattern) instead of raw JSON strings:
- `DocSearchResult`, `SearchProductDocsResponse`, `ProductDocPageResponse`, `GetProductDocPagesResponse`
FastMCP natively serializes dataclasses, so no manual `json.dumps()` is needed.
Client with caching and search ranking (`src/dbt_mcp/product_docs/client.py`)
- `ProductDocsClient` handles HTTP fetching with TTL-based in-memory caching (1h index, 24h full-text, 30m pages).
- Search ranking uses `score_index_entry()` with named constants like `SCORE_KEYWORD_IN_TITLE`, `SCORE_EXACT_TITLE_MATCH`, etc.
- `client.get_page()` raises `httpx.HTTPStatusError` / `httpx.RequestError`; the tool layer catches exceptions and returns typed error responses.
Supporting changes
- `tool_names.py` / `toolsets.py`: `SEARCH_PRODUCT_DOCS` and `GET_PRODUCT_DOC_PAGES` enum members added.
- `human_descriptions.py`: descriptions for both tools.
- Prompt `.md` files: `search_product_docs.md` and `get_product_doc_pages.md` with guidance on how to present docs content to users.
- `config.py` / `settings.py` / `server.py`: product docs toolset registration with `DISABLE_PRODUCT_DOCS` support.
- `README.md` / `diagram.d2`: auto-updated by pre-commit hook.
- `.changes/unreleased/`: changie entry added.
Tests
- Unit tests (`tests/unit/tools/test_product_docs.py`): 42 tests covering parsing, URL normalization, search ranking, full-text fallback, page fetching (success, 404, network error, partial failures, 10-page cap), and toolset registration.
- Integration tests (`tests/integration/product_docs/test_product_docs.py`): live tests against docs.getdbt.com for client methods and both MCP tools.
Why
The `get_product_doc_pages` tool accepts a list of paths (1 to 10), eliminating the need for separate single/batch tools. Fewer tools = less LLM confusion about which to pick, less MCP context overhead.
Checklist
Additional notes
Product docs tools are on by default. Set `DISABLE_PRODUCT_DOCS=true` to disable them. If you use an allowlist (e.g. `DBT_MCP_ENABLE_SEMANTIC_LAYER=true`), add `DBT_MCP_ENABLE_PRODUCT_DOCS=true` to include product docs.
The `read_only_hint=True` annotation is correct per the MCP spec: the in-memory cache is internal process state, not an externally observable side effect.
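The default-on and allowlist behavior described in these notes can be sketched as a small pure function. This is an illustration rather than the code in `config.py` / `settings.py`: the function names are hypothetical, accepting only the value `true` (case-insensitively) is an assumption, and so is letting the allowlist decide when both kinds of variable are set.

```python
from collections.abc import Mapping
from typing import Optional

ENABLE_PREFIX = "DBT_MCP_ENABLE_"


def _is_true(value: Optional[str]) -> bool:
    # Assumption: only "true" (any case) counts as enabled.
    return value is not None and value.strip().lower() == "true"


def product_docs_enabled(env: Mapping[str, str]) -> bool:
    """Decide whether the product docs toolset should be registered."""
    # Any truthy DBT_MCP_ENABLE_* variable switches the server to allowlist mode.
    allowlist_mode = any(
        name.startswith(ENABLE_PREFIX) and _is_true(value)
        for name, value in env.items()
    )
    if allowlist_mode:
        # In allowlist mode the toolset must be opted in explicitly.
        return _is_true(env.get("DBT_MCP_ENABLE_PRODUCT_DOCS"))
    # Otherwise product docs are on unless explicitly disabled.
    return not _is_true(env.get("DISABLE_PRODUCT_DOCS"))
```

Passing the environment in as a mapping (for example `product_docs_enabled(os.environ)`) keeps the rule easy to unit test without touching the real process environment.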