feat(mgc): add semantic search for Magalu docs #21

Merged: Joao208 merged 4 commits into main from joaobarros-/-mgc-magalu-docs-semantic-search on Mar 19, 2026

Joao208 (Contributor) commented Mar 19, 2026

Description

Adds semantic search over the Magalu Cloud documentation to the existing mgc MCP server. It uses TF-IDF to index Markdown files scraped via docusaurus-to-md and exposes two new tools: search_magalu_docs and get_magalu_doc.

Labels

  • New Feature
  • Bug Fix
  • Structure
  • Tests
  • Other

Related Story

N/A

Motivation and Context

We need a way to query the Magalu Cloud documentation directly through MCP, so that agents can find relevant information without leaving their workflow. Semantic search with TF-IDF is lightweight, runs locally, and does not depend on external embedding APIs.

How Was This Tested?

  • Unit Tests
  • Integration Tests
  • e2e Tests (playwright)
  • Acceptance Tests (QA)
  • Performance Tests
  • Other (which?)
  • None (why?): The build compiled without errors. Manual tests are pending until the docs have been scraped.

Risk and Impact Analysis

  • Low
  • High

Screenshots or Visual Aids (if appropriate)

N/A

Summary by CodeRabbit

Release Notes

  • New Features

    • Added documentation search functionality to query Magalu developer documentation semantically
    • New search_magalu_docs tool to find relevant documentation and get_magalu_doc tool to retrieve full document content
    • Introduced MAGALU_DOCS_DIR environment variable to enable local docs searching
  • Documentation

    • Updated README with configuration details and tool usage instructions
  • Chores

    • Added docs-cache directory to gitignore

- Add DocsIndex with TF-IDF based semantic search
- Add search_magalu_docs and get_magalu_doc tools
- Add scrape-docs script using docusaurus-to-md
- Configurable via MAGALU_DOCS_DIR env var
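
As a rough illustration of the last point, the sketch below shows one way an optional docs directory could be resolved from the environment. The function name `resolveDocsDir` and the example value `./docs-cache` are assumptions made for illustration; only the variable name MAGALU_DOCS_DIR comes from this PR.

```typescript
// Hypothetical helper: returns the configured docs directory, or undefined
// when MAGALU_DOCS_DIR is unset or blank (docs search then stays disabled).
// The env map is passed in so the function is easy to test; in a Node
// process the caller would pass process.env.
function resolveDocsDir(
  env: Record<string, string | undefined>,
): string | undefined {
  const raw = env["MAGALU_DOCS_DIR"];
  if (raw === undefined) return undefined;
  const trimmed = raw.trim();
  return trimmed === "" ? undefined : trimmed;
}

// Example call with an assumed directory name.
const docsDir = resolveDocsDir({ MAGALU_DOCS_DIR: "./docs-cache" });
```

Treating a blank value the same as an unset one keeps the "docs directory is not configured" path in a single place.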

coderabbitai Bot commented Mar 19, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66d027a2-f5c3-4d17-a5e7-b9cefe8cf6c1

📥 Commits

Reviewing files that changed from the base of the PR and between ec3d2ec and af64965.

📒 Files selected for processing (2)
  • packages/mgc/README.md
  • packages/mgc/src/docs-index.ts

Walkthrough

Adds documentation search functionality to the MCP server by introducing a new DocsIndex class for indexing markdown files with TF/IDF scoring, two new MCP tools (search_magalu_docs and get_magalu_doc), and supporting configuration, types, and server integration.
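
The scoring described above can be sketched roughly as follows. This is a minimal illustration of TF-IDF ranking with a title boost, not the code from packages/mgc/src/docs-index.ts: the `Doc` and `SearchHit` shapes, the `TITLE_BOOST` value, the smoothed IDF formula, and the tokenizer are all assumptions.

```typescript
// Hypothetical document shape; the PR's real DocsIndex fields are not shown here.
interface Doc {
  filepath: string;
  title: string;
  content: string;
}

interface SearchHit {
  filepath: string;
  title: string;
  score: number;
}

// Assumed weight added for each query term that also appears in the title.
const TITLE_BOOST = 2;

// Lowercase, strip diacritics, split on non-alphanumerics, drop 1-char tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 1);
}

// Corpus-wide smoothed IDF: log((N + 1) / (df + 1)) + 1.
function buildIdf(docs: Doc[]): Map<string, number> {
  const docFreq = new Map<string, number>();
  for (const doc of docs) {
    const unique = Array.from(new Set(tokenize(doc.title + " " + doc.content)));
    for (const term of unique) {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    }
  }
  const idf = new Map<string, number>();
  docFreq.forEach((count, term) => {
    idf.set(term, Math.log((docs.length + 1) / (count + 1)) + 1);
  });
  return idf;
}

// Score each document by summed TF * IDF over the query terms, plus a title boost.
function search(
  docs: Doc[],
  idf: Map<string, number>,
  query: string,
  maxResults = 5,
): SearchHit[] {
  const queryTerms = tokenize(query);
  const hits: SearchHit[] = [];
  for (const doc of docs) {
    const tokens = tokenize(doc.content);
    const titleTerms = new Set(tokenize(doc.title));
    const termCounts = new Map<string, number>();
    for (const token of tokens) {
      termCounts.set(token, (termCounts.get(token) ?? 0) + 1);
    }
    let score = 0;
    for (const term of queryTerms) {
      const weight = idf.get(term) ?? 0;
      const tf = (termCounts.get(term) ?? 0) / Math.max(tokens.length, 1);
      score += tf * weight;
      if (titleTerms.has(term)) score += TITLE_BOOST * weight;
    }
    if (score > 0) {
      hits.push({ filepath: doc.filepath, title: doc.title, score });
    }
  }
  return hits.sort((a, b) => b.score - a.score).slice(0, maxResults);
}
```

Documents that match none of the query terms are dropped rather than returned with a zero score, which keeps the result list short for an agent consuming it.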

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & Documentation: packages/mgc/.gitignore, packages/mgc/README.md | Added docs-cache/ to gitignore. README expanded with new MAGALU_DOCS_DIR environment variable, "Documentation Search" tool descriptions, and "Scraping docs" instructions for generating local documentation cache. |
| Type Definitions: packages/mgc/src/types.ts | Added SearchDocsParamsSchema (query string, optional max_results) and GetDocParamsSchema (filepath string) with corresponding inferred TypeScript types for MCP tool parameter validation. |
| Core Search Infrastructure: packages/mgc/src/docs-index.ts | New DocsIndex class that loads markdown files from a directory, extracts titles and content, builds a corpus-wide IDF map, and provides TF/IDF-based semantic search with title-match boosting and snippet extraction. Includes manifest-based URL mapping and error resilience. |
| Server Integration: packages/mgc/src/server.ts, packages/mgc/src/tools.ts | MgcTools now accepts optional docsDir parameter and initializes DocsIndex. Added searchDocs() and getDoc() methods with error handling. MgcMCPServer registers two new tools with validated input schemas and error responses when docs directory is not configured. |
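
To make the parameter shapes in the table concrete, here is a dependency-free sketch of validating the two tool inputs. The field names (query, max_results, filepath) come from the summary above; the parser function names, the positive-integer rule for max_results, and the path checks are assumptions, and the PR itself appears to use schema objects with inferred types rather than hand-written guards.

```typescript
interface SearchDocsParams {
  query: string;
  max_results?: number;
}

interface GetDocParams {
  filepath: string;
}

// Narrow an unknown tool input to a plain key/value record.
function asRecord(input: unknown): Record<string, unknown> {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    throw new Error("params must be an object");
  }
  return input as Record<string, unknown>;
}

// Hypothetical parser for search_magalu_docs input.
function parseSearchDocsParams(input: unknown): SearchDocsParams {
  const { query, max_results } = asRecord(input);
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query must be a non-empty string");
  }
  if (max_results === undefined) return { query };
  // Assumption: max_results, when given, must be a positive integer.
  if (typeof max_results !== "number" || !Number.isInteger(max_results) || max_results < 1) {
    throw new Error("max_results must be a positive integer");
  }
  return { query, max_results };
}

// Hypothetical parser for get_magalu_doc input.
function parseGetDocParams(input: unknown): GetDocParams {
  const { filepath } = asRecord(input);
  if (typeof filepath !== "string" || filepath === "") {
    throw new Error("filepath must be a non-empty string");
  }
  // Assumption: reject absolute paths and ".." segments so that reads
  // cannot escape the configured docs directory.
  const segments = filepath.split(/[\\/]/);
  if (filepath.startsWith("/") || segments.includes("..")) {
    throw new Error("filepath must be relative to the docs directory");
  }
  return { filepath };
}
```

Validating filepath before it reaches the file system matters here because get_magalu_doc takes a path supplied by the calling agent.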

Sequence Diagram

sequenceDiagram
    participant Client
    participant MgcMCPServer as MCP Server
    participant MgcTools
    participant DocsIndex
    participant FileSystem as File<br/>System

    Client->>MgcMCPServer: Call search_magalu_docs<br/>(query, max_results)
    MgcMCPServer->>MgcMCPServer: Validate params<br/>via schema
    MgcMCPServer->>MgcTools: searchDocs(validated)
    MgcTools->>DocsIndex: load() [idempotent]
    DocsIndex->>FileSystem: Discover .md files<br/>in docsDir
    FileSystem-->>DocsIndex: File list
    DocsIndex->>FileSystem: Read markdown<br/>content & manifest
    FileSystem-->>DocsIndex: Content & metadata
    DocsIndex->>DocsIndex: Extract titles,<br/>tokenize, compute IDF
    DocsIndex-->>MgcTools: Index loaded
    MgcTools->>DocsIndex: search(query,<br/>maxResults)
    DocsIndex->>DocsIndex: TF/IDF score<br/>& rank results
    DocsIndex-->>MgcTools: Ranked results<br/>w/ snippets
    MgcTools-->>MgcMCPServer: McpToolResult<br/>(results, metadata)
    MgcMCPServer-->>Client: Search results
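
The flow in the diagram, including the idempotent load() step and the error response when no docs directory is configured, can be sketched as below. All names here (`LazyIndex`, `searchDocsTool`, `ToolResult`) are hypothetical, the file system is replaced by an injected `readTitles` function, and matching is a plain substring check to keep the focus on the call sequence rather than on scoring.

```typescript
interface ToolResult {
  isError: boolean;
  text: string;
}

// Hypothetical stand-in for the index: loads once, then serves searches.
class LazyIndex {
  private readonly readTitles: () => Promise<string[]>;
  private loading: Promise<void> | null = null;
  private titles: string[] = [];

  constructor(readTitles: () => Promise<string[]>) {
    this.readTitles = readTitles;
  }

  // Idempotent: concurrent and repeated calls share a single load.
  load(): Promise<void> {
    if (this.loading === null) {
      this.loading = this.readTitles().then(
        (titles) => {
          this.titles = titles;
        },
        (err) => {
          // Allow a later retry instead of caching the failure.
          this.loading = null;
          throw err;
        },
      );
    }
    return this.loading;
  }

  async search(query: string, maxResults: number): Promise<string[]> {
    await this.load();
    const needle = query.toLowerCase();
    return this.titles
      .filter((title) => title.toLowerCase().includes(needle))
      .slice(0, maxResults);
  }
}

// Hypothetical tool handler: the index is undefined when no docs dir is set.
async function searchDocsTool(
  index: LazyIndex | undefined,
  query: string,
  maxResults = 5,
): Promise<ToolResult> {
  if (index === undefined) {
    return { isError: true, text: "MAGALU_DOCS_DIR is not configured" };
  }
  try {
    const results = await index.search(query, maxResults);
    return { isError: false, text: JSON.stringify(results) };
  } catch (err) {
    return { isError: true, text: err instanceof Error ? err.message : String(err) };
  }
}
```

Caching the promise rather than a boolean flag means two tool calls that arrive before the first load finishes still trigger only one directory scan.
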

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Whiskers twitch as indices grow,
TF/IDF makes knowledge flow,
Through Magalu's docs we leap and bound,
Semantic search—enlightenment found!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding semantic search functionality for Magalu documentation to the mgc package. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@Joao208 Joao208 merged commit f3153ad into main Mar 19, 2026
2 checks passed
@Joao208 Joao208 deleted the joaobarros-/-mgc-magalu-docs-semantic-search branch March 19, 2026 18:21