Skip to content

vaadin/vaadin-documentation-services

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Vaadin Documentation RAG Service

A sophisticated, hierarchically-aware Retrieval-Augmented Generation (RAG) system for Vaadin documentation that understands document structure, provides framework-specific filtering, and enables intelligent parent-child navigation through documentation sections.

🎯 Project Overview

This project provides an advanced RAG system with enhanced hybrid search that:

  • Understands Hierarchical Structure: Navigates parent-child relationships within and across documentation files
  • Enhanced Hybrid Search: Combines semantic and intelligent keyword search with native Pinecone reranking for superior relevance
  • Framework Filtering: Intelligently filters content for Vaadin Flow (Java) vs Hilla (React) frameworks
  • Agent-Friendly: Provides MCP (Model Context Protocol) server for seamless IDE assistant integration
  • Production Ready: Clean architecture with dependency injection, comprehensive testing, and error handling

πŸ—οΈ Architecture

vaadin-documentation-services/
β”œβ”€β”€ packages/
β”‚   β”œβ”€β”€ core-types/              # Shared TypeScript interfaces
β”‚   β”œβ”€β”€ 1-asciidoc-converter/    # AsciiDoc β†’ Markdown + metadata extraction
β”‚   β”œβ”€β”€ 2-embedding-generator/   # Markdown β†’ Vector database with hierarchical chunking
β”‚   β”œβ”€β”€ rest-server/             # Enhanced REST API with hybrid search + reranking
β”‚   └── mcp-server/              # MCP server with hierarchical navigation
β”œβ”€β”€ package.json                 # Bun workspace configuration
└── PROJECT_PLAN.md             # Complete project documentation

Data Flow

flowchart TD
    subgraph "Step 1: Documentation Processing"
        VaadinDocs["πŸ“š Vaadin Docs<br/>(AsciiDoc)"] 
        Converter["πŸ”„ AsciiDoc Converter<br/>β€’ Framework detection<br/>β€’ URL generation<br/>β€’ Markdown output"]
        Processor["⚑ Embedding Generator<br/>β€’ Hierarchical chunking<br/>β€’ Parent-child relationships<br/>β€’ OpenAI embeddings"]
    end

    subgraph "Step 2: Enhanced Retrieval"
        Pinecone["πŸ—„οΈ Pinecone Vector DB<br/>β€’ Rich metadata<br/>β€’ Hierarchical relationships<br/>β€’ Framework tags"]
        RestAPI["🌐 REST API<br/>β€’ Enhanced hybrid search<br/>β€’ Native Pinecone reranking<br/>β€’ Framework filtering"]
    end

    subgraph "Step 3: Agent Integration"
        MCP["πŸ€– MCP Server<br/>β€’ search_vaadin_docs<br/>β€’ get_full_document<br/>β€’ Full document retrieval"]
        IDEs["πŸ’» IDE Assistants<br/>β€’ Context-aware search<br/>β€’ Hierarchical exploration<br/>β€’ Framework-specific help"]
    end

    VaadinDocs --> Converter
    Converter --> Processor
    Processor --> Pinecone
    Pinecone <--> RestAPI
    RestAPI <--> MCP
    MCP <--> IDEs

    classDef processing fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef storage fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef api fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    classDef agent fill:#fff3e0,stroke:#e65100,stroke-width:2px

    class VaadinDocs,Converter,Processor processing
    class Pinecone,RestAPI storage
    class MCP api
    class IDEs agent
Loading

✨ Key Features

πŸ” Intelligent Search

  • Enhanced Hybrid Search: Combines semantic similarity with intelligent keyword extraction and scoring
  • Native Pinecone Reranking: Uses Pinecone's bge-reranker-v2-m3 for optimal result ranking
  • Framework Awareness: Filters Flow vs Hilla content with common content inclusion
  • Query Preprocessing: Smart keyword extraction with stopword filtering for better search quality

🌳 Hierarchical Navigation

  • Parent-Child Relationships: Navigate from specific details to broader context
  • Cross-File Links: Understand relationships between different documentation files
  • Context Breadcrumbs: Maintain navigation context for better user experience

πŸŽ›οΈ Developer Experience

  • MCP Integration: Standardized protocol for IDE assistant integration
  • TypeScript: Full type safety across all packages
  • Comprehensive Testing: Unit tests, integration tests, and hierarchical workflow validation
  • Clean Architecture: Dependency injection and interface-based design

πŸš€ Quick Start

Prerequisites

  • Bun runtime
  • OpenAI API key (for embeddings)
  • Pinecone API key and index

Installation

# Clone and install dependencies
git clone https://github.com/vaadin/vaadin-documentation-services
cd vaadin-documentation-services
bun install

Environment Setup

# Create .env file with your API keys
echo "OPENAI_API_KEY=your_openai_api_key" > .env
echo "PINECONE_API_KEY=your_pinecone_api_key" >> .env
echo "PINECONE_INDEX=your_pinecone_index" >> .env

Running the System

1. Process Documentation (One-time setup)

# Convert AsciiDoc to Markdown with metadata
cd packages/1-asciidoc-converter
bun run convert

# Generate embeddings and populate vector database
cd ../2-embedding-generator
bun run generate

2. Start REST API Server

cd packages/rest-server
bun run start
# Server runs at http://localhost:3001

3. Use MCP Server with IDE Assistant

The MCP server is deployed and available remotely via HTTP transport at: https://vaadin-mcp.fly.dev/mcp

Configure your IDE assistant to use the Streamable HTTP transport:

import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
const transport = new StreamableHTTPClientTransport(
  new URL("https://vaadin-mcp.fly.dev/mcp")
);

πŸ“¦ Package Details

Core Types (packages/core-types/)

Shared TypeScript interfaces used across all packages:

  • DocumentChunk: Core documentation chunk structure
  • RetrievalResult: Search result with relevance scoring
  • Framework: Type-safe framework definitions

AsciiDoc Converter (packages/1-asciidoc-converter/)

Converts Vaadin AsciiDoc documentation to Markdown with metadata:

  • Framework Detection: Automatically detects Flow/Hilla/common content
  • URL Generation: Creates proper Vaadin.com documentation links
  • Include Processing: Handles AsciiDoc include directives
  • Metadata Extraction: Preserves semantic information in frontmatter
cd packages/1-asciidoc-converter
bun run convert          # Convert all documentation
bun run test            # Run framework detection tests

Embedding Generator (packages/2-embedding-generator/)

Creates vector embeddings with hierarchical relationships:

  • Hierarchical Chunking: Preserves document structure and relationships
  • Parent-Child Links: Creates cross-file and intra-file relationship mapping
  • LangChain Integration: Uses MarkdownHeaderTextSplitter for intelligent chunking
  • Batch Processing: Efficient embedding generation and Pinecone upsertion
cd packages/2-embedding-generator
bun run generate        # Generate embeddings from Markdown
bun run test           # Run chunking and relationship tests

REST Server (packages/rest-server/)

Enhanced API server with hybrid search capabilities:

  • Hybrid Search: Semantic + keyword search with RRF fusion
  • Framework Filtering: Flow/Hilla/common content filtering
  • Document Navigation: /chunk/:chunkId endpoint for parent-child navigation
  • Backward Compatibility: Maintains existing API contracts
cd packages/rest-server
bun run start          # Start production server
bun run test           # Run comprehensive test suite
bun run test:verbose   # Detailed test output

API Endpoints:

  • POST /search - Hybrid search with framework filtering
  • GET /chunk/:chunkId - Retrieve specific document chunk
  • POST /ask - AI-generated answers (with streaming support)
  • GET /health - Health check
  • GET /vaadin-version - Get latest Vaadin version from GitHub releases

MCP Server (packages/mcp-server/)

Model Context Protocol server for IDE assistant integration:

  • Document Tools: search_vaadin_docs and get_full_document
  • Full Document Retrieval: Complete documentation pages with context
  • Framework Awareness: Intelligent framework detection and filtering
  • Error Handling: Graceful degradation for missing content
cd packages/mcp-server
bun run build          # Build for distribution
bun run test           # Run document-based tests

Available Tools:

  • search_vaadin_docs: Search with semantic and keyword matching
  • get_full_document: Retrieve complete documentation pages
  • get_vaadin_version: Get latest Vaadin version and release timestamp

πŸ§ͺ Testing

Each package includes comprehensive test suites:

# Test individual packages
cd packages/1-asciidoc-converter && bun run test
cd packages/2-embedding-generator && bun run test  
cd packages/rest-server && bun run test
cd packages/mcp-server && bun run test

# Run REST server against live endpoint
cd packages/rest-server && bun run test:server

πŸ“ˆ Performance & Metrics

Search Quality

  • 100% Framework Detection Accuracy: Flow, Hilla, and common content correctly identified
  • Enhanced Hybrid Search: Semantic + keyword search with native Pinecone reranking dramatically improves relevance
  • Contextual Navigation: Parent-child relationships enable better result exploration
  • 4,982 Document Chunks: Complete coverage of 378 Vaadin documentation files with 5-level hierarchy

System Performance

  • Parallel Processing: Semantic and keyword search executed in parallel with intelligent merging
  • Native Reranking: Pinecone's bge-reranker-v2-m3 provides superior result ranking
  • Query Preprocessing: Smart keyword extraction with stopword filtering improves search quality
  • Efficient Chunking: Optimized token limits with intelligent content splitting
  • Clean Architecture: Dependency injection enables easy performance optimization

Production Readiness

  • 100% API Backward Compatibility: All existing integrations continue to work
  • Robust Error Handling: Graceful fallbacks ensure system reliability
  • Fresh Data: Recently updated with complete Vaadin documentation coverage

🌐 Deployment

REST Server (fly.io)

The REST server is deployed to fly.io and available at:

  • Production: https://vaadin-docs-search.fly.dev
  • Health Check: https://vaadin-docs-search.fly.dev/health

MCP Server (fly.io)

The MCP server is deployed to fly.io and available at:

  • Production: https://vaadin-mcp.fly.dev/mcp
  • Health Check: https://vaadin-mcp.fly.dev/health

Documentation Processing

Automated via GitHub Actions:

  • Daily Updates: Documentation re-processed automatically
  • Manual Triggers: Can be triggered via GitHub Actions UI
  • Error Notifications: Automated alerts for processing failures

πŸ”§ Development

Workspace Structure

This project uses Bun workspaces for package management:

bun install           # Install all dependencies
bun run build         # Build all packages
bun run test          # Test all packages

Adding New Features

  1. Core Types: Add interfaces to packages/core-types/
  2. Processing: Extend converters in packages/1-asciidoc-converter/ or packages/2-embedding-generator/
  3. API: Enhance search in packages/rest-server/
  4. Integration: Update MCP tools in packages/mcp-server/

Architecture Principles

  • Single Responsibility: Each package has a clear, focused purpose
  • Interface-Based Design: Clean contracts between components
  • Dependency Injection: Testable and swappable implementations
  • Type Safety: Full TypeScript coverage with strict configuration

πŸ“š Documentation

πŸ† Project Success

This project successfully delivered:

βœ… Sophisticated RAG System: Replaced naive implementation with hierarchically-aware search
βœ… Enhanced User Experience: Agents can now navigate from specific details to broader context
βœ… Production Quality: Clean architecture, comprehensive testing, and error handling
βœ… Framework Intelligence: Accurate Flow/Hilla content separation with common content inclusion
βœ… Developer Integration: Seamless IDE assistant integration via MCP protocol

The system now provides intelligent, context-aware documentation search that understands the hierarchical structure of Vaadin documentation and enables sophisticated agent interactions.

πŸ“„ License

MIT - See license file for details.


Built with ❀️ for the Vaadin developer community

About

A complete solution for ingesting, indexing, and retrieving Vaadin documentation through semantic search.

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •