This AI-powered chatbot performs custom deep research on uploaded documents using a semantic chunking strategy for precise and meaningful vectorization. Through multi-agent collaboration, it delivers accurate, context-aware answers to user queries.
Built with FastAPI, Azure OpenAI, and Chainlit, the system showcases advanced techniques for enhancing LLM-based applications, such as agentic patterns, modular architecture, multi-agent orchestration, and evaluation support.
At its core, the multi-agent deep research engine combines Microsoft Agent Framework and Semantic Kernel to generate high-quality analytical reports. By employing group chat coordination and the Magentic multi-agent pattern, it achieves deeper reasoning and consistent, well-structured outputs.
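As a rough illustration of the semantic chunking idea (not the project's actual implementation, which would embed sentences with an Azure OpenAI embedding deployment), the sketch below splits text into sentences and starts a new chunk whenever adjacent sentences stop being semantically similar. Word-overlap Jaccard similarity stands in for embedding cosine similarity here so the example is self-contained.

```python
# Minimal sketch of semantic chunking. A real implementation would compare
# embedding vectors; Jaccard word overlap is used here as a stand-in.
import re


def similarity(a: str, b: str) -> float:
    """Stand-in for cosine similarity between sentence embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    """Group consecutive, semantically similar sentences into chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[list[str]] = []
    for sent in sentences:
        # Extend the current chunk if the new sentence resembles its tail,
        # otherwise open a new chunk at the semantic boundary.
        if chunks and similarity(chunks[-1][-1], sent) >= threshold:
            chunks[-1].append(sent)
        else:
            chunks.append([sent])
    return [" ".join(c) for c in chunks]
```

Each resulting chunk is then vectorized as one unit, so topically related sentences land in the same vector, which is what makes retrieval more precise than fixed-size chunking.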
- The chatbot now incorporates MS Agent Framework, Microsoft's open-source SDK and runtime designed to let developers build, deploy, and manage sophisticated multi-agent systems with ease. It unifies the enterprise-ready foundations of Semantic Kernel with the innovative orchestration of AutoGen, so teams no longer have to choose between experimentation and production.
- The chatbot now incorporates Semantic Kernel, Microsoft's open-source orchestration SDK for LLM apps.
- Enables more intelligent planning and contextual understanding, resulting in richer, more accurate responses.
- Supports planner-based execution and native function calling for complex multi-step tasks.
- Introduced verbose mode for improved debugging and traceability.
- Logs include:
  - Raw input/output data
  - API call history
  - Function invocation details
- Helps track down issues and optimize prompt behavior.
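A verbose trace layer like the one described above can be sketched with a simple decorator that logs each call's raw input, raw output, and latency. The names `traced` and `call_model` below are illustrative, not the project's actual API, and the model call is stubbed out.

```python
# Sketch of a verbose mode: every wrapped call logs its raw arguments,
# raw result, and timing, making prompt behavior traceable.
import functools
import logging
import time

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("chatbot.trace")


def traced(fn):
    """Decorator that records inputs, outputs, and latency of a call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        log.debug(
            "call=%s args=%r kwargs=%r -> %r (%.1f ms)",
            fn.__name__, args, kwargs, result,
            (time.perf_counter() - start) * 1000,
        )
        return result
    return wrapper


@traced
def call_model(prompt: str) -> str:
    return f"echo: {prompt}"  # stand-in for an Azure OpenAI call
```

Wrapping the real model-call and function-invocation entry points this way yields the kind of input/output and API-call history listed above without touching business logic.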
- Now supports the following UI framework:
  - Chainlit: great for interactive prototyping
- A module that reformulates user queries to improve response quality and informativeness.
- Helps the LLM better understand the user's intent and generate more accurate, context-aware answers.
- Implements planning techniques to enrich search keywords based on the original query context.
- Automatically decomposes complex questions into sub-queries, searches them, and returns synthesized context to the chatbot.
- Boosts performance in multi-intent or multi-hop question scenarios.
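The decompose-search-synthesize flow described above can be sketched as follows. The `decompose` heuristic here is a naive rule-based fallback (the real module would ask the LLM to plan sub-queries), and the `search` function is a stub standing in for the project's web/AI Search backends.

```python
# Sketch of query decomposition: split a multi-intent question into
# sub-queries, search each one, and merge the snippets into one context.
import re


def decompose(query: str) -> list[str]:
    """Naive fallback: split on 'and' conjunctions and question marks."""
    parts = re.split(r"\band\b|\?", query)
    return [p.strip() for p in parts if p.strip()]


def synthesize_context(query: str, search) -> str:
    """Search each sub-query and concatenate the results as shared context."""
    sub_queries = decompose(query)
    return "\n".join(f"[{q}] {search(q)}" for q in sub_queries)


# Usage with a stub search backend:
fake_search = lambda q: f"(stub result for: {q})"
context = synthesize_context(
    "Who founded Azure and when was it launched?", fake_search
)
```

The synthesized context is then handed back to the chatbot, which is what lets a single answer cover every intent in a multi-hop question.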
This project implements sophisticated multi-agent collaboration patterns, such as Group Chat and Magentic, using Microsoft Agent Framework, enabling intelligent coordination between specialized AI agents for complex research tasks.
Sequential turn-based collaboration where agents refine outputs through iterative dialogue.
- Architecture: Writer → Reviewer loop with approval-based termination
- Agents:
  - ResearchWriter: Generates comprehensive research content
  - ResearchReviewer: Validates quality, accuracy, and citation integrity
- Best For:
- Iterative content refinement
- Quality assurance workflows
- Approval-based processes
- Performance: ⚡ Fast | 💰 Medium tokens | ⭐⭐⭐⭐ Quality
Usage:
```python
orchestrator = PlanSearchOrchestratorAFW(settings)
async for chunk in orchestrator.generate_response(
    messages=messages,
    research=True,
    multi_agent_type="MS Agent Framework GroupChat",
    stream=True
):
    print(chunk, end="")
```
Intelligent orchestration with a manager agent coordinating specialized agents adaptively.
- Architecture: Orchestrator → Dynamic agent coordination → Adaptive execution
- Agents:
  - Orchestrator: Intelligent planning and task decomposition
  - ResearchAnalyst: Information synthesis and pattern identification
  - ResearchWriter: Comprehensive content generation with citations
  - ResearchReviewer: Quality validation and scoring
- Best For:
- Complex multi-step research tasks
- Dynamic task decomposition
- Adaptive problem-solving requiring different expertise
- Performance: 🐢 Medium speed | 💰💰 Higher tokens | ⭐⭐⭐⭐⭐ Excellent quality
Usage:
```python
orchestrator = PlanSearchOrchestratorAFW(settings)
async for chunk in orchestrator.generate_response(
    messages=messages,
    research=True,
    multi_agent_type="MS Agent Framework Magentic",
    stream=True
):
    print(chunk, end="")
```
| Aspect | Group Chat | Magentic Orchestration |
|---|---|---|
| Execution | Sequential dialogue | Intelligent orchestration |
| Planning | None (fixed workflow) | Built-in adaptive planning |
| Agent Coordination | Turn-based | Dynamic by orchestrator |
| Rounds | 3-5 fixed iterations | 1-5+ adaptive rounds |
| Speed | ⚡ Fast | 🐢 Medium |
| Token Usage | 💰 Medium | 💰💰 High |
| Quality | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Best For | Refinement workflows | Complex multi-step tasks |
Use Group Chat when:
- ✅ You need iterative refinement with clear review cycles
- ✅ Speed is important
- ✅ Fixed writer-reviewer workflow is sufficient
- ✅ Lower token consumption is preferred
Use Magentic Orchestration when:
- ✅ Research requires multi-step analysis and synthesis
- ✅ Complex task decomposition is needed
- ✅ Adaptive coordination provides value
- ✅ Quality is prioritized over speed
- ✅ Tasks require different types of expertise
Both patterns are fully integrated into the orchestration workflow:
```
User Query → Intent Analysis → Search Planning → Multi-Source Search
                                      ↓
                       (Web + AI Search + YouTube)
                                      ↓
                       ┌─────────────────────────┐
                       │   Multi-Agent Pattern   │
                       │                         │
                       │   • Group Chat          │
                       │   • Magentic            │
                       └────────────┬────────────┘
                                    ↓
                        Streaming Markdown Output
```
Key Features:
- 🔄 Streaming Support: Real-time progress updates and token-by-token streaming
- 🔗 Context Integration: Seamless integration with Web Search, AI Search, and YouTube contexts
- 🎯 Sub-topic Processing: Parallel processing of multiple research sub-topics
- ⚡ TTFT Tracking: Time-to-first-token monitoring for performance optimization
- 🛡️ Error Handling: Robust error handling with graceful degradation
- 📚 Citation Management: Automatic source attribution and reference tracking
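Time-to-first-token tracking for a streaming response can be sketched as below: record the elapsed time between issuing the request and receiving the first chunk. The async generator here is a stub standing in for `orchestrator.generate_response()`.

```python
# Sketch of TTFT (time-to-first-token) measurement over an async stream.
import asyncio
import time


async def fake_stream():
    """Stub for a streaming model response."""
    for token in ["Research", " report", " ..."]:
        await asyncio.sleep(0.01)  # simulate network/model latency
        yield token


async def stream_with_ttft(stream):
    """Consume a stream, recording when the first chunk arrives."""
    start = time.perf_counter()
    ttft = None
    chunks = []
    async for chunk in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
        chunks.append(chunk)
    return "".join(chunks), ttft


text, ttft = asyncio.run(stream_with_ttft(fake_stream()))
```

TTFT is the latency metric users actually perceive in a chat UI, which is why it is tracked separately from total generation time.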
The project is organized into two main parts:
- backend: Contains the FastAPI server and all backend functionality
- frontend: Contains the frontend UI
- Python 3.9 or higher
- uv package manager
- Azure subscription with OpenAI service enabled
- Create and activate a virtual environment with uv:
```bash
uv venv .venv --python 3.12 --seed
source .venv/bin/activate
```
- Clone the repository:
```bash
git clone https://github.com/yourusername/multi-agent-doc-research.git
cd multi-agent-doc-research/app/backend
```
- Install backend dependencies using uv:
```bash
uv pip install -e .
```
For development dependencies:
```bash
uv pip install -e ".[dev]"
```
- Set up environment variables:
```bash
cp .env.example .env
```
Then edit the `.env` file and add your Azure OpenAI credentials:
```bash
# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY=your-api-key-here
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
AZURE_OPENAI_API_VERSION=2023-05-15
AZURE_OPENAI_DEPLOYMENT_NAME=your-deployment-name
AZURE_OPENAI_QUERY_DEPLOYMENT_NAME=your-query-deployment-name
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME=your-embedding-deployment-name

# Redis Configuration
REDIS_USE=False
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=redis_secure_password
REDIS_DB=0
REDIS_CACHE_EXPIRED_SECOND=604800

# Application Settings
LOG_LEVEL=INFO
MAX_TOKENS=2000
DEFAULT_TEMPERATURE=0.7

# When you use the Bing Custom Search API, you need to set the custom configuration ID.

# Planner Settings
PLANNER_MAX_PLANS=3

# AI Search
AZURE_AI_SEARCH_ENDPOINT=https://your-search-service.search.windows.net
AZURE_AI_SEARCH_API_KEY=your-search-service-api-key
AZURE_AI_SEARCH_INDEX_NAME=doc_inquiry_index
AZURE_AI_SEARCH_SEARCH_TYPE=semantic  # Options: "semantic", "simple", "hybrid"

# Document Intelligence
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-cognitive-services-account.cognitiveservices.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_API_KEY=your-document-intelligence-api-key

# Chunking Method
# Use "semantic" for semantic chunking, "page" for page-based chunking
PROCESSING_METHOD=semantic
```
Start the FastAPI server:
```bash
uv run run.py
```
The API will be available at:
- API: http://localhost:8000
- Documentation: http://localhost:8000/docs
- Alternative docs: http://localhost:8000/redoc
Run the application:
```bash
./run_app.sh
```
- Open your web browser and navigate to `http://localhost:7860/` to access the Chainlit interface.
- Upload documents using the "Upload" button.
- Enter your message in the input box and click "Submit" to interact with the chatbot.
Feel free to submit issues or pull requests if you have suggestions or improvements for the project.
This project is licensed under the MIT License. See the LICENSE file for more details.
