Winner of the "Implementation of a conversational agent based on a RAG system for a medical information site on neurological diseases (MS, Parkinson's, Alzheimer's, stroke)" challenge, organized by ARSII at the TWISE Night Challenge!
This project is a fork of PD_RAG_Conversational, originally created as a team hackathon project. NeuroChatRAG is a Retrieval-Augmented Generation (RAG) conversational AI system designed to provide accurate and context-aware responses about neurological conditions including Multiple Sclerosis, Parkinson's, Alzheimer's, and Stroke.
- Transparent Information Delivery: View both AI answers and the retrieved medical context
- Adjustable Complexity: Toggle between simple explanations and technical medical responses
- RAG Architecture: Enhanced response accuracy through retrieval-augmented generation
- Comprehensive Knowledge Base: Leverages 9,000+ medical articles from PubMed
- User-Friendly Interface: Streamlit-based interface for easy interaction
- Data Collection: Automated scraping of PubMed articles related to neurological conditions
- Vector Database: Efficient storage and retrieval of medical knowledge using FAISS
- RAG Pipeline: Retrieves the five most relevant passages (k=5) to ground LLM responses in source context
- NLP Techniques: Domain-specific medical embeddings with PubMedBERT for accurate interpretation of medical queries
- Modular Architecture: Scalable design for future extensions
Ensure you have the following installed:
- Python 3.8+
- pip
- Virtual environment (optional but recommended)
- Clone the repository:

      git clone https://github.com/YOUR_USERNAME/NeuroChatRAG.git
      cd NeuroChatRAG

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install dependencies:

      pip install -r requirements.txt

- Add your OpenAI API key to the `.env` file:

      OPENAI_API_KEY = "YOUR_API_KEY_HERE"

- Run the application:

      streamlit run streamlit_app.py

- Fetches relevant medical articles from PubMed
- Downloads clinical guidelines from authoritative sources
- Creates organized directory structure for data storage
- Cleans and formats medical abstracts
- Splits texts into optimized chunks for retrieval
- Implements robust error handling for processing large datasets
- Integrates domain-specific medical embeddings (PubMedBERT)
- Retrieves the top k=5 chunks per query to give the model sufficient context
- Supports both technical and simplified language styles
- Built with LangChain components for flexibility and extensibility
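
The chunking and top-k retrieval steps listed above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the repository uses PubMedBERT embeddings and a FAISS index through LangChain, whereas this example substitutes a toy bag-of-words vector and a brute-force cosine search so that it runs with the standard library alone. The function names (`split_into_chunks`, `embed`, `retrieve`), the chunk size, and the overlap are assumptions made for illustration; only `k=5` comes from the description above.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy stand-in for PubMedBERT: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=5):
    """Return up to k chunks most similar to the query (brute-force search)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]
```

In a real deployment the `embed` function would be replaced by a dense embedding model and the sorted scan by a vector index lookup; the shape of the pipeline (chunk, embed, rank, keep the top k) stays the same.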
The application provides a conversational interface where users can:
- Ask questions about neurological conditions
- View both the AI-generated answer and the source medical context
- Adjust the technical level of responses based on their background
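
One way the adjustable technical level could be wired into the prompt is sketched below. This is a hypothetical illustration, not the application's actual prompt: the style keys, the instruction wording, and the `build_prompt` helper are all assumptions. Numbering the context passages also makes it straightforward to display them next to the answer.

```python
# Hypothetical style instructions; the real application's wording may differ.
STYLE_INSTRUCTIONS = {
    "simple": "Explain in plain language for a general audience and avoid jargon.",
    "technical": "Use precise medical terminology suitable for clinicians.",
}


def build_prompt(question, context_chunks, style="simple"):
    """Assemble an LLM prompt from retrieved context and a style setting."""
    if style not in STYLE_INSTRUCTIONS:
        raise ValueError(f"unknown style: {style!r}")
    # Number each passage so the answer and its sources can be shown together.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        f"{STYLE_INSTRUCTIONS[style]}\n"
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

For example, `build_prompt("What is MS?", passages, style="technical")` would produce a prompt that opens with the clinician-oriented instruction, followed by the numbered passages and the question.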
The system uses a dataset of approximately 9,000 articles from PubMed, focused on neurological conditions. The scraping scripts and data processing pipeline are included in the repository for transparency and reproducibility.
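
For readers who want to reproduce a similar dataset, the sketch below builds a search URL for the NCBI E-utilities `esearch` endpoint, which returns PubMed IDs matching a query. How the repository's own scraping scripts fetch articles is not described here, so treat this as one possible approach; the helper name `build_pubmed_search_url` and the default page size are assumptions.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for PubMed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, max_results=100, start=0):
    """Build an esearch URL that returns PubMed IDs for a query as JSON."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # free-text or fielded query
        "retmax": max_results,  # number of IDs per page
        "retstart": start,      # offset of the first ID, used for paging
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print one URL per condition covered by the project.
    for condition in ["multiple sclerosis", "parkinson disease",
                      "alzheimer disease", "stroke"]:
        print(build_pubmed_search_url(condition, max_results=50))
```

The returned IDs would then be passed to a second request (for instance the `efetch` endpoint) to download the abstracts themselves.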
- Expansion to additional neurological conditions
- Integration with medical imaging analysis
- Multi-language support for global accessibility
- Mobile application development
This project is licensed under the MIT License - see the LICENSE file for details.
- ARSII for organizing the challenge
- TWISE Night Challenge for the platform and recognition
- Original contributors to the PD_RAG_Conversational project
- PubMed for providing access to valuable medical literature

