EduRAG is an AI system built on Retrieval-Augmented Generation (RAG) that generates accurate, context-aware answers from structured educational content.
It improves on traditional question answering by combining semantic search with LLM-based response generation.
- Retrieval-Augmented Generation (RAG) pipeline for improved answer accuracy
- Custom chunking and chunk-merging strategy to enhance context quality
- Embedding-based semantic search for relevant content retrieval
- Multi-file JSON data processing and structuring
- Context-aware response generation with reduced noise
- Raw data is preprocessed and divided into chunks
- Chunks are merged into larger units so that each one carries more complete context
- Embeddings are computed for the chunks to support semantic search
- The top-k most relevant chunks are retrieved for each query
- The LLM generates a final answer using the retrieved context
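
The retrieval and prompting steps above can be illustrated with a minimal sketch. It is not the code in `processing_query.py`: the `embed` function is a toy word-count stand-in for a real embedding model, and `top_k` and `build_prompt` are hypothetical helper names chosen for this example. The LLM call itself is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM (hypothetical prompt format)."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The French Revolution began in 1789.",
    "Chlorophyll absorbs light during photosynthesis.",
]
query = "How does photosynthesis use light?"
retrieved = top_k(query, chunks, k=2)
print(build_prompt(query, retrieved))
```

With a real embedding model, only `embed` would change; the ranking and prompt assembly stay the same.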
- Python
- JSON Data Processing
- Embedding-based Semantic Search
- Retrieval-Augmented Generation (RAG)
.
├── merge_chunks.py
├── preprocess_json.py
├── processing_query.py
├── mp3_to_json.py
├── video_to_mp3.py
└── README.md
- Designed a custom chunk-merging mechanism to reduce context fragmentation
- Improved answer quality by optimizing chunk size and grouping strategy
- Built modular scripts for scalable data preprocessing and retrieval
- Focused on enhancing LLM performance through better context handling
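
One plausible way to reduce context fragmentation is to merge consecutive chunks until a size budget is reached. The sketch below shows that idea only; it is not the logic in `merge_chunks.py`. The `start`, `end`, and `text` fields and the `max_words` parameter are assumptions made for illustration.

```python
def merge_chunks(chunks: list[dict], max_words: int = 50) -> list[dict]:
    """Greedily merge consecutive chunks while the merged text stays within max_words."""
    merged: list[dict] = []
    for chunk in chunks:
        words = len(chunk["text"].split())
        if merged and merged[-1]["words"] + words <= max_words:
            # Extend the previous merged chunk and push its end time forward.
            merged[-1]["text"] += " " + chunk["text"]
            merged[-1]["words"] += words
            merged[-1]["end"] = chunk["end"]
        else:
            # Start a new merged chunk (copy fields so the input is not mutated).
            merged.append({
                "start": chunk["start"],
                "end": chunk["end"],
                "text": chunk["text"],
                "words": words,
            })
    return merged


# Hypothetical transcript segments with timestamps in seconds.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome to the course."},
    {"start": 4.0, "end": 9.0, "text": "Today we cover recursion."},
    {"start": 9.0, "end": 15.0, "text": "A recursive function calls itself."},
]
result = merge_chunks(segments, max_words=8)
for item in result:
    print(item["start"], item["end"], item["text"])
```

A larger `max_words` gives each retrieved chunk more context but makes retrieval less precise, so the budget is worth tuning against real queries.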
- Convert video content to audio using video_to_mp3.py
- Transcribe audio to structured JSON using mp3_to_json.py
- Preprocess raw data using preprocess_json.py
- Merge chunks for improved context using merge_chunks.py
- Perform query processing and retrieval using processing_query.py
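
The scripts can also be chained from a small driver. The sketch below assumes each script runs from the repository root without command-line arguments, which this README does not confirm; adjust the commands if a script expects input paths or options. `build_commands` and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys
from typing import Sequence

# Script order taken from the usage steps above.
PIPELINE = (
    "video_to_mp3.py",
    "mp3_to_json.py",
    "preprocess_json.py",
    "merge_chunks.py",
    "processing_query.py",
)


def build_commands(scripts: Sequence[str], python: str = sys.executable) -> list[list[str]]:
    """Build one interpreter command per script (assumes no arguments are needed)."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts: Sequence[str] = PIPELINE) -> None:
    """Run each script in order and stop at the first failure."""
    for cmd in build_commands(scripts):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


# Preview the commands without executing them.
for cmd in build_commands(PIPELINE):
    print(" ".join(cmd))
```

Calling `run_pipeline()` executes the steps; `check=True` keeps a failed step from feeding bad data into later ones.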