This project demonstrates a Retrieval-Augmented Generation (RAG) pipeline using LangChain, ChromaDB, and Google Generative AI. It scrapes course content from a website, splits the text, embeds it, and enables question-answering over the retrieved context.
- Web scraping using LangChain Community's WebBaseLoader
- Document splitting with RecursiveCharacterTextSplitter
- Embedding generation via Google Generative AI
- Vector storage and retrieval using ChromaDB
- RAG pipeline for concise question answering
- Prompt debugging with a custom print function
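The prompt-debugging item can be illustrated with a small pass-through function. This is a minimal sketch, not the notebook's actual helper: the name `print_prompt` and the separator lines are assumptions. It prints whatever it receives and returns it unchanged, so it can sit between two steps of a chain without altering the result.

```python
def print_prompt(prompt_value):
    """Print the rendered prompt and return it unchanged (hypothetical helper)."""
    # LangChain prompt values expose to_string(); fall back to str() for
    # plain strings or any other object.
    to_string = getattr(prompt_value, "to_string", None)
    text = to_string() if callable(to_string) else str(prompt_value)
    print("----- prompt sent to the model -----")
    print(text)
    print("------------------------------------")
    return prompt_value
```

In a LangChain chain, a plain function placed between two runnables is wrapped automatically, so a step such as `prompt | print_prompt | llm` would show the final prompt just before it reaches the model.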
- Install dependencies:
  pip install langchain_community langchainhub chromadb langchain langchain-google-genai langchain-openai
- Set your Google API key:
  - If using Google Colab, store your key in Colab's userdata.
  - Otherwise, set GOOGLE_API_KEY in your environment.
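The two key-loading options can be combined in one helper. The sketch below is an assumption about how this might be done, not code taken from the notebook: the function name `load_google_api_key` is hypothetical, and it assumes the Colab secret is stored under the name `GOOGLE_API_KEY`.

```python
import os


def load_google_api_key() -> str:
    """Return the Google API key from Colab userdata or the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get("GOOGLE_API_KEY")  # assumed secret name
    except ImportError:
        # Outside Colab, fall back to the environment variable.
        key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set")
    # Export the key so libraries that read the environment can find it.
    os.environ["GOOGLE_API_KEY"] = key
    return key
```

Calling `load_google_api_key()` once near the top of the notebook should be enough, since the Google Generative AI integrations read `GOOGLE_API_KEY` from the environment by default.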
Open RAG.ipynb and run the cells sequentially:
- Scrape course data from https://www.educosys.com/course/genai
- Split and embed documents
- Store embeddings in ChromaDB
- Query the RAG pipeline with your questions
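The steps above can be sketched as a single function. This is an illustrative outline under stated assumptions, not a copy of RAG.ipynb: the names `format_docs` and `build_rag_chain`, the chunk sizes, the `rlm/rag-prompt` hub prompt, and both model names are assumptions, and the model names in particular should be replaced with ones your API key can access. It also assumes `GOOGLE_API_KEY` is already set and that `beautifulsoup4`, which WebBaseLoader relies on, is installed.

```python
def format_docs(docs):
    """Join retrieved documents into one context string for the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_rag_chain(url="https://www.educosys.com/course/genai"):
    """Scrape, split, embed, store, and return a question-answering chain."""
    # Imported here so the helper above stays usable without these packages.
    from langchain import hub
    from langchain_community.document_loaders import WebBaseLoader
    from langchain_community.vectorstores import Chroma
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough
    from langchain_google_genai import (
        ChatGoogleGenerativeAI,
        GoogleGenerativeAIEmbeddings,
    )
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # 1. Scrape the course page.
    docs = WebBaseLoader(url).load()

    # 2. Split into overlapping chunks (sizes are assumed values).
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = splitter.split_documents(docs)

    # 3. Embed the chunks and store them in ChromaDB (assumed model name).
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
    retriever = vectorstore.as_retriever()

    # 4. Wire retrieval, prompt, and model together (assumed prompt and model).
    prompt = hub.pull("rlm/rag-prompt")
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )


# Example (needs network access and a valid key):
# rag_chain = build_rag_chain()
```

The dictionary at the start of the chain sends the question to the retriever and, unchanged, to the prompt, so the model sees both the retrieved context and the original question.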
rag_chain.invoke("Is there any free courses?")
- RAG.ipynb: Main notebook containing all code for scraping, embedding, and RAG pipeline.
This project is for