
AdityaGaur7/WebScrap-Rag


[Image: Screenshot 2025-09-05 230306]

RAG Pipeline with LangChain and Google Generative AI

This project demonstrates a Retrieval-Augmented Generation (RAG) pipeline using LangChain, ChromaDB, and Google Generative AI. It scrapes course content from a website, splits the text into chunks, embeds the chunks and stores them in ChromaDB, and answers questions using the retrieved chunks as context.
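The splitting step can be illustrated with a simplified, plain-Python sketch of the recursive idea behind RecursiveCharacterTextSplitter: split on the coarsest separator first, greedily pack pieces into chunks of at most chunk_size characters, and fall back to finer separators only for pieces that are still too long. The function name split_text and its defaults are hypothetical, and this is not LangChain's implementation (among other things it has no chunk overlap).

```python
def split_text(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Hypothetical, simplified recursive character splitting (no chunk overlap)."""
    if len(text) <= chunk_size:
        return [text]
    sep = separators[0]
    # Finer separators to fall back on; "" (hard cut) is the last resort.
    finer = separators[1:] or ("",)
    if sep == "":
        # No separator left: cut into fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits: keep packing
            continue
        if current:
            chunks.append(current)  # current chunk is full: flush it
            current = ""
        if len(piece) <= chunk_size:
            current = piece  # start a new chunk with this piece
        else:
            # Piece is too long on its own: split it with finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(split_text("aaa bbb ccc\n\nddd eee", chunk_size=8))
# ['aaa bbb', 'ccc', 'ddd eee']
```

In the notebook itself the real class does this work, typically via RecursiveCharacterTextSplitter(chunk_size=..., chunk_overlap=...).split_documents(docs); the actual parameter values used there are not stated in this README.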

Features

  • Web scraping using LangChain Community's WebBaseLoader
  • Document splitting with RecursiveCharacterTextSplitter
  • Embedding generation via Google Generative AI
  • Vector storage and retrieval using ChromaDB
  • RAG pipeline for concise question answering
  • Prompt debugging via a custom function that prints the prompt sent to the model
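Embedding and retrieval can be pictured with a toy stand-in: below, a bag-of-words term-count vector plays the role of the Google Generative AI embedding, and a linear cosine-similarity scan plays the role of the ChromaDB retriever. Everything here (embed, cosine, retrieve, and the sample chunks) is hypothetical and stdlib-only; real embeddings are dense float vectors, and Chroma searches an approximate nearest-neighbour index rather than scanning every chunk.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force scan)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up sample chunks, for illustration only.
chunks = [
    "Course A is free to enroll and covers Python basics.",
    "Course B is a paid program with mentorship.",
    "Contact support by email for refund requests.",
]
print(retrieve("Are there any free courses?", chunks, k=1))
```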

Setup

  1. Install dependencies:

    pip install langchain_community langchainhub chromadb langchain langchain-google-genai langchain-openai
  2. Set your Google API key:

    • If you are using Google Colab, store your key in Colab's secrets (read in code via google.colab.userdata).
    • Otherwise, set GOOGLE_API_KEY in your environment.
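A minimal sketch of loading the key so the same code works in Colab and elsewhere, assuming both the Colab secret and the environment variable are named GOOGLE_API_KEY (the secret name is an assumption; use whatever name you stored). The helper get_google_api_key is hypothetical.

```python
import os

try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import userdata
except ImportError:
    userdata = None


def get_google_api_key(name="GOOGLE_API_KEY"):
    """Hypothetical helper: read the key from Colab secrets when running in
    Colab, otherwise from the environment variable of the same name."""
    if userdata is not None:
        # Colab raises its own error if the secret is missing or access is denied.
        return userdata.get(name)
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in the environment")
    return key
```

For example, run os.environ["GOOGLE_API_KEY"] = get_google_api_key() before creating the embedding and chat model objects; the langchain-google-genai classes fall back to the GOOGLE_API_KEY environment variable when no key is passed explicitly.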

Usage

Open RAG.ipynb and run the cells in order.

Example

rag_chain.invoke("Are there any free courses?")
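Conceptually, a call like the one above retrieves chunks for the question, joins them into a context string, fills a prompt template, prints the prompt for debugging, and passes it to the model. The plain-Python sketch below mirrors that flow with a stub model; PROMPT_TEMPLATE, format_docs, print_prompt, fake_llm, and rag_invoke are hypothetical names, not taken from the notebook, where these steps are composed as a LangChain chain around a Google Generative AI chat model.

```python
# Hypothetical prompt wording; the notebook's actual template may differ.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. Be concise.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def format_docs(docs):
    """Join retrieved chunks into one context string."""
    return "\n\n".join(docs)


def print_prompt(prompt):
    """Debugging pass-through: print the final prompt and return it unchanged."""
    print(prompt)
    return prompt


def fake_llm(prompt):
    """Stub standing in for the Google Generative AI chat model."""
    return "stub answer"


def rag_invoke(question, retrieve, llm=fake_llm):
    """Retrieve -> format -> fill template -> print -> generate."""
    context = format_docs(retrieve(question))
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return llm(print_prompt(prompt))


answer = rag_invoke(
    "Are there any free courses?",
    retrieve=lambda q: ["Course A is free to enroll."],  # stub retriever
)
```

Because print_prompt returns its input unchanged, it can be added to or removed from the pipeline without affecting the answer.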

Project Structure

  • RAG.ipynb: Main notebook containing all code for scraping, embedding, and RAG pipeline.

License

This project is for
