lcmartinezdev/langchain-basics


LangChain: Chat With Your Data (Local Project)

Security Warning: Never share your .env file or API keys. The .env file is gitignored by default, and sensitive credentials should always be kept private.

This project is a modular, well-documented implementation of the LangChain "Chat With Your Data" tutorial. Each step is a separate script, so you can learn and experiment with each concept locally.

Topics Covered

  1. Document Loading: Load data from PDFs, web pages, and (optionally) YouTube. See src/load_documents.py.
  2. Text Splitting: Break documents into manageable chunks using different splitters. See src/split_text.py.
  3. Embeddings: Convert text to vector representations and compare semantic similarity. See src/embeddings.py.
  4. Vector Stores: Store and retrieve document embeddings efficiently with Chroma. See src/vector_store.py.
  5. Question Answering: Build QA chains to answer questions about your documents, with a custom prompt. See src/qa_chain.py.
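The core idea behind step 2 can be shown without any dependencies. This is a stdlib-only sketch of fixed-size chunking with overlap; the actual src/split_text.py uses LangChain's RecursiveCharacterTextSplitter, which additionally prefers natural boundaries (paragraphs, sentences) before falling back to raw character counts:

```python
def split_with_overlap(text: str, chunk_size: int = 300, chunk_overlap: int = 50) -> list[str]:
    """Slide a window of chunk_size characters, stepping by chunk_size - chunk_overlap,
    so consecutive chunks share chunk_overlap characters of context."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_with_overlap("abcdefghij" * 100)  # 1000-character toy document
print(len(chunks), len(chunks[0]))  # 4 300
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.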

Prerequisites

  • Python 3.9+
  • An OpenAI API key (add to .env)
  • (Optional) A PDF file at data/test.pdf for PDF loading demo
  • LangChain v0.1+ and the langchain_community package

Setup

  1. Clone this repo and cd into it.
  2. Copy .env.example to .env and add your OpenAI API key and any other required environment variables.
  3. (Optional) Place a PDF at data/test.pdf for PDF loading.
  4. Install dependencies:
    pip install -r requirements.txt
  5. (Recommended) Install and run ruff for linting and uv for dependency management:
    pip install ruff uv
    ruff check src/
    # (Optional) Compile requirements.txt from requirements.in
    uv pip compile requirements.in --output-file requirements.txt
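At minimum, your .env needs the OpenAI key; OPENAI_API_KEY is the variable name the scripts check for (the placeholder value below is illustrative):

```
OPENAI_API_KEY=sk-your-key-here
```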

Workflow: How to Run

Scripts must be run in order, as each step saves output for the next. All scripts use utils.py to load environment variables from .env (using python-dotenv).
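The env-loading helper can be very small. The actual utils.py relies on python-dotenv's load_dotenv; this stdlib-only sketch (function names are assumptions, not the repo's actual API) shows the equivalent behavior:

```python
import os
import tempfile

def load_env(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv(): read KEY=VALUE
    lines from a .env file into os.environ, skipping comments and blanks."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

def require_key(name: str = "OPENAI_API_KEY") -> str:
    """Fail fast, mirroring the kind of error message the scripts print."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} not set in .env file.")
    return value

# Demo: write a throwaway .env and load it
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("# comment line\nDEMO_KEY=hello\n")
    demo_path = fh.name
load_env(demo_path)
print(os.environ["DEMO_KEY"])  # hello
```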

  1. python src/load_documents.py # Load and preview documents (saves pickles/docs.pkl)
    • Loads PDF and web documents, prints a preview.
  2. python src/split_text.py # Split documents into chunks (saves pickles/splits.pkl)
    • Splits documents using RecursiveCharacterTextSplitter and saves the result.
  3. python src/embeddings.py # Generate and compare embeddings
    • Generates OpenAI embeddings, compares semantic similarity, and embeds document chunks.
  4. python src/vector_store.py # Create and query a vector store (saves to database/)
    • Loads splits, creates a Chroma vector store, runs a sample query, and persists the DB automatically in the database/ directory.
  5. python src/qa_chain.py # Run a question-answering chain
    • Loads the Chroma vector store from database/, sets up a custom prompt, and answers a sample question using a RetrievalQA chain.
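The semantic-similarity comparison in step 3 boils down to cosine similarity between embedding vectors. A dependency-free sketch of that comparison (the real script gets its vectors from OpenAI embeddings, which have over a thousand dimensions; the toy vectors here are made up for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means the
    embeddings point the same way (similar meaning), near 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"
dog = [0.9, 0.1, 0.0]
puppy = [0.85, 0.2, 0.05]
invoice = [0.0, 0.1, 0.95]

print(round(cosine_similarity(dog, puppy), 3))    # high: related meanings
print(round(cosine_similarity(dog, invoice), 3))  # low: unrelated
```

Vector stores like Chroma use this same notion of distance to return the chunks closest to a query.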

Each script is commented for learning. See the source for details and experiment with your own data!

Data & Outputs

  • Intermediate outputs are saved in the pickles/ directory (e.g., docs.pkl, splits.pkl).
  • Persistent vector store is saved in the database/ directory (Chroma DB and related files).
  • Both pickles/ and database/ are gitignored and safe to delete if you want to reset the workflow.
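The handoff between steps is plain pickle. A sketch of the save/load pattern the scripts follow (the helper names are assumptions; the file name matches the ones listed above):

```python
import os
import pickle
import tempfile
from pathlib import Path

def save_pickle(obj, path: str) -> None:
    """Persist an intermediate result (e.g. loaded docs) for the next step."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)

def load_pickle(path: str):
    """Reload the previous step's output; fails if that step wasn't run."""
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"{path} missing -- run the previous step first")
    with open(p, "rb") as fh:
        return pickle.load(fh)

# Demo round-trip in a temporary directory (the scripts use pickles/ in the repo)
tmp = tempfile.mkdtemp()
demo_path = os.path.join(tmp, "pickles", "splits.pkl")
save_pickle(["chunk one", "chunk two"], demo_path)
roundtrip = load_pickle(demo_path)
print(roundtrip)  # ['chunk one', 'chunk two']
```

Deleting pickles/ therefore just means re-running from step 1.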

Customization

  • Prompts: You can edit the prompt in src/qa_chain.py to change the style or constraints of the answers.
  • Document Sources: Add more loaders in src/load_documents.py as needed (see LangChain docs for options).
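A QA prompt of the kind used in src/qa_chain.py is a template with {context} and {question} placeholders (in LangChain, a PromptTemplate). The wording below is illustrative, not the repo's actual prompt:

```python
# Shape of a RetrievalQA-style prompt -- LangChain's PromptTemplate uses
# the same {context}/{question} placeholder syntax as str.format.
QA_TEMPLATE = """Use the following context to answer the question.
If you don't know the answer, say you don't know; don't make one up.

Context:
{context}

Question: {question}
Helpful answer:"""

def build_prompt(context: str, question: str) -> str:
    """Fill the template with retrieved chunks and the user's question."""
    return QA_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    context="LangChain splits documents before embedding them.",
    question="What happens before embedding?",
)
print(prompt)
```

Editing the instruction lines (tone, length limits, "answer only from context") is the quickest way to change answer style.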

Troubleshooting

  • If you see the error "OPENAI_API_KEY not set in .env file.", check that your .env file exists and contains your key.
  • If you get file not found errors, ensure you ran the previous step and the required files exist.
  • For PDF loading, make sure data/test.pdf exists.
  • Chroma DB is automatically persisted on any change (no need to call persist manually).

Inspired by DeepLearning.AI - LangChain Chat With Your Data
