**Security Warning:** Never share your `.env` file or API keys. The `.env` file is gitignored by default, and sensitive credentials should always be kept private.
This project is a modular, well-documented implementation of the LangChain "Chat With Your Data" tutorial. Each step is a separate script, so you can learn and experiment with each concept locally.
## Features

- **Document Loading:** Load data from PDFs, web pages, and (optionally) YouTube. See `src/load_documents.py`.
- **Text Splitting:** Break documents into manageable chunks using different splitters. See `src/split_text.py`.
- **Embeddings:** Convert text to vector representations and compare semantic similarity. See `src/embeddings.py`.
- **Vector Stores:** Store and retrieve document embeddings efficiently with Chroma. See `src/vector_store.py`.
- **Question Answering:** Build QA chains to answer questions about your documents, with a custom prompt. See `src/qa_chain.py`.
## Prerequisites

- Python 3.9+
- An OpenAI API key (add to `.env`)
- (Optional) A PDF file at `data/test.pdf` for the PDF loading demo
- LangChain v0.1+ and `langchain_community`
## Setup

- Clone this repo and `cd` into it.
- Copy `.env.example` to `.env` and add your OpenAI API key and any other required environment variables.
- (Optional) Place a PDF at `data/test.pdf` for PDF loading.
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- (Recommended) Install and run ruff for linting and uv for dependency management:

  ```bash
  pip install ruff uv
  ruff check src/

  # (Optional) Compile requirements.txt from requirements.in
  uv pip compile requirements.in --output-file requirements.txt
  ```
## Usage

Scripts must be run in order, as each step saves output for the next. All scripts use `utils.py` to load environment variables from `.env` (via `python-dotenv`).
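For reference, the environment loading boils down to a few lines of `python-dotenv`. A minimal sketch (the helper name `load_env` is an assumption; see `src/utils.py` for the actual implementation):

```python
# Sketch of an env-loading helper; assumes python-dotenv is installed.
import os

from dotenv import load_dotenv


def load_env() -> str:
    """Load variables from .env and return the OpenAI API key."""
    load_dotenv()  # reads .env from the project root
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY not set in .env file.")
    return api_key
```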
```bash
python src/load_documents.py   # Load and preview documents (saves pickles/docs.pkl)
```

- Loads PDF and web documents and prints a preview (sketched below).
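A rough sketch of the loading step, assuming the `langchain_community` loaders and a placeholder URL (the real sources live in the script):

```python
import pickle
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader

# Load a local PDF (one Document per page) plus a web page.
docs = PyPDFLoader("data/test.pdf").load()
docs += WebBaseLoader("https://example.com").load()  # placeholder URL

print(docs[0].page_content[:200])  # preview

# Persist for the next step (split_text.py).
Path("pickles").mkdir(exist_ok=True)
with open("pickles/docs.pkl", "wb") as f:
    pickle.dump(docs, f)
```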
```bash
python src/split_text.py   # Split documents into chunks (saves pickles/splits.pkl)
```

- Splits documents using `RecursiveCharacterTextSplitter` and saves the result (sketched below).
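The split step follows the standard LangChain pattern; the chunk sizes here are illustrative assumptions, not necessarily the script's values:

```python
import pickle

from langchain.text_splitter import RecursiveCharacterTextSplitter

with open("pickles/docs.pkl", "rb") as f:
    docs = pickle.load(f)

# Recursively split on paragraphs, then sentences, then words.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
splits = splitter.split_documents(docs)
print(f"{len(docs)} documents -> {len(splits)} chunks")

with open("pickles/splits.pkl", "wb") as f:
    pickle.dump(splits, f)
```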
```bash
python src/embeddings.py   # Generate and compare embeddings
```

- Generates OpenAI embeddings, compares semantic similarity, and embeds document chunks (sketched below).
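Similarity between embeddings is typically measured with a dot product (OpenAI embeddings are unit-length, so this equals cosine similarity). A sketch with illustrative sentences, assuming `numpy` is available:

```python
import numpy as np
from langchain_community.embeddings import OpenAIEmbeddings

embedding = OpenAIEmbeddings()

v1 = np.array(embedding.embed_query("i like dogs"))
v2 = np.array(embedding.embed_query("i like canines"))
v3 = np.array(embedding.embed_query("the weather is ugly outside"))

# Unit-length vectors: the dot product is the cosine similarity.
print(np.dot(v1, v2))  # semantically close -> higher score
print(np.dot(v1, v3))  # unrelated -> lower score
```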
```bash
python src/vector_store.py   # Create and query a vector store (saves to database/)
```

- Loads splits, creates a Chroma vector store, runs a sample query, and persists the DB automatically in the `database/` directory (sketched below).
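The core of this step is a single constructor call; a sketch with a hypothetical sample query:

```python
import pickle

from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

with open("pickles/splits.pkl", "rb") as f:
    splits = pickle.load(f)

# Embeds every chunk and persists the index under database/.
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(),
    persist_directory="database/",
)

# Retrieve the k most similar chunks to a query.
for doc in vectordb.similarity_search("What is this about?", k=3):
    print(doc.page_content[:100])
```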
```bash
python src/qa_chain.py   # Run a question-answering chain
```

- Loads the Chroma vector store from `database/`, sets up a custom prompt, and answers a sample question using a RetrievalQA chain (sketched below).
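Wiring this up follows the standard `RetrievalQA` pattern. A sketch in which the prompt wording and sample question are assumptions (see the script for the real ones):

```python
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOpenAI
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Reopen the vector store persisted by the previous step.
vectordb = Chroma(
    persist_directory="database/",
    embedding_function=OpenAIEmbeddings(),
)

# A custom prompt constraining answer style (illustrative wording).
template = """Use the following context to answer the question.
If you don't know the answer, say so; don't make one up.

{context}

Question: {question}
Helpful Answer:"""

qa_chain = RetrievalQA.from_chain_type(
    ChatOpenAI(temperature=0),
    retriever=vectordb.as_retriever(),
    chain_type_kwargs={"prompt": PromptTemplate.from_template(template)},
)

print(qa_chain.invoke({"query": "What is this document about?"})["result"])
```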
Each script is commented for learning. See the source for details and experiment with your own data!
## Outputs

- Intermediate outputs are saved in the `pickles/` directory (e.g., `docs.pkl`, `splits.pkl`).
- The persistent vector store is saved in the `database/` directory (Chroma DB and related files).
- Both `pickles/` and `database/` are gitignored and safe to delete if you want to reset the workflow.
## Customization

- **Prompts:** You can edit the prompt in `src/qa_chain.py` to change the style or constraints of the answers.
- **Document Sources:** Add more loaders in `src/load_documents.py` as needed (see the LangChain docs for options; one example follows this list).
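For instance, the optional YouTube loading mentioned under Features could be wired in with `YoutubeLoader` (a sketch; the video URL is a placeholder and the `youtube-transcript-api` package is assumed to be installed):

```python
from langchain_community.document_loaders import YoutubeLoader

# Placeholder URL; fetches the video's transcript as Documents.
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=XXXXXXXXXXX",
)
docs = loader.load()
```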
## Troubleshooting

- If you see `OPENAI_API_KEY not set in .env file.`, check your `.env` file.
- If you get file-not-found errors, ensure you ran the previous step and the required files exist.
- For PDF loading, make sure `data/test.pdf` exists.
- The Chroma DB is automatically persisted on any change (no need to call `persist` manually).
## Credits

Inspired by DeepLearning.AI's *LangChain: Chat With Your Data* course.