This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a local movie recommendation system built with RAG (Retrieval-Augmented Generation) architecture using:
- LangChain for orchestration
- LLaMA 3 via Ollama for local LLM inference
- FAISS for vector similarity search
- Sentence Transformers (`all-MiniLM-L6-v2`) for embeddings
- IMDb dataset as the knowledge base
The system operates in two main phases, offline indexing and online querying:
- Data Preparation: Scripts in the `Scripts/` directory clean and prepare IMDb data
  - `dat_prep_genres.py`: Cleans and normalizes genre data
  - `data_prep_poster.py`: Fetches poster URLs via the OMDB API
- Vectorization: `recommender.ipynb` creates embeddings and the FAISS index
  - Combines Title, Director, Genres, Star Cast, and plot into unified text
  - Generates embeddings using sentence-transformers
  - Stores them in the FAISS vector store (`faiss_imdb_store/`)
- Retrieval: User queries are embedded and matched against FAISS index
- Generation: Retrieved movie contexts are passed to LLaMA 3 for conversational responses
- Response: Natural language recommendations based on retrieved data
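The indexing and retrieval steps above can be sketched end-to-end in plain Python, with a toy bag-of-words embedding standing in for the real `all-MiniLM-L6-v2` model and a cosine search standing in for FAISS (the movies, helper names, and scoring are illustrative, not this repo's code):

```python
import math
import re
from collections import Counter

# Toy sketch: build "unified text" per movie, embed it, retrieve by cosine
# similarity. A bag-of-words Counter stands in for real sentence embeddings.
def movie_to_text(row: dict) -> str:
    # Mirrors the unified-text step: Title, Director, Genres, Star Cast, plot.
    return (f"Title: {row['Title']}. Director: {row['Director']}. "
            f"Genres: {row['Genres']}. Star Cast: {row['Star Cast']}. "
            f"Plot: {row['plot']}")

def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9\-]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

movies = [
    {"Title": "Star Quest", "Director": "J. Doe", "Genres": "Sci-Fi",
     "Star Cast": "A. Actor", "plot": "A space crew explores distant worlds."},
    {"Title": "Love Letters", "Director": "K. Roe", "Genres": "Romance",
     "Star Cast": "B. Actress", "plot": "Two writers fall in love by mail."},
]
index = [(m, embed(movie_to_text(m))) for m in movies]

query = embed("space exploration sci-fi")
best = max(index, key=lambda pair: cosine(query, pair[1]))[0]
print(best["Title"])  # → Star Quest
```

In the real pipeline the retrieved texts are then handed to LLaMA 3 as context for the conversational answer.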
```bash
pip install pandas numpy langchain langchain-community \
  scikit-learn matplotlib ipykernel sentence-transformers \
  faiss-cpu langchain-ollama
```

Prerequisites:
- Ollama: Must be installed and running locally
- LLaMA 3 model: Pull with `ollama pull llama3`
- OMDB API Key: Required for poster data (stored in the `OMDB_API_KEY` env var)
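Since the poster script reads its key from the environment, a fail-fast check keeps the requirement explicit. A minimal sketch (this helper is hypothetical, not part of the repo; the demo value exists only so the example runs):

```python
import os

# Hypothetical helper: fail fast if the OMDB key needed for poster
# fetching is missing, instead of failing mid-run on an API call.
def get_omdb_key() -> str:
    key = os.environ.get("OMDB_API_KEY")
    if not key:
        raise RuntimeError("Set OMDB_API_KEY before fetching poster data")
    return key

os.environ.setdefault("OMDB_API_KEY", "demo-key")  # illustration only
print(get_omdb_key())
```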
Key files:
- `IMDb_Dataset_Composite_Cleaned.csv`: Main dataset with movie metadata
- `IMDb_Title_Description.csv`: Processed title/description pairs for vectorization
- `faiss_imdb_store/`: FAISS vector index directory
- `faiss_imdb_store.zip`: Compressed backup of the vector store
```bash
# Process IMDb data
python -m src.rag_movie_rec.cli process

# Build vector store
python -m src.rag_movie_rec.cli build

# Interactive queries
python -m src.rag_movie_rec.cli query --interactive

# Health check
python -m src.rag_movie_rec.cli health

# Run tests
pytest tests/ -v
```

```bash
# Start development environment
docker-compose -f docker/docker-compose.yml --profile dev up -d

# Access Jupyter Lab
open http://localhost:8888

# Run tests in container
docker exec rag-movie-rec pytest tests/
```

Development workflow:
- Data Updates: Modify `src/rag_movie_rec/data_processor.py` for preprocessing changes
- Vector Store: Delete the `faiss_imdb_store/` directory to trigger recreation
- Query Testing: Use CLI interactive mode or the Streamlit UI
- Model Changes: Update the Ollama model via environment variables
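Switching the Ollama model via environment variables can be as simple as reading the model name at startup. A minimal sketch (the `OLLAMA_MODEL` variable name and default are assumptions for illustration, not documented by this repo):

```python
import os

# Hypothetical: pick the Ollama model from an env var with a sane default,
# so swapping models needs no code change. OLLAMA_MODEL is an assumed name.
model_name = os.environ.get("OLLAMA_MODEL", "llama3")
print(f"Using Ollama model: {model_name}")
```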
- Vector store creation is expensive; existing stores are preserved
- All LLM inference runs locally via Ollama (no API key is needed for inference; the OMDB key is only used to fetch poster data)
- The system expects specific CSV column names: `Title`, `Director`, `Genres`, `Star Cast`, `plot`
- FAISS deserialization requires `allow_dangerous_deserialization=True`
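Because the pipeline depends on exact column names, a quick schema check before processing catches mismatches early. A sketch (the guard function is hypothetical, not part of this repo):

```python
import csv
import io

# Hypothetical schema guard for the CSV columns this system expects.
REQUIRED_COLUMNS = {"Title", "Director", "Genres", "Star Cast", "plot"}

def missing_columns(csv_text: str) -> set:
    # Returns the expected columns absent from the CSV header.
    reader = csv.DictReader(io.StringIO(csv_text))
    return REQUIRED_COLUMNS - set(reader.fieldnames or [])

sample = "Title,Director,Genres,Star Cast,plot\nExample,J. Doe,Drama,A. Actor,A plot.\n"
print(missing_columns(sample))  # set() means the schema matches
```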