Latent Chunk Lab

Latent Chunk Lab is an educational playground for exploring different RAG (Retrieval-Augmented Generation) chunking techniques. It demonstrates how to split documents into chunks, embed them, store and retrieve them efficiently, and query them using local Ollama models or the Gemini API.

Features

Compare Chunking Strategies: Experiment with character, token, and recursive chunking methods.
Flexible Embedding Models:
- Local: Use open-source models like gemma or llama3 via Ollama.
- Cloud: Use Google's Gemini API to generate embeddings.
Persistent Vector Stores:
- FAISS: Library for efficient similarity search, run on the CPU here, with indexes saved to disk.
- ChromaDB: Open-source embedding database that persists to disk.
Versatile LLMs: Query your data with local models via Ollama or with the Gemini API.
Dual-Mode Evaluation:
- Cloud-based: Assess RAG performance with Ragas, measuring metrics like faithfulness, answer_relevancy, and context_precision.
- Local: Perform a simple exact-match evaluation when using local models.
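
The three chunking strategies can be compared with a minimal plain-Python sketch. This is an illustration under stated assumptions, not the lab's implementation: the function names are hypothetical, "tokens" are whitespace-separated words (the project may use a real tokenizer), and the recursive splitter only mimics the separator-fallback idea behind splitters such as LangChain's RecursiveCharacterTextSplitter.

```python
def character_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Fixed-size character windows; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining text is already covered by the previous window's overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def token_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Fixed-size windows of tokens; whitespace splitting stands in for a real tokenizer."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    tokens = text.split()
    if not tokens:
        return []
    step = size - overlap
    return [" ".join(tokens[i:i + size]) for i in range(0, max(len(tokens) - overlap, 1), step)]


def recursive_chunks(
    text: str, size: int, separators: tuple[str, ...] = ("\n\n", "\n", " ", "")
) -> list[str]:
    """Split on the coarsest separator first; recurse with finer ones on oversized pieces."""
    if len(text) <= size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # No natural boundary left: fall back to raw character windows.
        return character_chunks(text, size)
    sep, rest = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= size:
            current = candidate  # keep merging small pieces up to `size`
            continue
        if current:
            chunks.append(current)
        if len(piece) <= size:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Character and token windows cut wherever the count runs out, even mid-word or mid-sentence; the recursive splitter prefers paragraph, then line, then word boundaries and only falls back to raw characters for pieces that still do not fit.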

How It Works

The RAG pipeline follows these steps:

Load Document: A document (e.g., PDF, TXT) is loaded from the data directory.
Chunking: The document is split into smaller chunks based on the selected strategy in the configuration.
Embedding: Each chunk is converted into a vector embedding using either Ollama or the Gemini API.
Indexing: The embeddings are stored in a vector store (FAISS or ChromaDB). The index is persisted to the vectorstore directory in the project root, so it doesn't have to be rebuilt every time.
Querying: When a user asks a question, the query is embedded, and the vector store retrieves the most relevant chunks.
Generation: The retrieved chunks and the original query are passed to an LLM (Ollama or Gemini) to generate a final answer.
Evaluation: The generated answer is evaluated against a reference answer. For cloud LLMs, Ragas is used for a detailed evaluation. For local LLMs, a simple exact-match comparison is performed.
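
The embedding, indexing, querying, and local evaluation steps can be sketched end to end with the standard library only. This is a toy under loudly stated assumptions: a bag-of-words word count stands in for Ollama or Gemini embeddings, an in-memory list stands in for FAISS or ChromaDB, the LLM call itself is omitted (only the prompt is built), the answer normalization in the exact-match check is assumed, and every function name is hypothetical.

```python
import math
import re
import string
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an Ollama/Gemini embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Indexing step: keep each chunk next to its vector (in memory, not FAISS/ChromaDB)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Querying step: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Generation step input: the retrieved chunks plus the original question."""
    context = "\n\n".join(contexts)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"


def exact_match(prediction: str, reference: str) -> bool:
    """Local evaluation: compare answers after lowercasing and removing punctuation."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().translate(str.maketrans("", "", string.punctuation)).split())
    return normalize(prediction) == normalize(reference)


chunks = [
    "FAISS stores vectors in an index on disk.",
    "ChromaDB is an embedding database.",
    "Ollama runs models locally.",
]
index = build_index(chunks)
question = "Which tool runs models locally?"
print(retrieve(index, question, k=1))
```

Exact match is strict: "Ollama runs them" would fail against the reference "Ollama", which is why the cloud path uses Ragas metrics instead.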

Installation

Clone the repository:
```
git clone https://github.com/your-username/latent-chunk-lab.git
cd latent-chunk-lab
```

Create a virtual environment:
```
python -m venv .venv
source .venv/bin/activate
```
On Windows, use .venv\Scripts\activate instead.

Install dependencies:
```
pip install -r requirements.txt
```

Set up API keys: If you plan to use the Gemini API, you need a Gemini API key.
- Create a .env file in the root directory.
- Add your Gemini API key to the .env file:
```
GEMINI_API_KEY="your_api_key_here"
```
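
Reading the key back can be sketched with the standard library. The helper name load_gemini_key is hypothetical and the parsing rules (skip comments, strip quotes) are assumptions; if python-dotenv is among the requirements (not confirmed here), `from dotenv import load_dotenv; load_dotenv()` followed by `os.getenv("GEMINI_API_KEY")` does the same job.

```python
import os
from pathlib import Path
from typing import Optional


def load_gemini_key(env_path: str = ".env") -> Optional[str]:
    """Return GEMINI_API_KEY from the environment, falling back to a .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key  # an exported variable wins over the file
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not NAME=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "GEMINI_API_KEY":
            # Drop surrounding quotes, as in GEMINI_API_KEY="your_api_key_here".
            return value.strip().strip('"').strip("'")
    return None
```

Keep .env out of version control so the key is never committed.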

Usage

The primary way to use this lab is through the rag_pipeline_demo.ipynb notebook located in the notebooks directory.

Start Jupyter Lab:
```
jupyter lab
```
Open the notebook: Navigate to notebooks/rag_pipeline_demo.ipynb.
Select a Configuration: In the notebook, you can choose which configuration to use:
- configs/ollama.yaml: Uses local Ollama models for embeddings and LLM, with FAISS as the vector store.
- configs/gemini.yaml: Uses the Gemini API for embeddings and LLM, with ChromaDB as the vector store.
Run the cells: Execute the notebook cells to see the entire RAG pipeline in action, from chunking to evaluation. The first time you run it with a new configuration, the vector store is built and saved to the vectorstore directory in the project root. Subsequent runs load the index from disk.
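
The build-once, load-afterwards behaviour follows a simple caching pattern, sketched here with JSON so it runs without extra packages. The function name and the per-configuration path layout are hypothetical; with the real stores, the save and load calls would be FAISS's faiss.write_index / faiss.read_index, or a chromadb.PersistentClient pointed at the persist directory.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build(store_path: Path, build: Callable[[], dict]) -> dict:
    """Load a persisted index if it exists; otherwise build it once and save it."""
    if store_path.is_file():
        # Subsequent runs: skip chunking and embedding entirely.
        return json.loads(store_path.read_text(encoding="utf-8"))
    index = build()  # first run for this configuration: the expensive step
    store_path.parent.mkdir(parents=True, exist_ok=True)
    store_path.write_text(json.dumps(index), encoding="utf-8")
    return index
```

One consequence of this design: if the documents or chunking settings change but the path stays the same, the stale index is reused, so keying the path on the configuration (or deleting the saved index) forces a rebuild.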
