Latent Chunk Lab

Latent Chunk Lab is an educational playground for exploring chunking techniques for RAG (Retrieval-Augmented Generation). It demonstrates how to split documents into chunks, embed them, store them in a persistent vector store, retrieve the most relevant ones, and query them using local Ollama models or the Gemini API.

Features

  • Compare Chunking Strategies: Experiment with character, token, and recursive chunking methods.
  • Flexible Embedding Models:
    • Local: Use open-source models like gemma or llama3 via Ollama.
    • Cloud: Use Google's Gemini API to generate embeddings.
  • Persistent Vector Stores:
    • FAISS: CPU-based library for efficient similarity search, with indexes saved to disk.
    • ChromaDB: Open-source embedding database that persists to disk.
  • Versatile LLMs: Query your data using local models with Ollama or leverage the Gemini API.
  • Dual-Mode Evaluation:
    • Cloud-based: Assess RAG performance with Ragas, measuring metrics like faithfulness, answer_relevancy, and context_precision.
    • Local: Perform simple exact match evaluation when using local models.
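
To make the three chunking strategies listed above concrete, here is a minimal, dependency-free sketch. It is not the project's implementation: the function names and the separator order are assumptions chosen for illustration, and whitespace-separated words stand in for real model tokens, so chunk sizes will differ from what a real tokenizer produces.

```python
def _windows(items, size, overlap):
    """Yield slices of `items` of length `size`; neighbours share `overlap` items."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    for start in range(0, len(items), step):
        yield items[start:start + size]
        if start + size >= len(items):
            break  # the last window already reaches the end


def chunk_by_characters(text: str, size: int, overlap: int = 0) -> list[str]:
    """Character chunking: fixed-size windows that ignore word boundaries."""
    return list(_windows(text, size, overlap))


def chunk_by_tokens(text: str, size: int, overlap: int = 0) -> list[str]:
    """Token chunking, approximated here with whitespace-separated words."""
    return [" ".join(window) for window in _windows(text.split(), size, overlap)]


def chunk_recursively(text: str, size: int, separators=("\n\n", "\n", " ")) -> list[str]:
    """Recursive chunking: split on the coarsest separator first and fall back
    to finer ones only for pieces that are still longer than `size`."""
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        return chunk_by_characters(text, size)  # nothing left to split on
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= size:
            current = piece
        else:
            chunks.extend(chunk_recursively(piece, size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta\n\ngamma delta epsilon"
print(chunk_by_characters("abcdefghij", 4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
print(chunk_by_tokens("one two three four five", 2))    # ['one two', 'three four', 'five']
print(chunk_recursively(sample, 12))                    # ['alpha beta', 'gamma delta', 'epsilon']
```

Character chunking can cut words in half, token chunking respects word boundaries but not paragraphs, and recursive chunking tries to keep paragraphs and lines together before resorting to finer splits.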

How It Works

The RAG pipeline follows these steps:

  1. Load Document: A document (e.g., PDF, TXT) is loaded from the data directory.
  2. Chunking: The document is split into smaller chunks based on the selected strategy in the configuration.
  3. Embedding: Each chunk is converted into a vector embedding using either Ollama or the Gemini API.
  4. Indexing: The embeddings are stored in a vector store (FAISS or ChromaDB). The index is persisted to the vectorstore directory in the project root, so it doesn't have to be rebuilt every time.
  5. Querying: When a user asks a question, the query is embedded, and the vector store retrieves the most relevant chunks.
  6. Generation: The retrieved chunks and the original query are passed to an LLM (Ollama or Gemini) to generate a final answer.
  7. Evaluation: The generated answer is evaluated against a reference answer. For cloud LLMs, Ragas provides a detailed evaluation; for local LLMs, a simple exact-match check is performed.
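
The querying and evaluation steps above can be sketched without any external services. In this illustration a toy bag-of-words counter stands in for the real embedding call (Ollama or Gemini), a sorted list stands in for the vector store, and the exact-match check normalizes case and whitespace. All function names and the normalization rule are assumptions, not the project's code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Embed the query and return the k chunks most similar to it."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


def exact_match(prediction: str, reference: str) -> bool:
    """Exact match after lowercasing and collapsing whitespace (assumed normalization)."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())
    return normalize(prediction) == normalize(reference)


chunks = [
    "FAISS stores vectors in an index on disk.",
    "Ollama runs language models locally.",
    "Chunking splits a document into smaller pieces.",
]
top = retrieve("Where does FAISS store the index?", chunks, k=1)
print(top)                              # ['FAISS stores vectors in an index on disk.']
print(exact_match("  Paris ", "paris"))  # True
```

Exact match is strict: an answer such as "Paris, France" would not match the reference "Paris", which is one reason the metric-based Ragas evaluation gives a more detailed picture for cloud LLMs.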

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/latent-chunk-lab.git
    cd latent-chunk-lab
  2. Create a virtual environment:

    python -m venv .venv
    source .venv/bin/activate

    On Windows, use .venv\Scripts\activate

  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up API keys: This step is only needed if you plan to use the Gemini API.

    • Create a .env file in the root directory.
    • Add your Gemini API key to the .env file:
      GEMINI_API_KEY="your_api_key_here"
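
This README does not show how the key is read at runtime. As a rough, standard-library-only sketch (the helper names `load_env_file` and `get_gemini_api_key` are hypothetical), code could check the process environment first and fall back to the .env file:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and '#' comments are skipped."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, as in GEMINI_API_KEY="your_api_key_here"
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_gemini_api_key(env_path: str = ".env") -> str:
    """Prefer an exported environment variable, then the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(env_path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or export it")
    return key
```

Failing early with a clear message avoids a less obvious authentication error later in the pipeline. Keep the .env file out of version control so the key is not committed.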
      

Usage

The primary way to use this lab is through the rag_pipeline_demo.ipynb notebook located in the notebooks directory.

  1. Start Jupyter Lab:

    jupyter lab
  2. Open the notebook: Navigate to notebooks/rag_pipeline_demo.ipynb.

  3. Select a Configuration: In the notebook, you can choose which configuration to use:

    • configs/ollama.yaml: Uses local Ollama models for embeddings and LLM, with FAISS as the vector store.
    • configs/gemini.yaml: Uses the Gemini API for embeddings and LLM, with ChromaDB as the vector store.
  4. Run the cells: Execute the notebook cells to see the entire RAG pipeline in action, from chunking to evaluation. The first time you run it with a new configuration, the vector store is built and saved to the vectorstore directory in the project root. Subsequent runs load the index from disk instead of rebuilding it.
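
The build-once, load-afterwards behaviour described in the last step follows a common caching pattern. The sketch below illustrates it with a JSON file standing in for the real index; `load_or_build` and the file name are hypothetical, and the project's own persistence goes through FAISS or ChromaDB rather than JSON.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_build(index_path: Path, build: Callable[[], dict]) -> tuple[dict, bool]:
    """Return (index, was_built). Build and save only if nothing is on disk yet."""
    if index_path.exists():
        return json.loads(index_path.read_text(encoding="utf-8")), False
    index = build()  # the expensive step: chunk, embed, and index the document
    index_path.parent.mkdir(parents=True, exist_ok=True)
    index_path.write_text(json.dumps(index), encoding="utf-8")
    return index, True


def build_toy_index() -> dict:
    """Stand-in for the real chunking and embedding work."""
    return {"chunk-0": [0.1, 0.2], "chunk-1": [0.3, 0.4]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "vectorstore" / "toy_index.json"
    _, built_first = load_or_build(path, build_toy_index)
    _, built_second = load_or_build(path, build_toy_index)
    print(built_first, built_second)  # True False
```

With FAISS, the save and load halves of this pattern would typically be `faiss.write_index` and `faiss.read_index`. One consequence of caching is that changing the chunking strategy or embedding model without removing the saved index may leave a stale index in use, so deleting the vectorstore directory is a simple way to force a rebuild.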