
[FEAT] - RAG Knowledge Base R&D & Evaluation #53

@Jrodrigo06

Description


name: Feature Request
title: "[FEAT] - RAG Knowledge Base R&D & Evaluation"
labels: feature, backlog
assignees: @Jrodrigo06 @mdeekshita @SHarg9876

Summary

Research and evaluate different RAG configurations — chunking strategies, embedding models, and retrieval approaches — to determine the optimal setup for the ingredient suggestion pipeline. Produce metrics and visualizations to back decisions.

Motivation

The vector store infrastructure and similarity search are already built. Before locking in a RAG configuration for production, we should systematically evaluate our options and have data to back our decisions. This ticket is a research spike — the outputs directly inform the RAG tagging service prompt and configuration.

Requirements

Acceptance Criteria

Sub-task 1: Document Sourcing & Seeding

  • Identify and collect candidate source documents for our use case
  • Store raw documents in backend/data/raw/
  • Seed knowledge_chunks table via backend/scripts/seed_knowledge.py (an endpoint to directly add and process PDFs is planned, so don't worry about this for now)

Sub-task 2: Chunking Strategy Experimentation (Before starting, you'll need to write a good benchmark prompt for extracting food tags or ingredients; I'd recommend the tags. This is a lot, so ask questions!)

  • Research and implement at least 3 different chunking strategies — document tradeoffs of each before implementing
  • Re-seed vector store for each strategy
  • Run fixed set of food queries against each and record retrieval results
  • Plot precision@k and MRR across strategies
  • Document which strategy performs best for autoimmune-specific queries
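
Two of the simpler candidate strategies could be sketched like this (the function names, sizes, and overlap defaults are illustrative placeholders, not a spec for the final implementation; the third-plus strategies are left to the research):

```python
import re

def chunk_fixed(text, size=500, overlap=50):
    """Strategy 1: fixed-size character windows with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def chunk_sentences(text, max_chars=500):
    """Strategy 2: greedy sentence packing up to max_chars per chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Documenting the tradeoffs first (fixed windows are cheap but split sentences mid-thought; sentence packing respects meaning boundaries but yields uneven chunk sizes) makes the precision@k comparison easier to interpret.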

Sub-task 3: Embedding Model Comparison

  • Research and select at least 3 embedding models to benchmark — document why each was chosen
  • Embed same knowledge base chunks with each model
  • Run same fixed query set and score top-k results against hand-labeled ground truth
  • Produce comparison table and plot of scores per model
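
Whichever models get picked, the scoring step can share one ranking helper so the comparison is apples-to-apples. A minimal numpy sketch, assuming each model's chunk embeddings are already computed as a matrix:

```python
import numpy as np

def cosine_top_k(query_vec, chunk_vecs, k):
    """Rank chunks by cosine similarity to the query; return top-k row indices."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims)[:k].tolist()
```

Running this per model over the fixed query set, then scoring the returned indices against the hand-labeled ground truth, produces the rows of the comparison table.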

Sub-task 4: Embedding Space Exploration (Maybe skip if it's too much, but there's some real cool work here, ngl!)

  • Extract all chunk embeddings from pgvector
  • Reduce to 2D using UMAP and t-SNE (other dimensionality reduction techniques may work too!)
  • Visualize and label clusters by trigger category
  • Analyze whether autoimmune trigger categories naturally separate in embedding space
  • Document findings: clean clusters suggest a good model fit; messy clusters suggest retrieval is likely struggling
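
The real run should use UMAP or t-SNE (umap-learn / scikit-learn), but the pipeline shape is the same either way; as a plain-numpy stand-in, a PCA projection shows what the notebook step would look like:

```python
import numpy as np

def project_2d(embeddings):
    """Project (n_chunks, dim) embeddings down to 2D via PCA.
    Stand-in for UMAP/t-SNE: center the data, take the top two
    principal axes from the SVD, and project onto them."""
    X = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T
```

The resulting 2D points get scatter-plotted and colored by trigger category; swapping `project_2d` for `umap.UMAP().fit_transform` or `sklearn.manifold.TSNE().fit_transform` is a one-line change in the notebook.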

Sub-task 5: Findings Write-up & Slide Deck (MUST HAVE!)

  • Short slide deck summarizing: sources chosen, chunking comparison, embedding model comparison, cluster visualizations, and final recommendations
  • Recommendations feed directly into RAG tagging service configuration

Out of Scope

  • RAG tagging service implementation — separate ticket
  • Frontend

Technical Approach

Affected Areas

  • backend/scripts/seed_knowledge.py

  • backend/data/raw/ (new)

  • backend/scripts/evaluate_retrieval.py (new) — runs query eval harness

  • backend/notebooks/ (new) — Jupyter notebooks for plots and visualizations
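
The core metrics for the new evaluate_retrieval.py harness (precision@k and MRR from Sub-task 2) are small enough to sketch here; this is an assumed shape for the harness internals, not a finished CLI:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunk IDs that are in the relevant set."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / k

def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant hit, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """MRR over a list of (retrieved_ids, relevant_id_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

Both metrics only need the ordered chunk IDs returned per query plus the hand-labeled relevant set, so the same harness output feeds the chunking plots and the embedding-model table.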

Dependencies


Testing Notes

PLEASE TEST AND SHOW IT WORKS. We don't have much time, so testing is crucial to prove it works and prevent lingering bugs.
