[FEAT] LLM Rag pipeline for Recommendation #52

@Jrodrigo06

name: LLM Rag pipeline for Recommendation
title: "[FEAT] - Rag Recommendation Pipeline"
labels: feature, backlog
assignees: @mdeekshita @SHarg9876 @keshavgoel787

Summary

LLM-powered RAG pipeline that auto-suggests ingredients and trigger bucket tags

Motivation

Users are likely unaware of trigger buckets and of which ingredients may be triggers, so automatic suggestions could be a really powerful feature!

Requirements

Acceptance Criteria

Sub-task 1: Tag & Bucket Models

  • Tag model created with fields: id (UUID), name (String), is_system (bool — True if LLM-suggested, False if user-defined), created_at
  • FoodLogTag join table created linking food_logs.id to tags.id
  • System tags seeded for core trigger buckets: gluten, FODMAPs, nightshades, histamines, added sugar, artificial additives, dairy, FDA Big 9 allergens
  • TagCreate and TagResponse schemas created
  • SuggestedIngredientResponse schema created with fields: name (String), buckets (list of String)
  • FoodLogResponse updated to include suggested_tags and suggested_ingredients: list[SuggestedIngredientResponse] fields
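
To make the field lists above concrete, here is a minimal sketch of the Tag and SuggestedIngredientResponse shapes using only standard-library dataclasses. The real models and schemas would be written with whatever ORM and schema library the backend already uses, so treat this as an illustration of the fields only. `SYSTEM_BUCKETS` and `seed_system_tags` are hypothetical names, and marking seeded tags with is_system=True is an assumption about how the flag is meant to be used.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Core trigger buckets listed in the acceptance criteria.
SYSTEM_BUCKETS = [
    "gluten",
    "FODMAPs",
    "nightshades",
    "histamines",
    "added sugar",
    "artificial additives",
    "dairy",
    "FDA Big 9 allergens",
]


@dataclass
class Tag:
    name: str
    is_system: bool = False  # False for user-defined tags
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class SuggestedIngredientResponse:
    name: str
    buckets: list[str] = field(default_factory=list)


def seed_system_tags() -> list[Tag]:
    """Build one system tag per core trigger bucket (assumed is_system=True)."""
    return [Tag(name=bucket, is_system=True) for bucket in SYSTEM_BUCKETS]
```

A response for a single suggestion would then look like `SuggestedIngredientResponse(name="wheat flour", buckets=["gluten"])`.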

Sub-task 2: Tag Endpoints

  • POST /food-log/{food_log_id}/tags — confirms and persists tags to a food log
  • POST /tags — creates a custom user-defined tag (is_system=False)
  • GET /tags — returns all available tags for a user to pick from
  • Unconfirmed tags must not be persisted until user explicitly confirms
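
The behaviour behind these endpoints can be sketched with an in-memory stand-in for the tag repository. `InMemoryTagStore` and its method names are hypothetical and are not the project's API. The point is only that nothing reaches the join set until the confirm call runs, and that confirming the same tag twice adds nothing.

```python
import uuid


class InMemoryTagStore:
    """Hypothetical stand-in for the tag repository, for illustration only."""

    def __init__(self):
        self.tags = {}              # tag_id -> {"name": ..., "is_system": ...}
        self.food_log_tags = set()  # {(food_log_id, tag_id)}

    def create_tag(self, name, is_system=False):
        # Mirrors POST /tags: custom tags default to is_system=False.
        tag_id = uuid.uuid4()
        self.tags[tag_id] = {"name": name, "is_system": is_system}
        return tag_id

    def list_tags(self):
        # Mirrors GET /tags: every tag the user can pick from.
        return [{"id": tag_id, **tag} for tag_id, tag in self.tags.items()]

    def confirm_tags(self, food_log_id, tag_ids):
        # Mirrors POST /food-log/{food_log_id}/tags: the only write path
        # for food log tags. Returns how many new links were created.
        unknown = [tag_id for tag_id in tag_ids if tag_id not in self.tags]
        if unknown:
            raise KeyError(f"unknown tag ids: {unknown}")
        before = len(self.food_log_tags)
        self.food_log_tags.update((food_log_id, tag_id) for tag_id in tag_ids)
        return len(self.food_log_tags) - before
```

Because the join data is a set of (food_log_id, tag_id) pairs, a repeated confirmation is a no-op rather than an error.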

Sub-task 3: Vector Store & RAG Infrastructure

  • pgvector extension enabled on Supabase
  • KnowledgeChunk model created with fields: id (UUID), content (Text), embedding (Vector), source (String)
  • Embedding generation utility using Gemini models/text-embedding-004
  • Similarity search function that takes a food name query, returns top N relevant chunks from the knowledge store
  • Seeding script scaffolded with TODOs for document sources (Monash FODMAP, FDA Big 9, AIP framework) — actual content to be added once sources are confirmed
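
As a rough illustration of the similarity search, the sketch below ranks chunks by cosine similarity in plain Python. In the actual pipeline the embeddings would come from the Gemini embedding model and the ranking would run inside Postgres via pgvector; here the three-dimensional vectors and the chunk text are toy placeholders, and `top_n_chunks` is a hypothetical function name.

```python
import math
from dataclasses import dataclass


@dataclass
class KnowledgeChunk:
    content: str
    embedding: list[float]
    source: str


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_chunks(query_embedding, chunks, n=3):
    """Return the n chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_embedding, chunk.embedding),
        reverse=True,
    )
    return ranked[:n]


# Toy data: placeholder text and vectors, not real seeded content.
chunks = [
    KnowledgeChunk("Wheat contains gluten.", [1.0, 0.0, 0.0], "toy"),
    KnowledgeChunk("Tomatoes are nightshades.", [0.0, 1.0, 0.0], "toy"),
    KnowledgeChunk("Aged cheese is high in histamine.", [0.0, 0.0, 1.0], "toy"),
]
best = top_n_chunks([0.9, 0.1, 0.0], chunks, n=2)
```

With these toy vectors the wheat chunk ranks first and the tomato chunk second, since the query points mostly along the first axis.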

Sub-task 4: RAG Tagging Service

  • RAGTaggingService created that follows the same constructor pattern as FoodService
  • Given a food name and its ingredients JSONB field, runs similarity search to retrieve relevant trigger context chunks
  • LLM prompt built with: food name, ingredients, retrieved context chunks — instructs LLM to suggest likely ingredients and classify each into trigger buckets without hallucinating unlikely ingredients
  • Returns structured response: suggested_ingredients: list[SuggestedIngredientResponse]
  • Hooked into food log creation flow — suggestions returned in response, not auto-saved
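
The prompt construction and response handling could look roughly like the sketch below. It makes no LLM call. It assumes the ingredients JSONB field holds a list of strings (or None) and that the model is asked to reply with a JSON array of objects with name and buckets keys. `build_prompt`, `parse_suggestions` and `ALLOWED_BUCKETS` are hypothetical names, and the prompt wording is only a starting point.

```python
import json
from typing import Optional

ALLOWED_BUCKETS = {
    "gluten", "FODMAPs", "nightshades", "histamines",
    "added sugar", "artificial additives", "dairy", "FDA Big 9 allergens",
}


def build_prompt(food_name: str, ingredients: Optional[list], context_chunks: list) -> str:
    """Assemble the prompt from the food name, known ingredients and retrieved context."""
    known = ", ".join(ingredients) if ingredients else "none provided"
    context = "\n".join(f"- {chunk}" for chunk in context_chunks) or "- (no context retrieved)"
    return (
        f"Food: {food_name}\n"
        f"Known ingredients: {known}\n"
        f"Reference context:\n{context}\n\n"
        "Suggest only ingredients this food is likely to contain and classify each "
        f"into zero or more of these buckets: {', '.join(sorted(ALLOWED_BUCKETS))}.\n"
        'Reply with a JSON array of objects like {"name": "...", "buckets": ["..."]}.'
    )


def parse_suggestions(raw_json: str) -> list:
    """Keep well-formed items only and drop buckets outside the allowed set."""
    try:
        items = json.loads(raw_json)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    suggestions = []
    for item in items:
        if not isinstance(item, dict) or not isinstance(item.get("name"), str):
            continue
        raw_buckets = item.get("buckets")
        if not isinstance(raw_buckets, list):
            raw_buckets = []
        buckets = [b for b in raw_buckets if isinstance(b, str) and b in ALLOWED_BUCKETS]
        suggestions.append({"name": item["name"], "buckets": buckets})
    return suggestions
```

Filtering the reply against a fixed bucket set is one cheap guard against invented categories; it does not by itself stop the model from suggesting unlikely ingredients, which is what the retrieved context and the prompt instruction are for.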

Out of Scope

  • Frontend confirmation UI — backend only
  • Actual seeding of knowledge documents — scaffolded only, content TBD
  • Apriori analysis — separate ticket
  • Barcode scan or external food API integration

Technical Approach

Affected Areas

  • backend/models/tag.py (new) — Tag and FoodLogTag models
  • backend/models/knowledge_chunk.py (new) — vector store model
  • backend/models/food_log.py — add relationship to FoodLogTag
  • backend/schemas/tag.py (new) — TagCreate, TagResponse, SuggestedIngredientResponse
  • backend/schemas/food.py — add suggested_tags and suggested_ingredients to FoodLogResponse
  • backend/services/tagging_service.py (new) — tag business logic
  • backend/services/rag_tagging_service.py (new) — RAG + LLM suggestion logic
  • backend/repositories/food_repository.py — add tag persistence methods
  • backend/repositories/tag_repository.py (new)
  • backend/routers/food_router.py — update create endpoint
  • backend/routers/tag_router.py (new) — register in main.py

Dependencies

  • Food, FoodLog models exist
  • FoodService, FoodRepository patterns exist
  • pgvector must be enabled on Supabase before sub-task 3 begins (I may need to update the Docker image for the test DB to support this; I'll look into it)

Testing Notes

Follow existing integration test patterns in backend/tests/integration/

  • LLM suggestion — given a Food with populated ingredients JSONB, correct trigger bucket tags and suggested ingredients returned
  • Sparse ingredients: Food with ingredients=None should return reasonable suggestions from the food name alone, without hallucinating
  • RAG retrieval — similarity search should return relevant chunks for a given food query
  • Tag confirmation — unconfirmed tags must not appear in DB after food log creation
  • Suggested ingredients — must never be persisted, only returned in response
  • Custom tag creation: is_system=False tag persists and appears in GET /tags
  • Duplicate tags — saving same tag to a food log twice should not create duplicate join table entries
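
One way to get the duplicate-tag behaviour at the database level is a composite primary key on the join table plus an insert that ignores conflicts. The sketch below uses the standard-library sqlite3 module as a stand-in for the Postgres test database; the table and column names are assumptions based on the FoodLogTag description, and the helper names are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a join table keyed on both ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE food_log_tags (
            food_log_id TEXT NOT NULL,
            tag_id      TEXT NOT NULL,
            PRIMARY KEY (food_log_id, tag_id)
        )
        """
    )
    return conn


def add_tag(conn, food_log_id, tag_id):
    """Link a tag to a food log; a repeated link is silently skipped."""
    conn.execute(
        "INSERT INTO food_log_tags (food_log_id, tag_id) VALUES (?, ?) "
        "ON CONFLICT (food_log_id, tag_id) DO NOTHING",
        (food_log_id, tag_id),
    )


def count_links(conn):
    return conn.execute("SELECT COUNT(*) FROM food_log_tags").fetchone()[0]


conn = make_db()
add_tag(conn, "log-1", "tag-gluten")
add_tag(conn, "log-1", "tag-gluten")  # same pair again: no new row
add_tag(conn, "log-1", "tag-dairy")
```

After these three calls the table holds two rows, which is the outcome the duplicate-tags test should assert.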

Additional Context

(https://www.youtube.com/watch?v=T-D1OfcDW1M)
