# LLM Chat Flow in Flaskmarks

This app uses a Retrieval-Augmented Generation (RAG) flow:
1. Retrieve relevant bookmarks for the user.
2. Send retrieved context plus question to Groq LLM.
3. Return answer with source bookmarks.

## Request Flow

1. Frontend sends `POST /chat/send` with `query` and CSRF token.
   - `flaskmarks/templates/chat/index.html`
2. Backend validates input via `ChatForm` (`3..1000` chars).
   - `flaskmarks/forms/chat.py`
3. Route calls `rag_service.chat(query, user_id, chat_history)`.
   - `flaskmarks/views/chat.py`

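The validation-and-dispatch step can be sketched in plain Python. This is an illustrative stand-in, not the actual route code: `handle_chat_send` and `chat_fn` are hypothetical names, with `chat_fn` playing the role of `rag_service.chat`.

```python
def handle_chat_send(query, user_id, chat_history, chat_fn):
    """Validate a /chat/send query the way ChatForm's length rule does
    (3..1000 chars), then dispatch to the RAG service.

    Illustrative sketch: `chat_fn` stands in for rag_service.chat.
    """
    query = (query or "").strip()
    if not (3 <= len(query) <= 1000):
        return {"error": "Query must be between 3 and 1000 characters."}
    return chat_fn(query, user_id, chat_history)
```

On failure the real route would re-render the form with errors; the dict return here just keeps the sketch self-contained.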
## RAG + LLM Flow

Inside `RAGService.chat`:

1. Check feature/config:
   - `RAG_ENABLED`
   - `GROQ_API_KEY`
2. Embed the user query with sentence-transformers.
3. Run pgvector similarity search on `marks.embedding`, scoped by `owner_id` (user isolation).
4. Build context from top matches:
   - title, URL, tags, description, content excerpt
5. Call Groq chat completions with:
   - system prompt
   - recent chat history
   - current question + retrieved context
6. Return:
   - `answer`
   - `sources`
   - `tokens_used`

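Steps 2–4 can be sketched with toy in-memory vectors. The functions below are hypothetical stand-ins for the sentence-transformers embedding and the pgvector query, not the real `RAGService` code:

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query_vec, marks, k=5):
    # Toy stand-in for the pgvector similarity search. `marks` is a list
    # of dicts assumed to be already scoped to one owner_id.
    ranked = sorted(
        marks,
        key=lambda m: cosine_sim(query_vec, m["embedding"]),
        reverse=True,
    )
    return ranked[:k]

def build_context(matches):
    # Mirror step 4: one line per match with title, URL, tags, description.
    return "\n".join(
        f"- {m['title']} ({m['url']}) [{m.get('tags', '')}]: {m.get('description', '')}"
        for m in matches
    )
```

In production the ranking happens inside Postgres via pgvector's distance operators; sorting in Python here just keeps the sketch runnable.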
Relevant code:
- `flaskmarks/core/rag/service.py`
- `flaskmarks/core/rag/embeddings.py`

## Embeddings and Storage

The bookmark model stores vectors in:
- `embedding` (`Vector(384)`)
- `embedding_updated`

Defined in:
- `flaskmarks/models/mark.py`

Database support is enabled by migration:
- `migrations/versions/a1b2c3d4e5f6_add_embedding_column_for_rag.py`

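A declaration of this shape is one way the columns might look with SQLAlchemy and the `pgvector` Python package. This fragment is illustrative only; the real definition lives in `flaskmarks/models/mark.py`:

```python
# Illustrative schema fragment, not the actual model. Assumes SQLAlchemy
# and the pgvector package; `Base` is the app's declarative base.
from sqlalchemy import Column, DateTime
from pgvector.sqlalchemy import Vector

class Mark(Base):
    # ... existing bookmark columns ...
    embedding = Column(Vector(384))       # sentence-transformers dimension
    embedding_updated = Column(DateTime)  # when the vector was last refreshed
```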
## How Embeddings Are Generated

Embeddings are primarily generated via CLI:
- `flask rag generate-embeddings`

Code:
- `flaskmarks/cli.py`

If no relevant embedded bookmarks exist, chat returns a fallback message.

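The CLI pass can be sketched as a plain loop over bookmarks. Dicts stand in for `Mark` rows and `embed_fn` for the sentence-transformers model; the real command lives in `flaskmarks/cli.py` and this is only a sketch of its likely shape:

```python
from datetime import datetime, timezone

def generate_embeddings(marks, embed_fn, force=False):
    """Embed every mark that lacks a vector (or all of them with
    force=True) and stamp embedding_updated.

    Illustrative sketch: dicts stand in for Mark rows and embed_fn for
    the sentence-transformers model.
    """
    updated = 0
    for mark in marks:
        if force or mark.get("embedding") is None:
            text = " ".join(
                filter(None, [mark.get("title"), mark.get("description")])
            )
            mark["embedding"] = embed_fn(text)
            mark["embedding_updated"] = datetime.now(timezone.utc)
            updated += 1
    return updated
```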
## Session Chat History

Chat history is stored in the session as alternating user/assistant messages.
History size is capped by `CHAT_MAX_HISTORY`.

Code:
- `flaskmarks/views/chat.py`

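The capping behaviour can be sketched as a small helper. `append_history` is an illustrative name, with `max_history` playing the role of `CHAT_MAX_HISTORY`; the real logic lives in `flaskmarks/views/chat.py`:

```python
def append_history(history, role, content, max_history=10):
    """Append one message and keep only the most recent max_history
    entries. Illustrative sketch, not the actual view code."""
    history = history + [{"role": role, "content": content}]
    return history[-max_history:]
```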
## Main Config Knobs

From `config.py`:
- `RAG_ENABLED`
- `GROQ_API_KEY`
- `GROQ_MODEL`
- `GROQ_TEMPERATURE`
- `GROQ_MAX_TOKENS`
- `RAG_TOP_K`
- `RAG_SIMILARITY_THRESHOLD`
- `CHAT_MAX_HISTORY`

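A config block of this shape could read the knobs from the environment. The defaults below are illustrative placeholders, not the project's actual values; see `config.py` for those:

```python
import os

# Illustrative shapes and defaults only -- see config.py for the real values.
RAG_ENABLED = os.environ.get("RAG_ENABLED", "false").lower() == "true"
GROQ_API_KEY = os.environ.get("GROQ_API_KEY", "")
GROQ_MODEL = os.environ.get("GROQ_MODEL", "llama-3.1-8b-instant")
GROQ_TEMPERATURE = float(os.environ.get("GROQ_TEMPERATURE", "0.7"))
GROQ_MAX_TOKENS = int(os.environ.get("GROQ_MAX_TOKENS", "1024"))
RAG_TOP_K = int(os.environ.get("RAG_TOP_K", "5"))
RAG_SIMILARITY_THRESHOLD = float(os.environ.get("RAG_SIMILARITY_THRESHOLD", "0.3"))
CHAT_MAX_HISTORY = int(os.environ.get("CHAT_MAX_HISTORY", "10"))
```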
## Note

`RAG_SIMILARITY_THRESHOLD` is defined in config but is not currently applied in the retrieval logic; retrieval simply returns the top-K results.
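For illustration, applying the threshold on top of the existing top-K search could look like the hypothetical helper below. This is not current behaviour, just a sketch of one way the gap might be closed:

```python
def apply_threshold(scored_matches, threshold):
    """Keep only (mark, similarity) pairs meeting the threshold.

    Hypothetical sketch of how RAG_SIMILARITY_THRESHOLD could be applied
    after top-K retrieval; the current code returns top-K unconditionally.
    """
    return [(mark, score) for mark, score in scored_matches if score >= threshold]
```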