@@ -140,12 +140,6 @@ llama stack run ../../../run_llama_server.yaml
 > **Keep this terminal open** - the server needs to keep running.\
 > You should see output indicating the server started on `http://localhost:8321`.
 
-Create package with agent and install it to venv
-
-```bash
-uv pip install -e .
-```
-
 ### Load Documents into Vector Store
 
 **IMPORTANT**: Before running the agent, you must load documents into the vector store.
@@ -163,43 +157,6 @@ This will:
 - Generate embeddings using the model specified in `EMBEDDING_MODEL`
 - Store chunks in the Milvus Lite vector database at `VECTOR_STORE_PATH`
 
-**Adding your own documents:**
-
-1. Create a text file with your content (e.g., `my_documents.txt`)
-2. Update `.env`:
-   ```env
-   DOCS_TO_LOAD=./data/my_documents.txt
-   ```
-3. Re-run the document loader:
-   ```bash
-   cd data
-   python load_documents.py
-   ```
-
-**Customizing chunk size:**
-
-Edit `load_documents.py` to adjust chunking parameters:
-
-```python
-load_and_index_documents(
-    chunk_size=512,  # Size of text chunks (default: 512)
-    chunk_overlap=128,  # Overlap between chunks (default: 128)
-)
-```
-
-**Recommended chunk sizes:**
-
-- Technical documentation: 512-1024 characters
-- Narrative text: 256-512 characters
-- Code snippets: 128-256 characters
-
-**Troubleshooting vector store:**
-
-If you encounter issues with the vector store:
-
-1. Delete the contents of the `milvus_data` folder
-2. Re-run `python load_documents.py` to recreate it
-
 ### Run the example:
 
 ```bash
@@ -256,39 +213,6 @@ curl -X POST https://<YOUR_ROUTE_URL>/chat \
 
 ## Agent-Specific Documentation
 
-### Architecture
-
-The RAG workflow consists of three main steps:
-
-1. **Agent Node**: Decides whether to retrieve information based on the user's query
-2. **Retrieve Node**: If needed, retrieves relevant documents from the vector store
-3. **Generate Node**: Generates a final answer based on retrieved context
-
-```
-START → Agent → [Decision] → Retrieve → Generate → END
-                    ↓
-                   END (if no retrieval needed)
-```
-
-### Features
-
-- **Agentic RAG Workflow**: The agent autonomously decides when to retrieve information
-- **Llama Stack Integration**: Unified model serving with Ollama for local LLM inference
-- **Milvus Lite Vector Store**: High-performance vector database with easy migration to production Milvus
-- **FastAPI Service**: REST API with `/chat` and `/health` endpoints
-- **Tool-based Retrieval**: LangGraph tool integration for seamless retrieval
-- **Document Loader**: Easy document ingestion from text files with customizable chunking
-
-### Key Differences from Base Agents
-
-This RAG agent extends the base LangGraph agent with:
-
-1. **Retrieval Capability**: Automatic knowledge base search via Llama Stack
-2. **Multi-step Workflow**: Agent → Retrieve → Generate pattern
-3. **Vector Store Integration**: Milvus Lite-based document storage and retrieval
-4. **Context-aware Generation**: Answers based on retrieved documents with relevance checking
-5. **Embedding Model Requirement**: Requires separate embedding model for document vectorization
-
 ### Additional Resources
 
 - https://langchain-ai.github.io/langgraph/