
Commit d6497ec

Merge pull request #314 from EleanorWho/ehu/server-config
Add comprehensive Ollama setup and demo documentation to README files
2 parents 9c00a4c + 0fb0dc6 commit d6497ec

2 files changed: 73 additions & 6 deletions


demos/01_foundations/README.md

Lines changed: 16 additions & 0 deletions
@@ -1,5 +1,21 @@
 # Foundations
 
+## Prerequisites
+
+Before running these demos, ensure you have a Llama Stack server running:
+
+**Option 1: Local Server (Recommended for learning)**
+```bash
+# Follow the setup instructions in ../README.md to install Ollama and start the server
+llama stack run starter  # Runs on localhost:8321
+```
+
+**Option 2: Remote Server**
+If using a remote server (e.g., OpenShift AI), ensure you have:
+- Network access to the server
+- Authentication token configured in `.env` file (`LLAMA_STACK_CLIENT_API_KEY`)
+- Port forwarding set up if needed
+
 ## Overview
 This folder teaches the fundamental building blocks of Llama Stack, including client setup, chat completions, vector databases, and tool integration. These examples cover the core APIs and concepts needed to build AI applications.
 
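For the remote-server option added above, the two configuration pieces fit together roughly as in the following sketch. This is illustrative only: the `oc` (OpenShift CLI) invocation and the `llama-stack` service name are assumptions, not part of this commit; substitute the details of your own deployment.

```bash
# Store the authentication token where the demos expect it (.env in the repo root)
echo 'LLAMA_STACK_CLIENT_API_KEY=<your-token>' >> .env

# If the remote server is not directly reachable, forward local port 8321 to it.
# "service/llama-stack" is a placeholder -- use your cluster's actual service name.
oc port-forward service/llama-stack 8321:8321
```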
demos/README.md

Lines changed: 57 additions & 6 deletions
@@ -7,6 +7,21 @@ This directory contains demo examples for getting started with Llama Stack.
 First, install [`uv`](https://docs.astral.sh/uv/getting-started/installation/), a fast Python package manager.
 
 ```bash
+# 0️⃣ Install Ollama (if using local inference)
+# - Download and install from https://ollama.com/download
+# - Or use your package manager (recommended for security):
+#   - macOS: brew install ollama
+#   - Linux: follow the instructions at https://ollama.com/download/linux
+#   - Windows: download the installer from https://ollama.com/download/windows
+
+# - Pull a model (required for inference). Use smaller models for CPU-only systems:
+ollama pull llama3.2:1b  # 1B model - fast on CPU
+# OR
+ollama pull llama3.2:3b  # 3B model - default, slower on CPU
+
+# - Verify Ollama is running (should return JSON with a model list):
+curl http://localhost:11434/api/tags
+
 # 1️⃣ Create a virtual environment in the current directory (.venv)
 # - Use Python 3.12 explicitly
 # - --seed ensures pip and core packaging tools are installed in the venv
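The `curl` check added in step 0 returns a JSON document listing every locally available model. A quick way to confirm the pull succeeded is to filter that response down to model names; this sketch assumes `jq` is installed, which is not part of the setup above.

```bash
# Print just the model names Ollama is serving (jq is an assumed extra tool)
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
# The model you pulled, e.g. llama3.2:1b, should appear in the output
```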
@@ -22,22 +37,58 @@ source .venv/bin/activate
 # - This installs the CLI (`llama`) and required core dependencies
 uv pip install -U llama_stack
 
-# 3.5️⃣ Install or upgrade the llama-stack-client SDK
+# 4️⃣ Install or upgrade the llama-stack-client SDK
 # - This is the Python client library for interacting with a Llama Stack server
 # - Provides high-level APIs for inference, agents, safety, and more
 uv pip install -U llama-stack-client
 
-# 4️⃣ Install additional dependencies required by the "starter" demo profile
+# 5️⃣ Install additional dependencies required by the "starter" demo profile
 # - `llama stack list-deps starter` prints required packages (one per line)
 # - `xargs -L1 uv pip install` installs each dependency line-by-line
 # - Assumes the virtual environment is active
 llama stack list-deps starter | xargs -L1 uv pip install
 
-# 5️⃣ Run the "starter" demo using a local Ollama server
-# - OLLAMA_URL sets the endpoint for the Ollama model server
-# - This environment variable applies only to this command
-# - The starter demo connects to Ollama at localhost:11434
+# 6️⃣ Run the "starter" Llama Stack server
+# - This starts a LOCAL server on port 8321 (the default for the starter distribution)
+# - The server connects to Ollama at localhost:11434 for inference
+# - IMPORTANT: keep this terminal open - the server runs in the foreground
+# - The server must stay running for the demos to work
 OLLAMA_URL=http://localhost:11434/v1 uv run llama stack run starter
+
+# 7️⃣ Verify the server is running (in a NEW terminal - the server must keep running!)
+# - Open a second terminal window
+# - Navigate to the repository directory and activate the virtual environment
+cd <repo-root>  # Navigate to where you cloned the repo
+source .venv/bin/activate
+
+# 8️⃣ Test the connection
+# - Run the client setup demo to verify the server is running
+python -m demos.01_foundations.01_client_setup localhost 8321  # Note: port 8321 for the local starter server
+```
+
+### Troubleshooting
+
+**Port already in use (8321):**
+```bash
+# Find and kill the process using port 8321
+lsof -i :8321
+kill <PID>
+```
+
+**Server not starting:**
+```bash
+# Check if Ollama is running
+curl http://localhost:11434/api/tags
+
+# Check if a model is pulled
+ollama list
+```
+
+**Version compatibility errors:**
+```bash
+# Reinstall all packages with matching versions
+pip uninstall -y llama-stack llama-stack-api llama-stack-client
+uv pip install -U llama-stack llama-stack-client
 ```
 
 ## Available Demos
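As an alternative to the Python demo module in step 8, the SDK installed in step 4 also ships a `llama-stack-client` CLI that can confirm the server is reachable. A minimal sketch, assuming the local starter server on port 8321; the exact flags follow the upstream quickstart and may differ by version:

```bash
# Point the CLI at the local server (no API key needed for the local starter)
llama-stack-client configure --endpoint http://localhost:8321 --api-key none

# List the models the server exposes; the Ollama-backed models should appear
llama-stack-client models list
```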
