Run agents locally using Ollama and Llama Stack for model serving, or connect to any OpenAI-compatible API.
Windows users: The Makefiles require a bash-compatible shell. Use WSL, Git Bash, or a similar environment.
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
For other platforms, see the Ollama docs.
Start the Ollama service (keep this terminal open):
ollama serve
In a second terminal, pull a model:
ollama pull llama3.2:3b
For RAG agents that need embeddings:
ollama pull embeddinggemma:latest
From your agent directory (e.g., agents/llamaindex/websearch_agent):
# Install llama-stack and its provider dependencies (ollama, milvus)
uv tool install llama-stack \
--with ollama \
--with "pymilvus>=2.4.10" \
--with "milvus-lite>=2.5.1" \
--with chardet \
--with pypdf \
--with "setuptools<82"
# Create milvus data directory (required by run_llama_server.yaml)
mkdir -p ../../../milvus_data
# Start the server
llama stack run ../../../run_llama_server.yaml
The server starts on http://localhost:8321.
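Because the agent depends on the server at http://localhost:8321, it can help to confirm that the port is accepting connections before moving on. The following is a minimal sketch, not part of the repository: `wait_for_port` is a hypothetical helper name, and a successful TCP connect only shows that something is listening, not that every provider has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it is not provided by llama-stack.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8321)` returns `True` as soon as the server started above is listening, and `False` if nothing answers within 30 seconds.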
cd agents/langgraph/react_agent # or any other agent
make init # creates .env
Edit .env:
API_KEY=dummy
BASE_URL=http://localhost:8321/v1
MODEL_ID=llama3.2:3b
make run
The agent starts on http://localhost:8000.
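If the agent fails to start, a common cause is a missing or malformed value in .env. The sketch below checks the three keys shown above. It is an independent pre-check written for illustration: `parse_env` and `check_env` are hypothetical names, and the parser only handles plain KEY=VALUE lines, which may be simpler than whatever loader the agents actually use.

```python
from urllib.parse import urlparse

# The three keys shown in the .env example above.
REQUIRED_KEYS = ("API_KEY", "BASE_URL", "MODEL_ID")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = [f"{key} is missing or empty" for key in REQUIRED_KEYS if not values.get(key)]
    base_url = values.get("BASE_URL", "")
    if base_url:
        parsed = urlparse(base_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("BASE_URL is not an http(s) URL")
    return problems
```

One way to use it is `check_env(parse_env(open(".env").read()))` from the agent directory, printing any problems it returns before running `make run`.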
To use an existing OpenAI-compatible API endpoint (OpenAI, Azure OpenAI, vLLM, etc.) instead of the local stack, set BASE_URL, API_KEY, and MODEL_ID in .env to match it:
API_KEY=sk-...
BASE_URL=https://api.openai.com/v1
MODEL_ID=gpt-4o
# Health check
curl http://localhost:8000/health
# Non-streaming
curl -X POST http://localhost:8000/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello!"}], "stream": false}'
# Streaming
curl -sN -X POST http://localhost:8000/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello!"}], "stream": true}'
cd agents/langgraph/react_agent
make test
All agents use uv for dependency management:
curl -LsSf https://astral.sh/uv/install.sh | sh
To install an agent's dependencies locally:
cd agents/langgraph/react_agent
uv pip install -e ".[dev]"
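With the dependencies installed, the curl requests to /chat/completions shown earlier can also be issued from Python. This is a rough sketch using only the standard library. The function names are hypothetical, and because the response format is not documented here, the code returns the decoded JSON or the raw streamed lines without interpreting them.

```python
import json
import urllib.request

AGENT_URL = "http://localhost:8000"  # agent address from the steps above


def build_request(content: str, stream: bool, base_url: str = AGENT_URL) -> urllib.request.Request:
    """Build the same POST /chat/completions request that the curl examples send."""
    body = {"messages": [{"role": "user", "content": content}], "stream": stream}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content: str, base_url: str = AGENT_URL, timeout: float = 60.0):
    """Non-streaming call; returns the decoded JSON body without interpreting it."""
    with urllib.request.urlopen(build_request(content, False, base_url), timeout=timeout) as resp:
        return json.load(resp)


def chat_stream(content: str, base_url: str = AGENT_URL, timeout: float = 60.0):
    """Streaming call; yields each non-empty response line as text."""
    with urllib.request.urlopen(build_request(content, True, base_url), timeout=timeout) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line:
                yield line
```

With the agent running, `print(chat("Hello!"))` mirrors the non-streaming curl call, and `for line in chat_stream("Hello!"): print(line)` mirrors the streaming one. If the streamed lines turn out to be prefixed (for example with `data:`), they would need further parsing, which this sketch deliberately leaves out.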