A workflow-based AI agent built with LlamaIndex that provides web search capabilities for research tasks and real-time information retrieval.
- Workflow-Based Architecture: Event-driven execution using LlamaIndex Workflows
- Web Search Tool: Extensible web search functionality
- FastAPI Service: Production-ready REST API with async support
- Dual Deployment: Local development with LlamaStack/Ollama or production on Red Hat OpenShift
- OpenAI-Compatible API: Works with any OpenAI-compatible endpoint
- Memory Management: Built-in chat memory buffer for context retention
- Python 3.11 or higher
- Ollama installed
- Git
- OpenShift CLI (`oc`) installed and authenticated
- Docker with the buildx plugin (`docker buildx install`)
- `envsubst` utility (for environment variable substitution)
- Access to a container registry (Quay.io, Docker Hub, or GHCR)
- Container registry authentication (`docker login <registry>`)
This section covers running the agent locally using LlamaStack server with Ollama for model inference.
```bash
git clone <repository-url>
cd Agentic-Starter-Kits
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install Ollama if not already installed (download the app from the Ollama site or install it via Homebrew):

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Or visit https://ollama.com/download
```

Pull the required model and start the server:

```bash
ollama pull llama3.2:3b
ollama serve
```

Keep this terminal open - Ollama needs to keep running.
From the repository root directory:
```bash
llama stack run run_llama_server.yaml
```

Keep this terminal open - the server needs to keep running. You should see output indicating the server started on http://localhost:8321.
The server will start on http://127.0.0.1:8321 with:
- Inference API (Ollama backend)
- Vector I/O API (Milvus Lite)
- Safety API (Llama Guard)
Configuration (`run_llama_server.yaml`):
- Port: `8321`
- Ollama URL: `http://localhost:11434/v1`
- Model: `llama3.2:3b`
Leave this terminal running and open a new terminal for the next steps.
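Before moving on, you can confirm something is actually listening on port `8321`. This is a small standard-library sketch, not part of the repository:

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if is_port_open("127.0.0.1", 8321):
        print("LlamaStack server is up")
    else:
        print("Nothing listening on 8321 - is `llama stack run` still running?")
```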
Install dependencies:
```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Create a `.env` file in the agent directory:
```bash
# Local development configuration
MODEL_ID=ollama/llama3.2:3b
BASE_URL=http://127.0.0.1:8321/v1
API_KEY=not-needed

# Comment out or remove deployment variables for local use:
#CONTAINER_IMAGE=quay.io/your-username/llamaindex-websearch-agent:latest
```

Environment Variables Explained:
- `MODEL_ID`: Model identifier (format: `ollama/<model-name>`)
- `BASE_URL`: LlamaStack server endpoint (must end with `/v1`)
- `API_KEY`: Not required for local Ollama (`not-needed` is a placeholder)
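These variables might be consumed roughly like this - a minimal standard-library sketch with the local-development defaults above; the agent's actual loading code may differ:

```python
import os

def load_agent_config() -> dict:
    """Read connection settings from the environment, falling back to
    the local-development defaults shown in the .env example above."""
    config = {
        "model_id": os.environ.get("MODEL_ID", "ollama/llama3.2:3b"),
        "base_url": os.environ.get("BASE_URL", "http://127.0.0.1:8321/v1"),
        "api_key": os.environ.get("API_KEY", "not-needed"),
    }
    # An OpenAI-compatible endpoint must expose the /v1 API surface.
    if not config["base_url"].rstrip("/").endswith("/v1"):
        raise ValueError(f"BASE_URL should end with /v1, got {config['base_url']!r}")
    return config
```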
For a terminal-based chat interface:
```bash
python examples/execute_ai_service_locally.py
```

In a new terminal, test the agent:
Health Check:
```bash
curl http://localhost:8000/health
```

Expected response:

```json
{
  "status": "healthy",
  "agent_initialized": true
}
```

Send a Chat Request:

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the capital of France?"}'
```

⚡ Or with uv (from repo root):
- Create venv and activate:

  ```bash
  uv venv --python 3.12
  source .venv/bin/activate
  ```

- Copy shared utils into the agent package:

  ```bash
  cp utils.py agents/base/llamaindex_websearch_agent/src/llama_index_workflow_agent_base
  ```

- Install agent (editable) and its requirements:

  ```bash
  uv pip install -e agents/base/llamaindex_websearch_agent/. -r agents/base/llamaindex_websearch_agent/requirements.txt
  ```

- Run the example:

  ```bash
  uv run agents/base/llamaindex_websearch_agent/examples/execute_ai_service_locally.py
  ```

# Deployment on Red Hat OpenShift Cluster
Navigate to the agent directory:
```bash
cd agents/base/llamaindex_websearch_agent
```

Make scripts executable (first time only):

```bash
chmod +x init.sh deploy.sh
```
Run the initialization script:

```bash
./init.sh
```

This will:
- Load and validate environment variables from the `.env` file
- Copy shared utilities (`utils.py`) to the agent source directory
Run the deployment script:

```bash
./deploy.sh
```

This will:
- Create Kubernetes secret for API key
- Build and push the Docker image
- Deploy the agent to OpenShift
- Create Service and Route
Get your route URL:
```bash
oc get route llamaindex-websearch-agent -o jsonpath='{.spec.host}'
```

Copy the output into the curl command below as `<YOUR_ROUTE_URL>`.
Send a test request:
```bash
curl -X POST https://<YOUR_ROUTE_URL>/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is LangChain?"}'
```

## POST /chat

Send a message to the agent and receive a structured response.
Request:

```bash
curl -X POST <AGENT_URL>/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Your question or instruction here"
  }'
```

Response:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Your question"
    },
    {
      "role": "assistant",
      "content": "Response text",
      "tool_calls": [
        {
          "id": "call_123",
          "type": "function",
          "function": {
            "name": "tool_name",
            "arguments": "{\"arg\":\"value\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_123",
      "name": "tool_name",
      "content": "Tool result"
    }
  ],
  "finish_reason": "stop"
}
```

## GET /health

Check agent health and initialization status.
Request:
```bash
curl <AGENT_URL>/health
```

Response:

```json
{
  "status": "healthy",
  "agent_initialized": true
}
```

The agent includes these example tools (see `src/llama_index_workflow_agent_base/tools.py`):
### dummy_web_search

Simulates web search functionality (returns static results for demonstration).
Parameters:
- `query` (str): Search query string

Example:

```json
{
  "name": "dummy_web_search",
  "arguments": {
    "query": "latest AI news"
  }
}
```

Returns:

```python
["RedHat"]  # Static result for demo purposes
```

The agent uses LlamaIndex Workflows for event-driven execution:
```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │ POST /chat
       ▼
┌─────────────────────┐
│   FastAPI Server    │
│     (main.py)       │
└──────────┬──────────┘
           │
           ▼
┌──────────────────────────┐
│   FunctionCallingAgent   │
│  (LlamaIndex Workflow)   │
└────────┬─────────────────┘
         │
    ┌────┴─────────┐
    ▼              ▼
┌─────────┐   ┌─────────┐
│   LLM   │   │  Tools  │
└─────────┘   └─────────┘
```
Workflow Steps:
- prepare_chat_history: Processes incoming messages and updates memory
- handle_llm_input: Sends chat history to LLM, gets response and tool calls
- handle_tool_calls: Executes tools and returns results
- Loop: Repeats until no more tool calls (StopEvent)
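The steps above can be sketched in plain Python. This is a simplified illustration with a stubbed LLM and a stand-in for `dummy_web_search` - not the actual LlamaIndex `Workflow` and event classes:

```python
from typing import Callable

def run_agent_loop(llm: Callable, tools: dict, user_message: str,
                   max_turns: int = 5) -> list:
    """Simplified agent loop: call the LLM, execute any requested
    tools, and repeat until the LLM stops asking for tools."""
    # prepare_chat_history: seed the memory with the incoming message
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        # handle_llm_input: send the chat history to the LLM
        reply = llm(history)
        history.append(reply)
        tool_calls = reply.get("tool_calls", [])
        if not tool_calls:  # no tool calls -> stop (StopEvent)
            break
        # handle_tool_calls: execute each tool and append its result
        for call in tool_calls:
            result = tools[call["name"]](**call["arguments"])
            history.append({"role": "tool", "name": call["name"],
                            "content": str(result)})
    return history

# Stubbed LLM: requests a search first, then answers from the result.
def stub_llm(history):
    if any(m["role"] == "tool" for m in history):
        return {"role": "assistant", "content": "Found: RedHat"}
    return {"role": "assistant", "content": "",
            "tool_calls": [{"name": "dummy_web_search",
                            "arguments": {"query": "latest AI news"}}]}

tools = {"dummy_web_search": lambda query: ["RedHat"]}
transcript = run_agent_loop(stub_llm, tools, "What did the search find?")
```

Running this yields a four-message transcript (user, assistant tool request, tool result, final assistant answer), mirroring the `messages` array in the `/chat` response above.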
```
llamaindex_websearch_agent/
├── .env                  # Environment configuration (create this)
├── main.py               # FastAPI application entry point
├── Dockerfile            # Container image definition
├── requirements.txt      # Python dependencies
├── init.sh               # Initialization script
├── deploy.sh             # OpenShift deployment automation
├── src/
│   └── llama_index_workflow_agent_base/
│       ├── __init__.py
│       ├── agent.py      # Workflow closure and LLM client setup
│       ├── workflow.py   # FunctionCallingAgent workflow definition
│       ├── tools.py      # Tool implementations
│       └── utils.py      # Shared utilities (copied by init.sh)
├── k8s/
│   ├── deployment.yaml   # Kubernetes Deployment manifest
│   ├── service.yaml      # Kubernetes Service manifest
│   └── route.yaml        # OpenShift Route manifest
├── examples/
│   ├── execute_ai_service_locally.py # Interactive chat
│   ├── ai_service.py     # AI service wrapper
│   └── _interactive_chat.py # Chat interface
└── tests/
    └── test_tools.py     # Unit tests
```
- Main Repository: Agentic-Starter-Kits README
- LlamaIndex Docs: https://docs.llamaindex.ai/
- LlamaIndex Workflows: https://docs.llamaindex.ai/en/stable/module_guides/workflow/
- LlamaStack Docs: https://llama-stack.readthedocs.io/
- Ollama Docs: https://docs.ollama.com/
- OpenShift Docs: https://docs.openshift.com/
- Implement Real Web Search: Replace `dummy_web_search` with actual web search APIs (e.g., Brave Search, Tavily)
- Add More Tools: Integrate calculator, database queries, or external APIs
- Enable Monitoring: Integrate with Prometheus/Grafana
- Add CI/CD: Automate deployments with GitHub Actions or GitLab CI
- Scale Horizontally: Increase replicas for high availability
- Implement Caching: Add Redis for conversation history persistence
- Streaming Responses: Enable streaming for real-time output
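As a starting point for the first item, `dummy_web_search` could be swapped for a function that calls an external HTTP search API. This is a hedged standard-library sketch: `SEARCH_API_URL`, the `SEARCH_API_KEY` variable, and the `{"results": [{"title": ...}]}` response shape are hypothetical placeholders, not any specific provider's API:

```python
import json
import os
import urllib.parse
import urllib.request

SEARCH_API_URL = "https://search.example.com/v1/search"  # hypothetical endpoint

def web_search(query: str, max_results: int = 5) -> list[str]:
    """Drop-in replacement for dummy_web_search: same signature,
    but returns result titles fetched from an external search API."""
    url = f"{SEARCH_API_URL}?{urllib.parse.urlencode({'q': query})}"
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {os.environ.get('SEARCH_API_KEY', '')}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return format_results(payload, max_results)

def format_results(payload: dict, max_results: int) -> list[str]:
    """Flatten the (assumed) {'results': [{'title': ...}, ...]} payload
    into the list-of-strings shape the workflow's tool step expects."""
    return [item["title"] for item in payload.get("results", [])][:max_results]
```

Keeping the `query -> list[str]` contract means the workflow and tool registration in `tools.py` need no other changes.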