Add GPU support, hosted model flow, MLflow cluster injection, and cleanup scripts
- Ollama deployment now requests nvidia.com/gpu for cluster inference
- Route timeout set to 300s for multi-tool agent queries
- deploy-local.sh and deploy-cluster.sh auto-detect Ollama vs hosted model from BASE_URL
- setup-cluster.sh prompts for local Ollama or hosted model (OpenAI, etc.)
- deploy-cluster.sh injects MLflow env vars into agent pod via deployment.yaml
- deploy-cluster.sh refreshes MLflow token from oc, checks prereqs (docker, envsubst)
- deploy-cluster.sh and setup-cluster.sh prompt for project with option to create new
- Add cleanup-local.sh for stopping LlamaStack and cleaning up
- agent.py: MLflow setup wrapped in try/except for graceful failure
- agent.py: load_dotenv() at module level so MLflow reads MLFLOW_TRACKING_TOKEN
- README rewritten to show .env as single config driving all scripts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
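
As a rough illustration of the `agent.py` bullets above, the sketch below wraps MLflow setup in `try`/`except` so that a missing package or missing configuration disables tracking instead of crashing the agent. The function name `setup_mlflow` and the logger name are hypothetical and not taken from the commit. `MLFLOW_TRACKING_URI` and `MLFLOW_EXPERIMENT_NAME` are standard MLflow environment variables, but whether `agent.py` reads them this way is an assumption. The `load_dotenv()` call appears only as a comment so the sketch stays dependency-free.

```python
import logging
import os

logger = logging.getLogger("agent")  # hypothetical logger name

# In the real module, load_dotenv() would run here (at module level) so that
# values such as MLFLOW_TRACKING_TOKEN from .env are visible to MLflow.


def setup_mlflow() -> bool:
    """Try to enable MLflow tracking; return False instead of raising on failure."""
    try:
        import mlflow  # lazy import, so a missing package is handled too

        tracking_uri = os.environ["MLFLOW_TRACKING_URI"]  # KeyError if unset
        mlflow.set_tracking_uri(tracking_uri)
        mlflow.set_experiment(os.environ.get("MLFLOW_EXPERIMENT_NAME", "Default"))
        return True
    except Exception as exc:  # broad on purpose: tracking is optional
        logger.warning("MLflow tracking disabled: %s", exc)
        return False


if __name__ == "__main__":
    print("MLflow enabled:", setup_mlflow())
```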
|`http://localhost:8321`| Scripts deploy Ollama + LlamaStack locally, or Ollama on the cluster |
+|`https://api.openai.com/v1` (or any remote URL) | Scripts skip Ollama/LlamaStack entirely; the agent connects directly to the hosted model |
+
+> **Model note:** `qwen2.5:7b` is recommended for reliable function calling with Ollama. Smaller models like `llama3.2:3b` struggle with multi-tool orchestration.

---

@@ -35,8 +67,9 @@ From the demo directory, copy these files into `agents/base/langgraph_react_agen
|`.env`|`.env`| All secrets and config; share securely with your team |
|`deploy-local.sh`|`deploy-local.sh`| One-command local setup and run |
-|`setup-cluster.sh`|`setup-cluster.sh`| Deploys Ollama on cluster |
+|`setup-cluster.sh`|`setup-cluster.sh`| Deploys Ollama on cluster (or configures hosted model) |
|`cleanup-cluster.sh`|`cleanup-cluster.sh`| Removes all cluster resources |
+|`cleanup-local.sh`|`cleanup-local.sh`| Stops LlamaStack and cleans up the local setup |
|`k8s/ollama-deployment.yaml`|`k8s/ollama-deployment.yaml`| Ollama pod for the cluster |
|`k8s/ollama-service.yaml`|`k8s/ollama-service.yaml`| Ollama service |
@@ -48,49 +81,53 @@ mlflow>=2.19.0
And in `main.py`, change `recursion_limit` from `10` to `25`.

-> **Model note:** `qwen2.5:7b` is recommended for reliable function calling. Smaller models like `llama3.2:3b` struggle with multi-tool orchestration, and `llama3.1:8b` does not produce structured tool calls through LlamaStack.
-
> **Import fix:** After copying `tools.py` and `agent.py`, replace all occurrences of `langgraph_outdoor_activity_agent` with `langgraph_react_agent_base` in the import lines.
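
The import fix above can be scripted. The following is a minimal sketch, assuming the rename should touch only lines that begin with `import` or `from`; the helper name `fix_imports` and the imported names in the example (`get_weather`, `utils`) are hypothetical. To apply it to the copied files, the returned string could be written back with `pathlib.Path.write_text`.

```python
import re

OLD_PACKAGE = "langgraph_outdoor_activity_agent"
NEW_PACKAGE = "langgraph_react_agent_base"

# Matches lines that start (after optional indentation) with `from` or `import`.
IMPORT_LINE = re.compile(r"^\s*(?:from|import)\s")


def fix_imports(source: str) -> str:
    """Rename the package on import lines only; all other lines are left untouched."""
    fixed = []
    for line in source.splitlines(keepends=True):
        if IMPORT_LINE.match(line):
            line = line.replace(OLD_PACKAGE, NEW_PACKAGE)
        fixed.append(line)
    return "".join(fixed)


# Hypothetical file contents used only to demonstrate the behaviour.
example = (
    "from langgraph_outdoor_activity_agent.tools import get_weather\n"
    "import langgraph_outdoor_activity_agent.utils as utils\n"
    'NOTE = "langgraph_outdoor_activity_agent appears in a string"\n'
)

if __name__ == "__main__":
    print(fix_imports(example))
```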

---

## Run locally

-### 1. Start Ollama
-
-Ollama is a system-level application (not a Python package). It must be installed separately and runs outside the virtual environment.
+### With local Ollama (default `.env`)

+Start Ollama in one terminal:
```bash
ollama serve
```

-Keep this running in its own terminal. Ollama needs to be running before the deploy script can pull models and start LlamaStack.
-
-### 2. Run the deploy script
-
-In a new terminal:
-
+Run the agent in another terminal:
```bash
cd agents/base/langgraph_react_agent
chmod +x deploy-local.sh
./deploy-local.sh
```

-This script will:
-- Create a Python virtual environment and install dependencies
-- Pull Ollama model (`qwen2.5:7b`)
-- Start LlamaStack in the background
-- Launch the interactive agent
+The script detects `localhost` in `BASE_URL` and automatically:
+- Pulls the model from `MODEL_ID`
+- Starts LlamaStack
+- Launches the interactive agent
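
The detection described above is done by the shell scripts themselves. The Python sketch below only illustrates the decision under stated assumptions: the function names are hypothetical, treating `127.0.0.1` and `::1` as local is an assumption (the README only mentions `localhost`), and the step lists paraphrase the README rather than the script source.

```python
from urllib.parse import urlparse

# Hosts treated as "local Ollama"; only `localhost` is confirmed by the README.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def uses_local_ollama(base_url: str) -> bool:
    """Return True when BASE_URL points at the local machine."""
    return urlparse(base_url).hostname in LOCAL_HOSTS


def plan_steps(base_url: str, model_id: str) -> list[str]:
    """List the steps a deploy script might run for the given BASE_URL."""
    if uses_local_ollama(base_url):
        return [f"pull model {model_id}", "start LlamaStack", "launch agent"]
    # Remote URL: Ollama and LlamaStack are skipped entirely.
    return ["install dependencies", "launch agent"]


if __name__ == "__main__":
    for url, model in [
        ("http://localhost:8321", "qwen2.5:7b"),
        ("https://api.openai.com/v1", "gpt-4o-mini"),
    ]:
        print(url, "->", plan_steps(url, model))
```

Parsing the URL and comparing the hostname, rather than searching for the substring `localhost`, avoids misclassifying a remote URL that merely contains that word in its path.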

-Make sure `qwen2.5:7b` is registered in `run_llama_server.yaml` under `registered_resources.models`:
+### With a hosted model (e.g. OpenAI)

-```yaml
-- model_id: qwen2.5:7b
-  provider_id: ollama
-  model_type: llm
-  metadata: { }
+Update `.env`:
+```
+API_KEY=sk-your-openai-key
+BASE_URL=https://api.openai.com/v1
+MODEL_ID=gpt-4o-mini
+```
+
+Then run:
+```bash
+./deploy-local.sh
```

+The script detects a remote `BASE_URL` and skips Ollama and LlamaStack; it only installs dependencies and runs the agent directly.
+
+### To change the Ollama model
+
+Update these places:
+- `MODEL_ID` in `.env` (e.g. `ollama/qwen2.5:7b`)
+- The model entry in `run_llama_server.yaml` under `registered_resources.models`
+
### Try it out

```
@@ -99,13 +136,20 @@ Is it safe to go running outdoors in San Francisco tomorrow morning?
I want to go biking in Yosemite next weekend, any recommendations?
```

+### Clean up
+
+```bash
+chmod +x cleanup-local.sh
+./cleanup-local.sh
+```
+
---

## Deploy to OpenShift cluster

### 1. Update `.env` for cluster

-Set the `CONTAINER_IMAGE` to your registry. The `BASE_URL` and `MODEL_ID` will be auto-detected by the deploy script once Ollama is running on the cluster.
+Set `CONTAINER_IMAGE` to the registry path where the deploy script will build and push the agent image: