Skip to content

Commit 9804dd6

Browse files
committed
blog: add kagent and HAMi GPU virtualization article (EN + ZH)
Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
1 parent ed362ac commit 9804dd6

4 files changed

Lines changed: 620 additions & 0 deletions

File tree

blog/authors.yml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,3 +5,8 @@ elrond_wang:
55

66
hami_community:
77
name: HAMi Community
8+
9+
mesut_oezdil:
10+
name: Mesut Oezdil
11+
title: Author
12+
url: https://www.linkedin.com/in/mesut-oezdil/
Lines changed: 305 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,305 @@
1+
---
2+
title: "Validating AI Agent-Driven GPU Management on Kubernetes with HAMi and kagent"
3+
date: "2026-05-28"
4+
description: "A real-world test of kagent and HAMi: one physical GPU virtualized into 10 vGPUs, an AI Agent managing Kubernetes workloads via CRDs, and Agent-to-Agent collaboration - all running on open-source models."
5+
authors: [mesut_oezdil]
6+
tags: ["HAMi", "kagent", "GPU Virtualization", "AI Agent", "Kubernetes", "vGPU", "Cloud Native"]
7+
---
8+
9+
Source: [mesutoezdil.substack.com](https://mesutoezdil.substack.com/p/kagent-hami-on-nebius-2-cncf-projects)
10+
GitHub Repo: [kagentWithHami](https://github.com/mesutoezdil/kagentWithHami)
11+
Chinese translation by Jimmy Song, originally published on [WeChat](https://mp.weixin.qq.com/s/WNzZh02_1CbMbVBfi4eRGw)
12+
13+
---
14+
15+
One physical NVIDIA L40S virtualized into 10 vGPUs with HAMi. An AI Agent deployed as a Kubernetes CRD via kagent. Agent-to-Agent delegation, GPU pod creation, overcommit protection - all driven by Llama 3.3 70B with no closed-source dependencies.
16+
17+
<!-- truncate -->
18+
19+
## Before We Start
20+
21+
This is not a documentation summary.
22+
23+
Every command you see below was executed by me personally on a Nebius VM. Every output is from that machine.
24+
25+
When things failed, I debugged them. When things worked, I explain why they worked. The errors in this article are real errors; the fixes are fixes I verified myself.
26+
27+
If you run these commands in the same environment, you will get the same results.
28+
29+
Complete repository (all manifests and setup script):
30+
31+
https://github.com/mesutoezdil/kagentWithHami
32+
33+
Scope note: this article covers the core parts. The full installation flow, all manifests, complete troubleshooting guide, and setup script are in the GitHub repository. If you want to reproduce this, start there.
34+
35+
If you haven't worked with HAMi before:
36+
37+
https://medium.com/@mesutoezdil/hami-in-a-real-kubernetes-environment-e8eaa872f388
38+
39+
If you want to see GPU observability tooling tests:
40+
41+
https://mesutoezdil.substack.com/p/i-tested-every-feature-of-ingero
42+
43+
## What This Article Is Actually About
44+
45+
kagent turns AI Agents into Kubernetes resources.
46+
47+
Your system prompt, tools, and model config all exist as CRDs.
48+
49+
You can:
50+
51+
- Version-control them with Git
52+
- Deploy them with Helm
53+
- Inspect them with kubectl
54+
55+
HAMi implements GPU virtualization at the Kubernetes scheduler layer.
56+
57+
One physical NVIDIA L40S becomes 10 virtual GPUs in Kubernetes, with strict VRAM limits enforced at the CUDA Driver level.
58+
59+
Nebius Token Factory is an OpenAI-compatible inference service.
60+
61+
All tests in this article use Llama 3.3 70B.
62+
63+
The question I wanted to answer:
64+
65+
> "Can an AI Agent, running inside a Kubernetes cluster, using only open-source models, manage GPU-virtualized workloads?"
66+
67+
The answer is yes.
68+
69+
## Test Machine
70+
71+
```
72+
GPU: 1x NVIDIA L40S (46GB VRAM)
73+
CPU: 8 vCPUs
74+
RAM: 32GB
75+
OS: Ubuntu 24.04 LTS for NVIDIA GPUs (CUDA 13)
76+
```
77+
78+
```
79+
nvidia-smi
80+
| NVIDIA-SMI 580.126.09 CUDA Version: 13.0 |
81+
| 0 NVIDIA L40S 0MiB / 46068MiB 0% |
82+
```
83+
84+
46GB VRAM. Completely idle.
85+
86+
By the end of this article, it becomes 10 virtual GPUs.
87+
88+
## 1. Install k3s and Helm
89+
90+
k3s is the right choice for a single-node environment.
91+
92+
```bash
93+
curl -sfL https://get.k3s.io | sh -
94+
```
95+
96+
(Subsequent commands follow as in the repository; full walkthrough in the GitHub repo.)
97+
98+
## 2. Install kagent
99+
100+
kagent ships two Helm charts.
101+
102+
Install the CRDs first, then the main chart.
103+
104+
This lets you upgrade CRDs independently without affecting running Agents.
105+
106+
```bash
107+
helm install kagent-crds \
108+
oci://ghcr.io/kagent-dev/kagent/helm/kagent-crds \
109+
--namespace kagent
110+
```
111+
112+
Then install the main chart, wired to the Nebius Token Factory endpoint.
113+
114+
## 3. Install HAMi
115+
116+
Without HAMi, Kubernetes sees no GPU at all:
117+
118+
```json
119+
{"cpu": "8", "memory": "32865164Ki", "pods": "110"}
120+
```
121+
122+
No `nvidia.com/gpu`.
123+
124+
After installing HAMi:
125+
126+
```json
127+
{
128+
"cpu": "8",
129+
"memory": "32865164Ki",
130+
"nvidia.com/gpu": "10",
131+
"pods": "110"
132+
}
133+
```
134+
135+
One physical GPU, virtualized into 10.
136+
137+
## 4. First Agent Call
138+
139+
The LLM automatically:
140+
141+
- Calls the Kubernetes API
142+
- Fetches resources
143+
- Summarizes the result
144+
145+
Final output:
146+
147+
> "The cluster has 25 running pods across different namespaces, including kagent and kube-system."
148+
149+
## 5. GPU Check
150+
151+
Before HAMi:
152+
153+
> "The node does not have any GPUs available."
154+
155+
After HAMi:
156+
157+
> "The node nebius-tarantula has 10 GPUs available, type NVIDIA L40S."
158+
159+
The Agent reads and understands HAMi's Kubernetes annotations.
160+
161+
## 6. Self-Inspection Test
162+
163+
The Agent describes itself using the Kubernetes API.
164+
165+
It:
166+
167+
- Finds its own CRD
168+
- Reads its own system prompt
169+
- Reads its own tool list
170+
- Explains its own architecture
171+
172+
An Agent reading and explaining its own definition via live API calls.
173+
174+
## 7. Create a Custom Agent
175+
176+
Created an SRE orchestrator that delegates metrics queries to a `promql-agent`.
177+
178+
The key mechanism:
179+
180+
```yaml
181+
type: Agent
182+
```
183+
184+
This enables Agent-to-Agent (A2A) delegation.
185+
186+
## 8. Agent Talks to Agent
187+
188+
Two separate Agents with:
189+
190+
- Independent sessions
191+
- Independent context windows
192+
- Independent PostgreSQL storage
193+
194+
The orchestrator sees only the sub-agent's final result, not its internal reasoning.
195+
196+
## 9. Agent Creates a HAMi GPU Pod
197+
198+
The Agent automatically creates a pod with:
199+
200+
```yaml
201+
annotations:
202+
nvidia.com/gpumem: "20000"
203+
```
204+
205+
Then:
206+
207+
- First pod allocated 20,000 MiB
208+
- Second pod allocated 15,000 MiB
209+
210+
Both pods co-scheduled to the same physical GPU.
211+
212+
HAMi handles GPU sharing correctly.
213+
214+
## 10. Overcommit Protection
215+
216+
When requesting:
217+
218+
```yaml
219+
nvidia.com/gpu: 11
220+
```
221+
222+
but the cluster only has 10 virtual GPUs:
223+
224+
```
225+
Warning FailedScheduling hami-scheduler
226+
```
227+
228+
The pod stays Pending.
229+
230+
HAMi does not schedule requests it cannot satisfy.
231+
232+
## 11. HAMi Metrics
233+
234+
HAMi exposes standard Prometheus metrics:
235+
236+
- `HostCoreUtilization`
237+
- `HostGPUMemoryUsage`
238+
- `hami_build_info`
239+
240+
Plugs directly into existing monitoring stacks.
241+
242+
## 12. kagent CLI
243+
244+
The kagent CLI shows:
245+
246+
- Agents
247+
- Sessions
248+
- A2A sub-sessions
249+
- Delegation latency
250+
251+
All state stored in PostgreSQL.
252+
253+
**A2A Agent Card**
254+
255+
Every Agent exposes:
256+
257+
```
258+
/.well-known/agent-card.json
259+
```
260+
261+
Used for capability discovery in multi-agent systems.
262+
263+
## What Did Not Work
264+
265+
**Memory CRD** - only Pinecone is supported right now.
266+
267+
**kmcp init** - not available in v0.8.6.
268+
269+
**Ubuntu + HAMi + sleep** - if the image is missing CUDA libraries, even a `sleep` container fails to start.
270+
271+
**HAMi WebUI** - requires a separate installation step.
272+
273+
## Why This Combination Makes Sense
274+
275+
Your deployment specs live in Git.
276+
277+
Your network policies live in Git.
278+
279+
Your RBAC rules live in Git.
280+
281+
Why shouldn't your AI Agent's system prompts?
282+
283+
kagent makes that possible.
284+
285+
HAMi solves GPU resource waste without modifying workloads.
286+
287+
Together:
288+
289+
An AI Agent can observe, understand, and manage GPU-virtualized infrastructure from inside a Kubernetes cluster.
290+
291+
And it does this:
292+
293+
- Using open-source models
294+
- Without depending on closed-source AI providers
295+
- Running entirely inside Kubernetes
296+
297+
---
298+
299+
## Summary
300+
301+
One NVIDIA L40S, split into 10 virtual GPUs by HAMi. An AI Agent deployed as a Kubernetes CRD via kagent. A2A delegation across independent sessions. All running on an open-source model with no closed-source dependencies.
302+
303+
The combination works end to end: the Agent reads HAMi annotations, schedules GPU pods, detects overcommit, and queries Prometheus metrics - entirely from inside the cluster.
304+
305+
Full manifests and setup script: [github.com/mesutoezdil/kagentWithHami](https://github.com/mesutoezdil/kagentWithHami)

i18n/zh/docusaurus-plugin-content-blog/authors.yml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,3 +5,8 @@ elrond_wang:
55

66
hami_community:
77
name: HAMi 社区
8+
9+
mesut_oezdil:
10+
name: Mesut Oezdil
11+
title: Author
12+
url: https://www.linkedin.com/in/mesut-oezdil/

0 commit comments

Comments
 (0)