Skip to content

guide: pointing agents to model running on the cluster with vllm and kserve#51

Open
jehlum11 wants to merge 3 commits into
red-hat-data-services:mainfrom
jehlum11:main
Open

guide: pointing agents to model running on the cluster with vllm and kserve#51
jehlum11 wants to merge 3 commits into
red-hat-data-services:mainfrom
jehlum11:main

Conversation

@jehlum11
Copy link
Copy Markdown

@jehlum11 jehlum11 commented Apr 7, 2026

No description provided.

jehlum11 added 2 commits April 7, 2026 15:19
…-on-cluster-with-vllm-and-kserve

simple guide to cut across all agent templates
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

New documentation guide describing how to run a local agent while serving its model from vLLM on an OpenShift AI cluster via KServe, including ServingRuntime and InferenceService YAMLs, vLLM runtime args, multi-GPU/chat template notes, and an OpenShift Route exposure workaround.

Changes

Cohort / File(s) Summary
Documentation Guide
guide-local-agent-to-vllm-on-cluster.md
Added a new guide showing end-to-end setup to serve models from vLLM on OpenShift AI. Includes ServingRuntime YAML with vLLM container args (--enable-auto-tool-choice, --tool-call-parser, --max-model-len, --gpu-memory-utilization), optional multi-GPU and chat-template checks, InferenceService YAML selecting vLLM model format and per-model CPU/memory/GPU requests/limits, and notes on KServe RawDeployment exposing via headless Service plus ClusterIP + OpenShift Route workaround.

Sequence Diagram(s)

sequenceDiagram
    participant LocalAgent as Local Agent
    participant Route as OpenShift Route / ClusterIP
    participant KServe as KServe InferenceService
    participant vLLM as vLLM ServingRuntime Pod
    participant Storage as Model Storage (PVC/URI)

    LocalAgent->>Route: HTTP request to model endpoint
    Route->>KServe: Forward request to InferenceService
    KServe->>vLLM: Route inference request
    vLLM->>Storage: Mount/read model from storageUri
    vLLM-->>KServe: Return prediction/stream
    KServe-->>Route: Relay response
    Route-->>LocalAgent: Deliver response
Loading

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive No pull request description was provided by the author, making it impossible to assess relevance to the changeset. Add a brief description explaining the purpose and scope of the new guide document to help reviewers understand the contribution.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main addition: a guide for running agents locally with models served from a cluster using vLLM and KServe.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
guide-local-agent-to-vllm-on-cluster.md (2)

18-53: Consider adding language specifier to YAML code block.

For better syntax highlighting and linting support, add yaml as the language specifier.

📝 Proposed improvement
-```
+```yaml
 apiVersion: serving.kserve.io/v1alpha1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 18 - 53, The fenced
code block showing the ServingRuntime manifest lacks a language tag; update the
opening triple-backtick for that block to specify yaml (i.e., change ``` to
```yaml) so editors and linters will apply YAML highlighting and validation for
the ServingRuntime manifest, model args, ports, and supportedModelFormats
sections.

91-114: Consider adding language specifier to YAML code block.

For consistency with the first code block and better syntax highlighting, add yaml as the language specifier.

📝 Proposed improvement
-```
+```yaml
 apiVersion: serving.kserve.io/v1beta1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 91 - 114, Add a
language specifier to the fenced code block that defines the InferenceService so
the YAML (apiVersion: serving.kserve.io/v1beta1, kind: InferenceService,
spec.predictor.model.runtime: vllm-runtime, etc.) is highlighted consistently;
update the opening fence from ``` to ```yaml so the entire block (including
storageUri, resources, metadata.name: llama-3-3-70b) is parsed and rendered as
YAML.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@guide-local-agent-to-vllm-on-cluster.md`:
- Around line 116-118: The "Expose the Model Externally" section is incomplete
and must include concrete steps and examples to create a ClusterIP Service that
targets the vllm RawDeployment pods (replacing the headless Service) and to
create an OpenShift Route that points to that ClusterIP Service; update the
section to (1) explain how to create a ClusterIP Service (kubectl expose or a
Service YAML referencing the vllm RawDeployment selector and port names) and
include a sample Service manifest or kubectl command, and (2) show how to create
an OpenShift Route (oc create route or a Route YAML) that targets the new
Service with correct service name and port, TLS/hostname examples, and any
necessary annotations for KServe; reference the RawDeployment/Service selector
names and the Route/service names used in the diff so readers can plug them into
their manifests.
- Around line 120-122: The "3. Update app code to point to vllm + KServe on OAI"
section is incomplete—add three concrete subsections: (1) "Client configuration"
showing exact example values for endpoint URL, authentication (bearer/API key),
and required headers for an OpenAI-compatible vLLM+KServe endpoint; (2) "Code
examples" with short before/after snippets demonstrating how to switch an
OpenAI-compatible client (e.g., code that constructs a client, sets base_url,
headers, and sends a completion/request) from a local dev URL to the
cluster-served vLLM URL and how to enable TLS/auth; and (3) "Why langgraph"
explaining in 2–3 sentences why you migrated from Claude/Anthropic SDK to
langgraph/pure Python agents (compatibility with OpenAI-compatible endpoints,
lighter weight for custom deployment workflows, and easier integration with
KServe). Reference the section title "Update app code to point to vllm + KServe
on OAI" and include placeholder examples for URL/auth so readers can
copy-and-paste and adapt to their cluster.
- Around line 55-57: Fix the typo in the parser example: replace the stray
backtick at the end of 'openai\`' with a closing single quote so the example
reads 'openai'; update the sentence that references the parser flag
(--tool-call-parser=llama3_json) and the model names (Mistral-Small-4-119B-2603,
openai/gpt-oss-120b) to ensure the quotes around 'mistral' and 'openai' are
proper single quotes.

---

Nitpick comments:
In `@guide-local-agent-to-vllm-on-cluster.md`:
- Around line 18-53: The fenced code block showing the ServingRuntime manifest
lacks a language tag; update the opening triple-backtick for that block to
specify yaml (i.e., change ``` to ```yaml) so editors and linters will apply
YAML highlighting and validation for the ServingRuntime manifest, model args,
ports, and supportedModelFormats sections.
- Around line 91-114: Add a language specifier to the fenced code block that
defines the InferenceService so the YAML (apiVersion: serving.kserve.io/v1beta1,
kind: InferenceService, spec.predictor.model.runtime: vllm-runtime, etc.) is
highlighted consistently; update the opening fence from ``` to ```yaml so the
entire block (including storageUri, resources, metadata.name: llama-3-3-70b) is
parsed and rendered as YAML.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ab2e3a6f-c9a3-4181-91f1-e6bca58128ef

📥 Commits

Reviewing files that changed from the base of the PR and between 292dd53 and 2b950aa.

📒 Files selected for processing (1)
  • guide-local-agent-to-vllm-on-cluster.md

Comment on lines +55 to +57
####

Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b’ will expect ‘openai\`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo in parser example.

Line 57 has a typo: `'openai`` should end with a closing single quote instead of a backtick.

✍️ Proposed fix
-Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b' will expect 'openai\`.
+Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b' will expect 'openai'.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
####
Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b will expect openai\`.
####
Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b' will expect 'openai'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 55 - 57, Fix the typo
in the parser example: replace the stray backtick at the end of 'openai\`' with
a closing single quote so the example reads 'openai'; update the sentence that
references the parser flag (--tool-call-parser=llama3_json) and the model names
(Mistral-Small-4-119B-2603, openai/gpt-oss-120b) to ensure the quotes around
'mistral' and 'openai' are proper single quotes.

Comment on lines +116 to +118
## 2\. Expose the Model Externally

When deploying vllm with KServe using RawDeployment, it creates a **headless Service** (clusterIP: None). To expose the model externally, I needed to expose an OpenShift Route. But, OpenShift Routes cannot point to headless Services, so I needed a workaround to create a ClusterIP service. Using the product dashboard will let you do this too.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Complete the "Expose the Model Externally" section.

This section mentions a workaround but provides no implementation details. Users cannot complete the workflow without concrete steps to:

  1. Create the ClusterIP service
  2. Create and configure the OpenShift Route

Please add the YAML examples or CLI commands needed to expose the model externally.

Would you like me to help draft the missing content based on standard KServe/OpenShift patterns?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 116 - 118, The "Expose
the Model Externally" section is incomplete and must include concrete steps and
examples to create a ClusterIP Service that targets the vllm RawDeployment pods
(replacing the headless Service) and to create an OpenShift Route that points to
that ClusterIP Service; update the section to (1) explain how to create a
ClusterIP Service (kubectl expose or a Service YAML referencing the vllm
RawDeployment selector and port names) and include a sample Service manifest or
kubectl command, and (2) show how to create an OpenShift Route (oc create route
or a Route YAML) that targets the new Service with correct service name and
port, TLS/hostname examples, and any necessary annotations for KServe; reference
the RawDeployment/Service selector names and the Route/service names used in the
diff so readers can plug them into their manifests.

Comment thread guide-local-agent-to-vllm-on-cluster.md Outdated
Comment on lines +120 to +122
3. Update app code to point to vllm \+ KServe on OAI

This was one of the bigger changes that I’ve captured here \- initially using Claude & Anthropic’s Agent SDK and changed it to langgraph/pure python agents for this exercise.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Complete the "Update app code" section with concrete examples.

This section is incomplete—it mentions "bigger changes" that were "captured here" but provides no actual content. To fulfill the guide's promise of an end-to-end workflow, please add:

  1. Client configuration examples showing how to point the agent to the vLLM + KServe endpoint (URL, authentication, headers)
  2. Code snippets demonstrating the transition from local to cluster-served models
  3. Explanation of why you switched from Claude/Anthropic SDK to langgraph, and how it relates to this deployment pattern

Without this section, users cannot complete the workflow described in the guide's title.

Would you like me to help draft example code showing how to configure an OpenAI-compatible client to point to the vLLM endpoint on OpenShift AI?

🧰 Tools
🪛 LanguageTool

[style] ~122-~122: The word ‘bigger’ tends to be overused. Consider an alternative.
Context: ...m + KServe on OAI This was one of the bigger changes that I’ve captured here - initially us...

(BIG_DIFFERENCE)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 120 - 122, The "3.
Update app code to point to vllm + KServe on OAI" section is incomplete—add
three concrete subsections: (1) "Client configuration" showing exact example
values for endpoint URL, authentication (bearer/API key), and required headers
for an OpenAI-compatible vLLM+KServe endpoint; (2) "Code examples" with short
before/after snippets demonstrating how to switch an OpenAI-compatible client
(e.g., code that constructs a client, sets base_url, headers, and sends a
completion/request) from a local dev URL to the cluster-served vLLM URL and how
to enable TLS/auth; and (3) "Why langgraph" explaining in 2–3 sentences why you
migrated from Claude/Anthropic SDK to langgraph/pure Python agents
(compatibility with OpenAI-compatible endpoints, lighter weight for custom
deployment workflows, and easier integration with KServe). Reference the section
title "Update app code to point to vllm + KServe on OAI" and include placeholder
examples for URL/auth so readers can copy-and-paste and adapt to their cluster.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
guide-local-agent-to-vllm-on-cluster.md (2)

57-57: ⚠️ Potential issue | 🟡 Minor

Fix malformed quoting in parser example.

Line 57 still has mismatched smart quotes and a stray backtick in `'openai``; this can be copy-pasted incorrectly by readers.

Proposed fix
-Note: In this case, I also used ' --tool-call-parser=llama3_json' - each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b’ will expect ‘openai`.
+Note: In this case, I also used '--tool-call-parser=llama3_json' — each model uses a different parser. For example, Mistral-Small-4-119B-2603 expects 'mistral', and 'openai/gpt-oss-120b' expects 'openai'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` at line 57, Fix the malformed
quoting in the parser example: replace the mismatched smart quotes and the stray
backtick so the example consistently uses plain backticks and correct parser
names — e.g., show `--tool-call-parser=llama3_json`, then list parser names as
`mistral`, `openai/gpt-oss-120b`, and `openai` (remove the stray backtick after
openai and any smart quotes).

116-118: ⚠️ Potential issue | 🟠 Major

Add concrete Route workaround steps (Service + Route).

The section explains the problem but still lacks executable steps/manifests, so users cannot complete external exposure from this guide.

Proposed content to add
 ## 2. Expose the Model Externally
 
 When deploying vllm with KServe using RawDeployment, it creates a **headless Service** (clusterIP: None). To expose the model externally, I needed to expose an OpenShift Route. But, OpenShift Routes cannot point to headless Services, so I needed a workaround to create a ClusterIP service. Using the product dashboard will let you do this too.
+
+Create a ClusterIP Service targeting the same pods:
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: llama-3-3-70b-clusterip
+spec:
+  type: ClusterIP
+  selector:
+    serving.kserve.io/inferenceservice: llama-3-3-70b
+  ports:
+    - name: http1
+      port: 80
+      targetPort: 8080
+```
+
+Then create a Route to that ClusterIP Service:
+
+```bash
+oc create route edge llama-3-3-70b-route \
+  --service=llama-3-3-70b-clusterip \
+  --port=http1
+```
+
+Get the external host:
+
+```bash
+oc get route llama-3-3-70b-route -o jsonpath='{.spec.host}{"\n"}'
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 116 - 118, Add
concrete, executable steps to the "2. Expose the Model Externally" section:
include a ClusterIP Service manifest (name: llama-3-3-70b-clusterip) with
selector serving.kserve.io/inferenceservice: llama-3-3-70b and a port mapping
(name http1, port 80 -> targetPort 8080), then show the oc create route edge
command to create an OpenShift Route (name: llama-3-3-70b-route) pointing to
that service with --port=http1, and finally include the oc get route ...
jsonpath command to print the external host; place these concrete manifest and
commands right after the explanation about headless Services so users can apply
them directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@guide-local-agent-to-vllm-on-cluster.md`:
- Around line 18-53: The markdown code fences for both YAML examples (the
ServingRuntime block containing "kind: ServingRuntime" and the InferenceService
block containing "kind: InferenceService") are missing language identifiers;
update each opening fence from ``` to ```yaml so the blocks are recognized as
YAML (apply the same change for the additional YAML block referenced later
around the InferenceService example).

---

Duplicate comments:
In `@guide-local-agent-to-vllm-on-cluster.md`:
- Line 57: Fix the malformed quoting in the parser example: replace the
mismatched smart quotes and the stray backtick so the example consistently uses
plain backticks and correct parser names — e.g., show
`--tool-call-parser=llama3_json`, then list parser names as `mistral`,
`openai/gpt-oss-120b`, and `openai` (remove the stray backtick after openai and
any smart quotes).
- Around line 116-118: Add concrete, executable steps to the "2. Expose the
Model Externally" section: include a ClusterIP Service manifest (name:
llama-3-3-70b-clusterip) with selector serving.kserve.io/inferenceservice:
llama-3-3-70b and a port mapping (name http1, port 80 -> targetPort 8080), then
show the oc create route edge command to create an OpenShift Route (name:
llama-3-3-70b-route) pointing to that service with --port=http1, and finally
include the oc get route ... jsonpath command to print the external host; place
these concrete manifest and commands right after the explanation about headless
Services so users can apply them directly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 12316f44-2b0f-4ed1-be83-1d116686b39d

📥 Commits

Reviewing files that changed from the base of the PR and between 2b950aa and cdb32f7.

📒 Files selected for processing (1)
  • guide-local-agent-to-vllm-on-cluster.md

Comment on lines +18 to +53
```
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
name: vllm-runtime
spec:
containers:
- name: kserve-container
image: quay.io/modh/vllm #pin to version you need
args:
# --- Core (required) ---
- --port=8080 # KServe expects this port
- --model=/mnt/models # KServe mounts weights here
- --served-model-name={{.Name}} # matches InferenceService name

# --- Tool calling (required for agentic use cases) ---
- --enable-auto-tool-choice # enables tool call detection
- --tool-call-parser=llama3_json # model-specific

# --- Memory management (adjust per GPU) ---
- --max-model-len=16384 # caps context window to reduce KV cache VRAM
- --gpu-memory-utilization=0.9 # fraction of VRAM vLLM will use (default 0.9)

# --- Multi-GPU (if needed) ---
# - --tensor-parallel-size=4 # split model across N GPUs

# --- Optional ---
# - --chat-template=/path/to/template.jinja # only if model lacks built-in chat templates (see below)
# - --tool-parser-plugin=/path/to/plugin.py # for custom parsers (e.g., Nemotron)
ports:
- containerPort: 8080
protocol: TCP
supportedModelFormats:
- name: vLLM
autoSelect: true
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Specify fenced code block languages for lint compliance.

Both YAML blocks are missing fence languages (MD040), which will keep markdownlint warninging in CI.

Proposed fix
-```
+```yaml
 apiVersion: serving.kserve.io/v1alpha1
 kind: ServingRuntime
 ...
-```
+```

-```
+```yaml
 apiVersion: serving.kserve.io/v1beta1
 kind: InferenceService
 ...
-```
+```

Also applies to: 91-114

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 18-18: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 18 - 53, The markdown
code fences for both YAML examples (the ServingRuntime block containing "kind:
ServingRuntime" and the InferenceService block containing "kind:
InferenceService") are missing language identifiers; update each opening fence
from ``` to ```yaml so the blocks are recognized as YAML (apply the same change
for the additional YAML block referenced later around the InferenceService
example).

Copy link
Copy Markdown
Contributor

@mpk-droid mpk-droid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment. lmk what you think.

@@ -0,0 +1,118 @@
# Running an Agent Locally with a Model Served on vLLM on OpenShift AI
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting this — the content itself is useful. However, I think this doc targets the platform engineer persona (creating ServingRuntime/InferenceService CRs, tuning vLLM memory, exposing Routes), whereas this repo has so far focused on the AI engineer persona.

From the AI engineer's perspective, they just need to point their agent at a LlamaStack URL to access the model — the infrastructure behind it is abstracted away.

Before adding platform-focused content to this repo, I think we'd need to establish a clear pattern for how we organize and scope docs across personas. Otherwise we risk mixing concerns and making the repo harder to navigate for our primary audience.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I largely agree. I would say though that the separation isn't as strict.
The platform engineer would deploy the operator (Kserve, llama-stack), then an end-user (an AIE etc) would need to still create instances against that operator (i.e. the CRs - Serving Runtime/inference serving, lls etc).
Wdyt?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Thanks for the clarification. In my mind, the Platform engineer would also create the CRs for the operator.

Drawing on my personal experience, i feel like a clear line between AI eng and Platform Eng would be that Plat eng handles all things cluster and exposes URIs for various resources and the AI Eng uses those resources to carry out some actions. I feel like this creates a cleaner boundaries in their responsibilities. Lets imagine that one of the resources is crash looping, with the line drawn as above, its clear that platform engineer would resolve it. wdyt?

tarun-etikala added a commit to tarun-etikala/agentic-starter-kits that referenced this pull request Apr 16, 2026
…ed-hat-data-services#31, red-hat-data-services#51)

- Fail fast with a clear error when image.repository is empty instead
  of rendering an invalid ":latest" image reference
- Add checksum/secret annotation to pod template so pods auto-restart
  when secret values change (e.g. API_KEY rotation via make deploy)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants