guide: pointing agents to model running on the cluster with vllm and kserve by jehlum11 · Pull Request #51 · red-hat-data-services/agentic-starter-kits

jehlum11 · 2026-04-07T19:22:59Z

No description provided.

…-on-cluster-with-vllm-and-kserve simple guide to cut across all agent templates

coderabbitai · 2026-04-07T19:23:13Z

📝 Walkthrough

Walkthrough

New documentation guide describing how to run a local agent while serving its model from vLLM on an OpenShift AI cluster via KServe, including ServingRuntime and InferenceService YAMLs, vLLM runtime args, multi-GPU/chat template notes, and an OpenShift Route exposure workaround.

Changes

Cohort / File(s)	Summary
Documentation Guide `guide-local-agent-to-vllm-on-cluster.md`	Added a new guide showing end-to-end setup to serve models from vLLM on OpenShift AI. Includes `ServingRuntime` YAML with vLLM container args (`--enable-auto-tool-choice`, `--tool-call-parser`, `--max-model-len`, `--gpu-memory-utilization`), optional multi-GPU and chat-template checks, `InferenceService` YAML selecting `vLLM` model format and per-model CPU/memory/GPU requests/limits, and notes on KServe `RawDeployment` exposing via headless Service plus ClusterIP + OpenShift Route workaround.

Sequence Diagram(s)

sequenceDiagram
    participant LocalAgent as Local Agent
    participant Route as OpenShift Route / ClusterIP
    participant KServe as KServe InferenceService
    participant vLLM as vLLM ServingRuntime Pod
    participant Storage as Model Storage (PVC/URI)

    LocalAgent->>Route: HTTP request to model endpoint
    Route->>KServe: Forward request to InferenceService
    KServe->>vLLM: Route inference request
    vLLM->>Storage: Mount/read model from storageUri
    vLLM-->>KServe: Return prediction/stream
    KServe-->>Route: Relay response
    Route-->>LocalAgent: Deliver response

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	No pull request description was provided by the author, making it impossible to assess relevance to the changeset.	Add a brief description explaining the purpose and scope of the new guide document to help reviewers understand the contribution.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main addition: a guide for running agents locally with models served from a cluster using vLLM and KServe.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

guide-local-agent-to-vllm-on-cluster.md (2)

18-53: Consider adding language specifier to YAML code block.

For better syntax highlighting and linting support, add yaml as the language specifier.

📝 Proposed improvement

-```
+```yaml
 apiVersion: serving.kserve.io/v1alpha1

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 18 - 53, The fenced
code block showing the ServingRuntime manifest lacks a language tag; update the
opening triple-backtick for that block to specify yaml (i.e., change ``` to
```yaml) so editors and linters will apply YAML highlighting and validation for
the ServingRuntime manifest, model args, ports, and supportedModelFormats
sections.

91-114: Consider adding language specifier to YAML code block.

For consistency with the first code block and better syntax highlighting, add yaml as the language specifier.

📝 Proposed improvement

-```
+```yaml
 apiVersion: serving.kserve.io/v1beta1

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 91 - 114, Add a
language specifier to the fenced code block that defines the InferenceService so
the YAML (apiVersion: serving.kserve.io/v1beta1, kind: InferenceService,
spec.predictor.model.runtime: vllm-runtime, etc.) is highlighted consistently;
update the opening fence from ``` to ```yaml so the entire block (including
storageUri, resources, metadata.name: llama-3-3-70b) is parsed and rendered as
YAML.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@guide-local-agent-to-vllm-on-cluster.md`:
- Around line 116-118: The "Expose the Model Externally" section is incomplete
and must include concrete steps and examples to create a ClusterIP Service that
targets the vllm RawDeployment pods (replacing the headless Service) and to
create an OpenShift Route that points to that ClusterIP Service; update the
section to (1) explain how to create a ClusterIP Service (kubectl expose or a
Service YAML referencing the vllm RawDeployment selector and port names) and
include a sample Service manifest or kubectl command, and (2) show how to create
an OpenShift Route (oc create route or a Route YAML) that targets the new
Service with correct service name and port, TLS/hostname examples, and any
necessary annotations for KServe; reference the RawDeployment/Service selector
names and the Route/service names used in the diff so readers can plug them into
their manifests.
- Around line 120-122: The "3. Update app code to point to vllm + KServe on OAI"
section is incomplete—add three concrete subsections: (1) "Client configuration"
showing exact example values for endpoint URL, authentication (bearer/API key),
and required headers for an OpenAI-compatible vLLM+KServe endpoint; (2) "Code
examples" with short before/after snippets demonstrating how to switch an
OpenAI-compatible client (e.g., code that constructs a client, sets base_url,
headers, and sends a completion/request) from a local dev URL to the
cluster-served vLLM URL and how to enable TLS/auth; and (3) "Why langgraph"
explaining in 2–3 sentences why you migrated from Claude/Anthropic SDK to
langgraph/pure Python agents (compatibility with OpenAI-compatible endpoints,
lighter weight for custom deployment workflows, and easier integration with
KServe). Reference the section title "Update app code to point to vllm + KServe
on OAI" and include placeholder examples for URL/auth so readers can
copy-and-paste and adapt to their cluster.
- Around line 55-57: Fix the typo in the parser example: replace the stray
backtick at the end of 'openai\`' with a closing single quote so the example
reads 'openai'; update the sentence that references the parser flag
(--tool-call-parser=llama3_json) and the model names (Mistral-Small-4-119B-2603,
openai/gpt-oss-120b) to ensure the quotes around 'mistral' and 'openai' are
proper single quotes.

---

Nitpick comments:
In `@guide-local-agent-to-vllm-on-cluster.md`:
- Around line 18-53: The fenced code block showing the ServingRuntime manifest
lacks a language tag; update the opening triple-backtick for that block to
specify yaml (i.e., change ``` to ```yaml) so editors and linters will apply
YAML highlighting and validation for the ServingRuntime manifest, model args,
ports, and supportedModelFormats sections.
- Around line 91-114: Add a language specifier to the fenced code block that
defines the InferenceService so the YAML (apiVersion: serving.kserve.io/v1beta1,
kind: InferenceService, spec.predictor.model.runtime: vllm-runtime, etc.) is
highlighted consistently; update the opening fence from ``` to ```yaml so the
entire block (including storageUri, resources, metadata.name: llama-3-3-70b) is
parsed and rendered as YAML.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ab2e3a6f-c9a3-4181-91f1-e6bca58128ef

📥 Commits

Reviewing files that changed from the base of the PR and between 292dd53 and 2b950aa.

📒 Files selected for processing (1)

guide-local-agent-to-vllm-on-cluster.md

coderabbitai · 2026-04-07T19:25:52Z

+#### 
+
+Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b’ will expect ‘openai\`.


⚠️ Potential issue | 🟡 Minor

Fix typo in parser example.

Line 57 has a typo: `'openai`` should end with a closing single quote instead of a backtick.

✍️ Proposed fix

-Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b' will expect 'openai\`. +Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b' will expect 'openai'.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

####

Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b’ will expect ‘openai\`.

####

Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b' will expect 'openai'.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@guide-local-agent-to-vllm-on-cluster.md` around lines 55 - 57, Fix the typo in the parser example: replace the stray backtick at the end of 'openai\`' with a closing single quote so the example reads 'openai'; update the sentence that references the parser flag (--tool-call-parser=llama3_json) and the model names (Mistral-Small-4-119B-2603, openai/gpt-oss-120b) to ensure the quotes around 'mistral' and 'openai' are proper single quotes.

coderabbitai · 2026-04-07T19:25:53Z

+## 2\. Expose the Model Externally
+
+When deploying vllm with KServe using RawDeployment, it creates a **headless Service** (clusterIP: None). To expose the model externally, I needed to expose an OpenShift Route. But, OpenShift Routes cannot point to headless Services, so I needed a workaround to create a ClusterIP service. Using the product dashboard will let you do this too.


⚠️ Potential issue | 🟠 Major

Complete the "Expose the Model Externally" section.

This section mentions a workaround but provides no implementation details. Users cannot complete the workflow without concrete steps to:

Create the ClusterIP service

Create and configure the OpenShift Route

Please add the YAML examples or CLI commands needed to expose the model externally.

Would you like me to help draft the missing content based on standard KServe/OpenShift patterns?

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@guide-local-agent-to-vllm-on-cluster.md` around lines 116 - 118, The "Expose the Model Externally" section is incomplete and must include concrete steps and examples to create a ClusterIP Service that targets the vllm RawDeployment pods (replacing the headless Service) and to create an OpenShift Route that points to that ClusterIP Service; update the section to (1) explain how to create a ClusterIP Service (kubectl expose or a Service YAML referencing the vllm RawDeployment selector and port names) and include a sample Service manifest or kubectl command, and (2) show how to create an OpenShift Route (oc create route or a Route YAML) that targets the new Service with correct service name and port, TLS/hostname examples, and any necessary annotations for KServe; reference the RawDeployment/Service selector names and the Route/service names used in the diff so readers can plug them into their manifests.

coderabbitai · 2026-04-07T19:25:53Z

+3. Update app code to point to vllm \+ KServe on OAI
+
+This was one of the bigger changes that I’ve captured here \- initially using Claude & Anthropic’s Agent SDK and changed it to langgraph/pure python agents for this exercise.


⚠️ Potential issue | 🟠 Major

Complete the "Update app code" section with concrete examples.

This section is incomplete—it mentions "bigger changes" that were "captured here" but provides no actual content. To fulfill the guide's promise of an end-to-end workflow, please add:

Client configuration examples showing how to point the agent to the vLLM + KServe endpoint (URL, authentication, headers)

Code snippets demonstrating the transition from local to cluster-served models

Explanation of why you switched from Claude/Anthropic SDK to langgraph, and how it relates to this deployment pattern

Without this section, users cannot complete the workflow described in the guide's title.

Would you like me to help draft example code showing how to configure an OpenAI-compatible client to point to the vLLM endpoint on OpenShift AI?

🧰 Tools

🪛 LanguageTool

[style] ~122-~122: The word ‘bigger’ tends to be overused. Consider an alternative.
Context: ...m + KServe on OAI This was one of the bigger changes that I’ve captured here - initially us...

(BIG_DIFFERENCE)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@guide-local-agent-to-vllm-on-cluster.md` around lines 120 - 122, The "3. Update app code to point to vllm + KServe on OAI" section is incomplete—add three concrete subsections: (1) "Client configuration" showing exact example values for endpoint URL, authentication (bearer/API key), and required headers for an OpenAI-compatible vLLM+KServe endpoint; (2) "Code examples" with short before/after snippets demonstrating how to switch an OpenAI-compatible client (e.g., code that constructs a client, sets base_url, headers, and sends a completion/request) from a local dev URL to the cluster-served vLLM URL and how to enable TLS/auth; and (3) "Why langgraph" explaining in 2–3 sentences why you migrated from Claude/Anthropic SDK to langgraph/pure Python agents (compatibility with OpenAI-compatible endpoints, lighter weight for custom deployment workflows, and easier integration with KServe). Reference the section title "Update app code to point to vllm + KServe on OAI" and include placeholder examples for URL/auth so readers can copy-and-paste and adapt to their cluster.

Removed unnecessary details

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

guide-local-agent-to-vllm-on-cluster.md (2)

57-57: ⚠️ Potential issue | 🟡 Minor

Fix malformed quoting in parser example.

Line 57 still has mismatched smart quotes and a stray backtick in `'openai``; this can be copy-pasted incorrectly by readers.

Proposed fix

-Note: In this case, I also used ' --tool-call-parser=llama3_json' - each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b’ will expect ‘openai`.
+Note: In this case, I also used '--tool-call-parser=llama3_json' — each model uses a different parser. For example, Mistral-Small-4-119B-2603 expects 'mistral', and 'openai/gpt-oss-120b' expects 'openai'.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` at line 57, Fix the malformed
quoting in the parser example: replace the mismatched smart quotes and the stray
backtick so the example consistently uses plain backticks and correct parser
names — e.g., show `--tool-call-parser=llama3_json`, then list parser names as
`mistral`, `openai/gpt-oss-120b`, and `openai` (remove the stray backtick after
openai and any smart quotes).

116-118: ⚠️ Potential issue | 🟠 Major

Add concrete Route workaround steps (Service + Route).

The section explains the problem but still lacks executable steps/manifests, so users cannot complete external exposure from this guide.

Proposed content to add

 ## 2. Expose the Model Externally
 
 When deploying vllm with KServe using RawDeployment, it creates a **headless Service** (clusterIP: None). To expose the model externally, I needed to expose an OpenShift Route. But, OpenShift Routes cannot point to headless Services, so I needed a workaround to create a ClusterIP service. Using the product dashboard will let you do this too.
+
+Create a ClusterIP Service targeting the same pods:
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: llama-3-3-70b-clusterip
+spec:
+  type: ClusterIP
+  selector:
+    serving.kserve.io/inferenceservice: llama-3-3-70b
+  ports:
+    - name: http1
+      port: 80
+      targetPort: 8080
+```
+
+Then create a Route to that ClusterIP Service:
+
+```bash
+oc create route edge llama-3-3-70b-route \
+  --service=llama-3-3-70b-clusterip \
+  --port=http1
+```
+
+Get the external host:
+
+```bash
+oc get route llama-3-3-70b-route -o jsonpath='{.spec.host}{"\n"}'
+```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@guide-local-agent-to-vllm-on-cluster.md` around lines 116 - 118, Add
concrete, executable steps to the "2. Expose the Model Externally" section:
include a ClusterIP Service manifest (name: llama-3-3-70b-clusterip) with
selector serving.kserve.io/inferenceservice: llama-3-3-70b and a port mapping
(name http1, port 80 -> targetPort 8080), then show the oc create route edge
command to create an OpenShift Route (name: llama-3-3-70b-route) pointing to
that service with --port=http1, and finally include the oc get route ...
jsonpath command to print the external host; place these concrete manifest and
commands right after the explanation about headless Services so users can apply
them directly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@guide-local-agent-to-vllm-on-cluster.md`:
- Around line 18-53: The markdown code fences for both YAML examples (the
ServingRuntime block containing "kind: ServingRuntime" and the InferenceService
block containing "kind: InferenceService") are missing language identifiers;
update each opening fence from ``` to ```yaml so the blocks are recognized as
YAML (apply the same change for the additional YAML block referenced later
around the InferenceService example).

---

Duplicate comments:
In `@guide-local-agent-to-vllm-on-cluster.md`:
- Line 57: Fix the malformed quoting in the parser example: replace the
mismatched smart quotes and the stray backtick so the example consistently uses
plain backticks and correct parser names — e.g., show
`--tool-call-parser=llama3_json`, then list parser names as `mistral`,
`openai/gpt-oss-120b`, and `openai` (remove the stray backtick after openai and
any smart quotes).
- Around line 116-118: Add concrete, executable steps to the "2. Expose the
Model Externally" section: include a ClusterIP Service manifest (name:
llama-3-3-70b-clusterip) with selector serving.kserve.io/inferenceservice:
llama-3-3-70b and a port mapping (name http1, port 80 -> targetPort 8080), then
show the oc create route edge command to create an OpenShift Route (name:
llama-3-3-70b-route) pointing to that service with --port=http1, and finally
include the oc get route ... jsonpath command to print the external host; place
these concrete manifest and commands right after the explanation about headless
Services so users can apply them directly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 12316f44-2b0f-4ed1-be83-1d116686b39d

📥 Commits

Reviewing files that changed from the base of the PR and between 2b950aa and cdb32f7.

📒 Files selected for processing (1)

guide-local-agent-to-vllm-on-cluster.md

coderabbitai · 2026-04-07T20:40:01Z

+```
+apiVersion: serving.kserve.io/v1alpha1
+kind: ServingRuntime
+metadata:
+  name: vllm-runtime
+spec:
+  containers:
+    - name: kserve-container
+      image: quay.io/modh/vllm #pin to version you need
+      args:
+        # --- Core (required) ---
+        - --port=8080                              # KServe expects this port
+        - --model=/mnt/models                      # KServe mounts weights here
+        - --served-model-name={{.Name}}            # matches InferenceService name
+
+        # --- Tool calling (required for agentic use cases) ---
+        - --enable-auto-tool-choice                # enables tool call detection
+        - --tool-call-parser=llama3_json            # model-specific
+
+        # --- Memory management (adjust per GPU) ---
+        - --max-model-len=16384                    # caps context window to reduce KV cache VRAM
+        - --gpu-memory-utilization=0.9             # fraction of VRAM vLLM will use (default 0.9)
+
+        # --- Multi-GPU (if needed) ---
+        # - --tensor-parallel-size=4               # split model across N GPUs
+
+        # --- Optional ---
+        # - --chat-template=/path/to/template.jinja  # only if model lacks built-in chat templates (see below)
+        # - --tool-parser-plugin=/path/to/plugin.py  # for custom parsers (e.g., Nemotron)
+      ports:
+        - containerPort: 8080
+          protocol: TCP
+  supportedModelFormats:
+    - name: vLLM
+      autoSelect: true
+```


⚠️ Potential issue | 🟡 Minor

Specify fenced code block languages for lint compliance.

Both YAML blocks are missing fence languages (MD040), which will keep markdownlint warninging in CI.

Proposed fix

-``` +```yaml apiVersion: serving.kserve.io/v1alpha1 kind: ServingRuntime ... -``` +``` -``` +```yaml apiVersion: serving.kserve.io/v1beta1 kind: InferenceService ... -``` +```

Also applies to: 91-114

🧰 Tools

🪛 markdownlint-cli2 (0.22.0)

[warning] 18-18: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@guide-local-agent-to-vllm-on-cluster.md` around lines 18 - 53, The markdown code fences for both YAML examples (the ServingRuntime block containing "kind: ServingRuntime" and the InferenceService block containing "kind: InferenceService") are missing language identifiers; update each opening fence from ``` to ```yaml so the blocks are recognized as YAML (apply the same change for the additional YAML block referenced later around the InferenceService example).

mpk-droid

added a comment. lmk what you think.

mpk-droid · 2026-04-09T21:41:18Z

@@ -0,0 +1,118 @@
+# Running an Agent Locally with a Model Served on vLLM on OpenShift AI


Thanks for documenting this — the content itself is useful. However, I think this doc targets the platform engineer persona (creating ServingRuntime/InferenceService CRs, tuning vLLM memory, exposing Routes), whereas this repo has so far focused on the AI engineer persona.

From the AI engineer's perspective, they just need to point their agent at a LlamaStack URL to access the model — the infrastructure behind it is abstracted away.

Before adding platform-focused content to this repo, I think we'd need to establish a clear pattern for how we organize and scope docs across personas. Otherwise we risk mixing concerns and making the repo harder to navigate for our primary audience.

Good point, I largely agree. I would say though that the separation isn't as strict.
The platform engineer would deploy the operator (Kserve, llama-stack), then an end-user (an AIE etc) would need to still create instances against that operator (i.e. the CRs - Serving Runtime/inference serving, lls etc).
Wdyt?

Ah, I see. Thanks for the clarification. In my mind, the Platform engineer would also create the CRs for the operator.

Drawing on my personal experience, i feel like a clear line between AI eng and Platform Eng would be that Plat eng handles all things cluster and exposes URIs for various resources and the AI Eng uses those resources to carry out some actions. I feel like this creates a cleaner boundaries in their responsibilities. Lets imagine that one of the resources is crash looping, with the line drawn as above, its clear that platform engineer would resolve it. wdyt?

…ed-hat-data-services#31, red-hat-data-services#51) - Fail fast with a clear error when image.repository is empty instead of rendering an invalid ":latest" image reference - Add checksum/secret annotation to pod template so pods auto-restart when secret values change (e.g. API_KEY rotation via make deploy) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jehlum11 added 2 commits April 7, 2026 15:19

Add files via upload

58be57a

Merge pull request #1 from jehlum11/point-your-local-agent-to-a-model…

2b950aa

…-on-cluster-with-vllm-and-kserve simple guide to cut across all agent templates

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Refactor guide for exposing model externally

cdb32f7

Removed unnecessary details

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

mpk-droid reviewed Apr 9, 2026

View reviewed changes

		####

		Note: In this case, I also used ' \--tool-call-parser=llama3\_json' \- each model will use different parsers. For example, Mistral-Small-4-119B-2603 will expect 'mistral', 'openai/gpt-oss-120b’ will expect ‘openai\`.

		## 2\. Expose the Model Externally

		When deploying vllm with KServe using RawDeployment, it creates a headless Service (clusterIP: None). To expose the model externally, I needed to expose an OpenShift Route. But, OpenShift Routes cannot point to headless Services, so I needed a workaround to create a ClusterIP service. Using the product dashboard will let you do this too.

		3. Update app code to point to vllm \+ KServe on OAI

		This was one of the bigger changes that I’ve captured here \- initially using Claude & Anthropic’s Agent SDK and changed it to langgraph/pure python agents for this exercise.

		@@ -0,0 +1,118 @@
		# Running an Agent Locally with a Model Served on vLLM on OpenShift AI

Conversation

jehlum11 commented Apr 7, 2026

Uh oh!

coderabbitai Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

mpk-droid left a comment

Choose a reason for hiding this comment

Uh oh!

mpk-droid Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

jehlum11 Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

mpk-droid Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Apr 7, 2026 •

edited

Loading