
feat: Define provider specific gateway capabilities for llm-d#288

Open
ericdbishop wants to merge 7 commits into
kaito-project:mainfrom
ericdbishop:llmd-gateway-capabilities

Conversation

@ericdbishop

Member

Description

Adding gateway capabilities for the llm-d provider. Follow-up to #213. Currently scoped to a custom EPP config and EPP image for llm-d.

AI Prompt (Optional)

🤖 AI Prompt Used
Initial solution written with Copilot, prompt summary:

  Prompt 1 — Initial assessment request

  You set the scene: you're delegating InferencePool/EPP management to the llm-d provider via gateway capabilities, the same way it was done for Dynamo in PR #213. You'd added the Gateway field to llm-d's capabilities locally and wanted me to assess what else was needed. Asked me to make obvious changes directly but to flag anything I wasn't sure about for discussion.

  Prompt 2 — Design decisions

  After my list of 7 questions, you answered each:

   1. One EPP per ModelDeployment
   2. Name the constant LLMDSchedulerImage
   3. Ship a sensible default ConfigMap baked into the provider
   4. Wire --kv-events-config automatically (with your review afterward)
   5. Reuse RBAC where possible; no provider-specific code in gateway_reconciler.go — anything provider-specific belongs in providers/
   6. Research upstream whether enablePrefixCaching causes issues for llm-d
   7. Land in one PR, but flagged the key design tension: the existing GatewayCapabilities abstraction (pool name + namespace) fits Dynamo but not llm-d. llm-d doesn't actually need pool delegation — it just needs a custom EPP image. Asked me to code only up to a logical stopping point where we could reconsider the interface.
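
Item 4 refers to wiring vLLM's `--kv-events-config` flag. A minimal sketch of rendering that flag from Go, assuming the JSON field names of vLLM's `KVEventsConfig` (`enable_kv_cache_events`, `publisher`, `endpoint`, `topic`) and a ZMQ publisher; `kvEventsArg`, the endpoint, and the topic format are hypothetical and should be checked against the vLLM version shipped in the llm-d image:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kvEventsConfig mirrors the JSON accepted by vLLM's --kv-events-config flag.
// ASSUMPTION: field names follow vLLM's KVEventsConfig; verify per vLLM version.
type kvEventsConfig struct {
	EnableKVCacheEvents bool   `json:"enable_kv_cache_events"`
	Publisher           string `json:"publisher"`
	Endpoint            string `json:"endpoint"`
	Topic               string `json:"topic"`
}

// kvEventsArg (hypothetical helper) renders the flag for one model server so
// the EPP can consume its KV-cache events.
func kvEventsArg(endpoint, topic string) (string, error) {
	b, err := json.Marshal(kvEventsConfig{
		EnableKVCacheEvents: true,
		Publisher:           "zmq",
		Endpoint:            endpoint,
		Topic:               topic,
	})
	if err != nil {
		return "", err
	}
	return "--kv-events-config=" + string(b), nil
}

func main() {
	// Placeholder endpoint and topic values, for illustration only.
	arg, err := kvEventsArg("tcp://epp.example:5557", "kv@pod-0@my-model")
	if err != nil {
		panic(err)
	}
	fmt.Println(arg)
}
```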

  Prompt 3 — Approval

  Acknowledged the EndpointPickerCapabilities struct isn't perfectly self-documenting for Dynamo's case but gave the go-ahead to proceed.

  Key constraints you established (worth remembering for future work)

   - No provider-specific branching in the gateway reconciler — providers express themselves through capability declarations
   - Reuse generic scaffolding wherever possible
   - Distinguish "full delegation" (Dynamo) from "EPP customization" (llm-d) as separate, independent extension points
   - One EPP per ModelDeployment
   - Research upstream behavior before wiring flags that might break things

AI Tool:
Copilot CLI with Claude Opus 4.7

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 📚 Documentation update
  • 🎨 UI/UX improvement
  • ♻️ Refactoring (no functional changes)
  • 🧪 Test update
  • 🔧 Build/CI configuration

Related Issues

Fixes #174

Changes Made

Testing

  • Unit tests pass (bun run test)
  • Manual testing performed
  • Tested with a Kubernetes cluster

Checklist

  • My code follows the project's style guidelines
  • I have run bun run lint
  • I have added tests that prove my fix/feature works
  • New and existing unit tests pass locally
  • I have updated documentation if needed
  • My changes generate no new warnings

Screenshots

Additional Notes

Signed-off-by: Eric Bishop <ericbish.dev@gmail.com>
Copilot AI review requested due to automatic review settings May 19, 2026 19:19

Copilot AI left a comment

Contributor

Pull request overview

Adds llm-d provider support for gateway/EPP customization by extending provider gateway capabilities so providers can override the controller-managed EPP image + plugin config, while keeping the controller responsible for the surrounding GAIE scaffolding (InferencePool + EPP resources).

Changes:

  • Introduces GatewayCapabilities.endpointPicker / EndpointPickerCapabilities in the API + CRD to allow provider-specific EPP image and config overrides.
  • Updates the gateway reconciler to distinguish “full pool delegation” (via InferencePoolNamePattern) from “EPP customization” (via EndpointPicker) and to apply EPP overrides during reconciliation/cleanup.
  • Updates the llm-d provider to declare EPP overrides (image + default config) and wires vLLM prefix caching / eager flags, with corresponding tests.

Reviewed changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| providers/llmd/transformer.go | Adds vLLM arg emission for prefix caching and eager execution; minor label formatting. |
| providers/llmd/transformer_test.go | Adds unit test coverage for emitted prefix caching flags. |
| providers/llmd/status.go | Minor formatting adjustment. |
| providers/llmd/controller_test.go | Minor formatting cleanup in tests. |
| providers/llmd/config.go | Declares llm-d gateway capabilities with provider-supplied EPP image + default EndpointPickerConfig YAML. |
| providers/llmd/config_test.go | Adds assertions validating llm-d gateway capability fields (EPP override only, no pool delegation). |
| controller/internal/controller/gateway_reconciler.go | Adds EPP override plumbing and narrows “provider-managed pool” semantics to InferencePoolNamePattern != "". |
| controller/internal/controller/gateway_reconciler_test.go | Updates provider-managed cleanup test and adds tests for default vs provider-overridden EPP behavior. |
| controller/config/crd/bases/airunway.ai_inferenceproviderconfigs.yaml | Extends CRD schema to include gateway.endpointPicker fields. |
| controller/api/v1alpha1/zz_generated.deepcopy.go | Regenerates deep-copies for the new API types/fields. |
| controller/api/v1alpha1/inferenceproviderconfig_types.go | Adds EndpointPickerCapabilities and documents the two gateway extension paths. |

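
The transformer change can be pictured as flags appended only when an option is enabled, which is what the `EnablePrefixCaching=false` test asserts. `engineOptions`, `EnforceEager`, and `vllmArgs` are hypothetical names; `--enable-prefix-caching` and `--enforce-eager` are vLLM's own flags:

```go
package main

import "fmt"

// engineOptions is a hypothetical subset of the llm-d transformer's inputs.
// EnablePrefixCaching matches the test; EnforceEager is an assumed name.
type engineOptions struct {
	EnablePrefixCaching bool
	EnforceEager        bool
}

// vllmArgs appends vLLM flags only for enabled options, so a false value
// never emits --enable-prefix-caching.
func vllmArgs(base []string, o engineOptions) []string {
	args := append([]string{}, base...)
	if o.EnablePrefixCaching {
		args = append(args, "--enable-prefix-caching")
	}
	if o.EnforceEager {
		args = append(args, "--enforce-eager")
	}
	return args
}

// hasArg reports whether want appears verbatim in args.
func hasArg(args []string, want string) bool {
	for _, a := range args {
		if a == want {
			return true
		}
	}
	return false
}

func main() {
	args := vllmArgs([]string{"--model", "my-model"}, engineOptions{EnablePrefixCaching: true})
	fmt.Println(args)                                                               // [--model my-model --enable-prefix-caching]
	fmt.Println(hasArg(vllmArgs(nil, engineOptions{}), "--enable-prefix-caching")) // false
}
```

One caveat for design question 6: omitting the flag leaves vLLM's own default in effect, and recent vLLM releases may enable prefix caching by default, so forcing it off would take `--no-enable-prefix-caching` instead.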
Files not reviewed (1)
  • controller/api/v1alpha1/zz_generated.deepcopy.go: Language not supported

Comment thread controller/api/v1alpha1/inferenceproviderconfig_types.go Outdated
Signed-off-by: Eric Bishop <ericbish.dev@gmail.com>
Signed-off-by: Eric Bishop <ericbish.dev@gmail.com>
Copilot AI review requested due to automatic review settings June 15, 2026 14:57

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 10 out of 12 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • controller/api/v1alpha1/zz_generated.deepcopy.go: Generated file

t.Error("did not expect --enable-prefix-caching when EnablePrefixCaching=false")
}
}
}
Signed-off-by: Eric Bishop <ericbish.dev@gmail.com>
@ericdbishop ericdbishop marked this pull request as ready for review June 15, 2026 19:54
@ericdbishop ericdbishop requested a review from a team as a code owner June 15, 2026 19:54
Copilot AI review requested due to automatic review settings June 15, 2026 19:54

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 10 out of 12 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • controller/api/v1alpha1/zz_generated.deepcopy.go: Generated file

Comment thread providers/llmd/transformer.go
Signed-off-by: Eric Bishop <ericbish.dev@gmail.com>
Signed-off-by: Eric Bishop <ericbish.dev@gmail.com>
Copilot AI review requested due to automatic review settings June 15, 2026 20:53

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 12 out of 14 changed files in this pull request and generated 2 comments.

Files not reviewed (1)
  • controller/api/v1alpha1/zz_generated.deepcopy.go: Generated file

Comment on lines +1337 to +1339
// TestGateway_EPP_OnlyImageOverride verifies image-only overrides leave the
// default ConfigMap in place (and vice versa via empty Image).
func TestGateway_EPP_OnlyImageOverride(t *testing.T) {
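
The behavior this test checks amounts to field-by-field fallback. A sketch, assuming a hypothetical `resolveEPP` helper and `eppSpec` type; the `overrides.ConfigData != ""` guard mirrors the reconciler excerpt quoted further down:

```go
package main

import "fmt"

type EndpointPickerCapabilities struct {
	Image      string
	ConfigData string
}

// eppSpec (hypothetical) is what the controller renders into the EPP
// Deployment and its ConfigMap.
type eppSpec struct {
	Image      string
	ConfigData string
}

// resolveEPP applies overrides field by field: an empty Image keeps the
// default image, and an empty ConfigData keeps the default ConfigMap.
func resolveEPP(defaults eppSpec, overrides *EndpointPickerCapabilities) eppSpec {
	out := defaults
	if overrides != nil && overrides.Image != "" {
		out.Image = overrides.Image
	}
	if overrides != nil && overrides.ConfigData != "" {
		out.ConfigData = overrides.ConfigData
	}
	return out
}

func main() {
	defaults := eppSpec{Image: "default-epp:v1", ConfigData: "default-config"}
	got := resolveEPP(defaults, &EndpointPickerCapabilities{Image: "llmd-scheduler:v1"})
	fmt.Println(got.Image, got.ConfigData) // llmd-scheduler:v1 default-config
}
```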
Comment on lines +96 to +98
// The two extension points are independent. A provider may use either, both,
// or neither. EndpointPicker is ignored when ManagesInferencePool is true (the
// provider is then expected to manage the EPP itself).

Agree with this feedback^
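
The underlying point is that the two extension points are not fully independent: pool delegation wins. A sketch that makes the precedence explicit and inspects only declared capabilities, never the provider name, in line with the "no provider-specific branching" constraint. `planGateway` and `gatewayPlan` are hypothetical, and `ManagesInferencePool` follows the doc comment above:

```go
package main

import "fmt"

type EndpointPickerCapabilities struct{ Image, ConfigData string }

type GatewayCapabilities struct {
	ManagesInferencePool bool
	EndpointPicker       *EndpointPickerCapabilities
}

// gatewayPlan (hypothetical) lists what the controller reconciles for one
// ModelDeployment.
type gatewayPlan struct {
	CreatePoolAndEPP bool // controller owns the InferencePool and the EPP
	ApplyEPPOverride bool // controller-managed EPP uses provider image/config
}

// planGateway looks only at declared capabilities. EndpointPicker is ignored
// when the provider manages the InferencePool (and therefore its EPP).
func planGateway(caps *GatewayCapabilities) gatewayPlan {
	if caps != nil && caps.ManagesInferencePool {
		return gatewayPlan{}
	}
	return gatewayPlan{
		CreatePoolAndEPP: true,
		ApplyEPPOverride: caps != nil && caps.EndpointPicker != nil,
	}
}

func main() {
	dynamo := &GatewayCapabilities{ManagesInferencePool: true}
	llmd := &GatewayCapabilities{EndpointPicker: &EndpointPickerCapabilities{Image: "scheduler"}}
	fmt.Printf("%+v\n", planGateway(dynamo)) // {CreatePoolAndEPP:false ApplyEPPOverride:false}
	fmt.Printf("%+v\n", planGateway(llmd))   // {CreatePoolAndEPP:true ApplyEPPOverride:true}
	fmt.Printf("%+v\n", planGateway(nil))    // {CreatePoolAndEPP:true ApplyEPPOverride:false}
}
```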

Comment on lines +96 to +98
// The two extension points are independent. A provider may use either, both,
// or neither. EndpointPicker is ignored when ManagesInferencePool is true (the
// provider is then expected to manage the EPP itself).

Agree with this feedback^

// named pool, reads its EndpointPickerRef, and wires HTTPRoute/ReferenceGrant
// accordingly. The controller does not create an InferencePool or EPP itself.
//
// 2. Endpoint Picker customization. When EndpointPicker is set, the controller

Clarify that the EPP is still managed by the controller when EndpointPicker is set.

kind: EndpointPickerConfig
`,
`
if overrides != nil && overrides.ConfigData != "" {

Maybe add some debug logs for these overrides

Comment thread docs/gateway.md
Some inference providers (e.g., NVIDIA Dynamo, llm-d) have native Gateway API Inference Extension support with their own InferencePool and Endpoint Picker (EPP). These providers deploy specialized EPPs with capabilities beyond the generic upstream EPP — for example, Dynamo's EPP uses **KV-cache-aware scoring** to route requests to endpoints with the highest KV cache hit probability.

When a provider declares gateway capabilities in its `InferenceProviderConfig`, the controller **delegates** InferencePool and/or EPP management to the provider instead of creating its own.
When a provider declares gateway capabilities in its `InferenceProviderConfig`, the controller adapts what it creates. Two extension points exist and can be used independently:

Not independent, since endpointPicker is ignored if managesInferencePool is set.

Comment thread docs/gateway.md
|---|---|---|
| `managesInferencePool` | Controller waits for the provider's InferencePool to exist, then uses it as the HTTPRoute backend. Skips `reconcileInferencePool()` and `labelModelPods()`. | Controller creates and owns the InferencePool (default behavior). |
| `managesEPP` | Controller does nothing. | Controller deploys the generic upstream EPP. |
| `managesInferencePool: true` | Controller waits for the provider's InferencePool to exist, then uses it as the HTTPRoute backend. Skips `reconcileInferencePool()`, `reconcileEPP()`, and `labelModelPods()`. | Controller creates and owns the InferencePool and the EPP (default behavior). |

`true` here makes me think this is the default value. Is that correct? Maybe add another column for "default value".

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Define provider specific capabilities for llm-d for EPP+InferencePool management

3 participants