Skip to content

Conversation

@davidbreitgand
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #1963

This is a design proposal PR, it does not introduce code changes.

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 7, 2025
@netlify
Copy link

netlify bot commented Dec 7, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 9b24a69
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/693c27b9e02e1c00099585ce
😎 Deploy Preview https://deploy-preview-1964--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: davidbreitgand
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 7, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @davidbreitgand. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 7, 2025
@davidbreitgand davidbreitgand changed the title Bbr pluggable framework proposal [WIP] Bbr pluggable framework proposal Dec 7, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 7, 2025
@davidbreitgand davidbreitgand marked this pull request as draft December 7, 2025 19:20
@davidbreitgand davidbreitgand changed the title [WIP] Bbr pluggable framework proposal BBR pluggable framework proposal Dec 12, 2025
@davidbreitgand davidbreitgand marked this pull request as ready for review December 12, 2025 14:35
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 12, 2025
@davidbreitgand
Copy link
Contributor Author

@srampal , @elevran , @nirrozenbaum could you please review?

Initial implementation: PR 1981


- Avoid monolithic architecture
- Mimic pluggability and configurability of the scheduling subsystem without coupling between the two
- Enable organizing plugins into a topology for ordered and concurrent execution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider grouping goals into sections (e.g., splitting goals into required and optional, generalizing into categories).
We might be able to support ordered and concurrent execution at a later stage, when a concrete requirement is defined.


### Overview

There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.
There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each plugin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugin's instance name. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.

nit: In EPP factories are a concrete implementation and not an interface. What is the reason for the factory abstraction?

### Overview

There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.
In addition, a `PluginsChain` interface is defined to define an order of plugin executions. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological order and concurrency.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first implementation it might be worthwhile to treat it as a single plugin solely for "body based routing" functionality. Chains and DAGs might be desireable once the BBR changes to support body based processing and not just routing.
In EPP, we had initially supported a single profile and multiple profiles (e.g., for P/D disagg) were supported by wrapping multiple scheduler instances behind a single instance implementation.

RequiresFullParsing() bool

// Execute runs the plugin logic on the request body.
// A plugin's imnplementation logic CAN mutate the body of the message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// A plugin's imnplementation logic CAN mutate the body of the message.
// A plugin's implementation logic CAN mutate the body of the message.

// which, say, stands for "select a best model for this request at minimal cost"
// A plugin implementation of "semantic-model-selector" sets X-Gateway-Model-Name to any valid
// model name from the inventory of the backend models and also mutates the body accordingly
// In contrast,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sentence cut off?

String() string //human readable string for logging
}

// PluginsChain is used to define a specific order of execution of the BBRPlugin instances stored in the registry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest moving this to future (once the need arises, will have concrete requirements for chain or DAG, parallel or sequential execution, etc.).

```go

const (
//A deafult plugin implementation of this plugin type will always be configured for request plugins chain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//A deafult plugin implementation of this plugin type will always be configured for request plugins chain
// A default plugin implementation of this plugin type will always be configured for request plugins chain.

nit: please close comment lines with . when the next comment starts a new sentence (upper case...)

const (
//A deafult plugin implementation of this plugin type will always be configured for request plugins chain
//Even though BBRPlugin type is not (yet) a K8s resource, it's logically akin to `kind`
//MUST start wit an upper case letter, use CamelNotation, only aplhanumericals after the first letter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//MUST start wit an upper case letter, use CamelNotation, only aplhanumericals after the first letter
// MUST start with an upper case letter, use CamelNotation, only aplhanumericals after the first letter


const (
//A deafult plugin implementation of this plugin type will always be configured for request plugins chain
//Even though BBRPlugin type is not (yet) a K8s resource, it's logically akin to `kind`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a limitation we have in EPP. Perhaps delay it until a more formal k8s Plugin type is defined in some AI gateway work stream?

)
```

### Current BBR reimplementation as BBRPlugin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need to have (most of) the code duplicated. Suffice to say the current implementation can be made compliant with the proposed interface.

headers map[string]string,
mutatedBodyBytes []byte,
err error,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer single line if possible (headers map[string]string, mutatedBodyBytes []byte, err error).
also, avoid named returns unless it's required to disambiguate.

// placeholder for BBRPlugin constructors
type PluginFactoryFunc func() bbrplugins.BBRPlugin //concrete constructors are assigned to this type

// PluginRegistry defines operations for managing plugin factories and plugin instances
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a rather wide interface. Consider removing functions or splitting into separate and indepenent interfaces.

type PluginRegistry interface {
RegisterFactory(typeKey string, factory PluginFactoryFunc) error //constructors
RegisterPlugin(plugin bbrplugins.BBRPlugin) error //registers a plugin instance (the instance MUST be created via the factory first)
GetFactory(typeKey string) (PluginFactoryFunc, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider from the caller's point of view - which methods are needed in the abstractions. It would produce a more minimal interface (e.g., register-factory, create-instance, get-instance may be sufficient).

The pluggable framework will be implemented iteratively over several phases.

1. Introduce `BBRPlugin` `MetadataExtractor`, interface, registry, plugins chain, sample plugin implementation (`SimpleModelExtraction`) and its factory. Plugin configuration will be implemented via environment variables set in helm chart
1. Introduce a second plugin interface, `ModelSelector` and sample plugin implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: for the initial task (various ways to do body based routing), a ModelSelector interface would have been sufficient. Why do we need multiple interfaces and what's the delta between interface in (1) and in (2)?


1. Introduce `BBRPlugin` `MetadataExtractor`, interface, registry, plugins chain, sample plugin implementation (`SimpleModelExtraction`) and its factory. Plugin configuration will be implemented via environment variables set in helm chart
1. Introduce a second plugin interface, `ModelSelector` and sample plugin implementation
1. Introduce shared struct (shared among the plugins of a plugins chain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that does what?

1. Introduce `BBRPlugin` `MetadataExtractor`, interface, registry, plugins chain, sample plugin implementation (`SimpleModelExtraction`) and its factory. Plugin configuration will be implemented via environment variables set in helm chart
1. Introduce a second plugin interface, `ModelSelector` and sample plugin implementation
1. Introduce shared struct (shared among the plugins of a plugins chain)
1. Introduce an interface for guardrail plugin, introduce simple reference implementation, experiment with plugins chains on request and response messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggets making this out of scope for the implementation focusing on model selection in BBR and not other features.


## Open Questions

1. More elaborate shared memory architecture for the best performance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider mentioning performance numbers (e.g. for multiple naive parsing?) or other use cases to motivate this


// PluginsChain is used to define a specific order of execution of the BBRPlugin instances stored in the registry
// The BBRPlugin instances
type PluginsChain interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: wide interface... which of these are needed by the caller?
Perhaps only support append initially and run to execute the chain?
Also, shouldn't Chain also implement Plugin?

Comment on lines +119 to +120
ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error) //parses the bytes slice into an appropriate openai-go struct
ParseCompletion(data []byte) (openai.CompletionNewParams, error) //likewise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why needed (as opposed to just Run)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants