Commit 4e12850

Merge pull request #254 from zdtsw-forking/sync/upstream-ff5f8eab
[sync] upstream llm-d/llm-d-router ff37a55 [2026-06-16]
2 parents 503f46c + 6f89a3f commit 4e12850

235 files changed

Lines changed: 11970 additions & 2017 deletions

.github/ISSUE_TEMPLATE/new-release.md

Lines changed: 12 additions & 12 deletions
@@ -1,8 +1,8 @@
 ---
 name: New Release
 about: Propose a new release
-title: Release v0.x.0
-labels: ''
+title: Release vX.Y.Z
+labels: kind/release
 assignees: ''
 
 ---
@@ -49,7 +49,7 @@ This document defines the process for releasing llm-d-router.
 
 ### Create or Checkout branch
 
-1. If you already have the repo cloned, ensure its up-to-date and your local branch is clean.
+1. If you already have the repo cloned, ensure it's up-to-date and your local branch is clean.
 
 1. Release Branch Handling:
 - For a Release Candidate:
@@ -63,7 +63,7 @@ This document defines the process for releasing llm-d-router.
 A release branch should already exist. In this case, check out the existing branch:
 
 ```shell
-git checkout -b release-${MAJOR}.${MINOR} ${REMOTE}/release-${MAJOR}.${MINOR}
+git checkout release-${MAJOR}.${MINOR} ${REMOTE}/release-${MAJOR}.${MINOR}
 ```
 
 1. Push your release branch to the llm-d-router remote.
@@ -79,13 +79,13 @@ This document defines the process for releasing llm-d-router.
 For a release candidate:
 
 ```shell
-git tag -s -a v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} -m 'llm-d-router v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} Release Candidate'
+git tag -s -a v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} -m "llm-d-router v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} Release Candidate"
 ```
 
 For a major, minor or patch release:
 
 ```shell
-git tag -s -a v${MAJOR}.${MINOR}.${PATCH} -m 'llm-d-router v${MAJOR}.${MINOR}.${PATCH} Release'
+git tag -s -a v${MAJOR}.${MINOR}.${PATCH} -m "llm-d-router v${MAJOR}.${MINOR}.${PATCH} Release"
 ```
 
 1. Push the tag to the llm-d-router repo.
@@ -102,16 +102,17 @@ This document defines the process for releasing llm-d-router.
 git push ${REMOTE} v${MAJOR}.${MINOR}.${PATCH}
 ```
 
-1. Pushing the tag triggers CI action to build and publish the [EPP image] and [sidecar image] to the [ghcr registry].
-1. Test the steps in the tagged quickstart guide after the PR merges. TODO add e2e tests! <!-- link to an e2e tests once we have such one -->
+1. Pushing the tag triggers CI action to build and publish the EPP image (`ghcr.io/llm-d/llm-d-router-endpoint-picker`) and sidecar image (`ghcr.io/llm-d/llm-d-router-disagg-sidecar`) to the [ghcr registry].
+1. Verify the [CI release workflow] completed successfully before proceeding.
+1. Test the steps in the tagged quickstart guide after the PR merges.
 
 ### Create the release!
 
 1. Create a [new release]:
 1. Choose the tag that you created for the release.
-1. Use the tag as the release title, i.e. `v0.1.0` refer to previous release for the content of the release body.
+1. Use the tag as the release title, e.g. `v0.1.0`.
 1. Click "Generate release notes" and preview the release body.
-1. Go to Gateway Inference Extension latest release and make sure to include the highlights in llm-d-router as well.
+1. Ensure the release body includes: highlights, breaking changes (if any), known issues, and upgrade steps.
 1. If this is a release candidate, select the "This is a pre-release" checkbox.
 1. If you find any bugs in this process, create an [issue].
 
@@ -131,7 +132,6 @@ Use the following steps to announce the release.
 
 [repo]: https://github.com/llm-d/llm-d-router
 [ghcr registry]: https://github.com/orgs/llm-d/packages?repo_name=llm-d-router
-[EPP image]: https://github.com/llm-d/llm-d-router/pkgs/container/llm-d-router-endpoint-picker
-[sidecar image]: https://github.com/llm-d/llm-d-router/pkgs/container/llm-d-router-disagg-sidecar
 [new release]: https://github.com/llm-d/llm-d-router/releases/new
 [issue]: https://github.com/llm-d/llm-d-router/issues/new/choose
+[CI release workflow]: https://github.com/llm-d/llm-d-router/actions/workflows/ci-release.yaml

.github/actions/docker-build-and-push/action.yml

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ runs:
 tags: |
 ${{ inputs.registry }}/${{ inputs.image-name }}:${{ inputs.tag }}
 ${{ inputs.push == 'true' && inputs.prerelease != 'true' && format('{0}/{1}:latest', inputs.registry, inputs.image-name) || '' }}
+${{ inputs.commit-sha != '' && format('{0}/{1}:{2}', inputs.registry, inputs.image-name, inputs.commit-sha) || '' }}
 build-args: |
 LDFLAGS=-s -w
 COMMIT_SHA=${{ inputs.commit-sha || 'unknown' }}
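
The three `tags:` expressions in the hunk above each use the `condition && value || ''` idiom to emit a tag conditionally. As a rough illustration of the resulting tag list, here is a minimal Go sketch. The function `imageTags`, its boolean parameters (standing in for the string comparisons against `'true'`), and the sample inputs in `main` are assumptions made for this example and are not part of the action.

```go
package main

import "fmt"

// imageTags is a hypothetical helper that mirrors the three tag expressions:
//  1. registry/image:tag is always emitted.
//  2. registry/image:latest is emitted only for a pushed, non-prerelease build.
//  3. registry/image:<commit-sha> is emitted only when a commit SHA is given
//     (the line added by this commit).
func imageTags(registry, image, tag, commitSHA string, push, prerelease bool) []string {
	tags := []string{fmt.Sprintf("%s/%s:%s", registry, image, tag)}
	if push && !prerelease {
		tags = append(tags, fmt.Sprintf("%s/%s:latest", registry, image))
	}
	if commitSHA != "" {
		tags = append(tags, fmt.Sprintf("%s/%s:%s", registry, image, commitSHA))
	}
	return tags
}

func main() {
	// Sample values chosen for illustration only.
	for _, t := range imageTags("ghcr.io/llm-d", "llm-d-router-endpoint-picker", "v0.1.0", "4e12850", true, false) {
		fmt.Println(t)
	}
}
```

With an empty commit SHA the third tag is skipped, which corresponds to the expression evaluating to `''` in the workflow.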

.tekton/llm-d-inference-scheduler-pull-request.yaml

Lines changed: 0 additions & 52 deletions
This file was deleted.

.tekton/llm-d-inference-scheduler-push.yaml

Lines changed: 0 additions & 46 deletions
This file was deleted.

AGENTS.md

Lines changed: 29 additions & 0 deletions
@@ -39,6 +39,35 @@ llm-d Router. Go service that routes inference requests to model-serving pods vi
 - State each fact once, in its canonical location. Do not duplicate across struct docs, prose, tables, inline comments, and examples.
 - Do not use Unicode symbols or special characters in general, unless explicitly requested.
 
+### Logging
+
+The codebase uses `go-logr` via controller-runtime. Verbosity constants are defined in `pkg/common/observability/logging` (`DEFAULT=2`, `VERBOSE=3`, `DEBUG=4`, `TRACE=5`).
+
+**Level conventions:**
+
+- `logger.Info(...)` for once-per-request operational signals.
+- `logger.V(logging.DEBUG).Info(...)` for per-item or per-loop signals that fire multiple times per request.
+- `logger.V(logging.TRACE).Info(...)` for detailed state transitions (cache operations, index updates).
+- `logger.Error(err, "msg", ...)` for recoverable errors that carry an underlying `error` value.
+
+**Use named constants, not bare integers:**
+
+```go
+// wrong
+logger.V(4).Info("running protocol", ...)
+
+// correct
+logger.V(logging.DEBUG).Info("running protocol", ...)
+```
+
+**Guard expensive log construction:**
+
+```go
+if v := logger.V(logging.DEBUG); v.Enabled() {
+v.Info("payload details", "data", expensiveSerialization())
+}
+```
+
 ## Git workflow
 
 - DCO sign-off is required. Use `git commit -s`.

DEVELOPMENT.md

Lines changed: 88 additions & 2 deletions
@@ -40,6 +40,10 @@ Documentation for developing the llm-d Router.
 - [Environment Configuration](#environment-configuration)
 - [Deploying Changes](#deploying-changes)
 - [Cleanup Environment](#cleanup-environment)
+- [Logging](#logging)
+- [Change log verbosity](#change-log-verbosity)
+- [Add logs](#add-logs)
+- [Passing Logger Around](#passing-logger-around)
 - [Submitting Changes](#submitting-changes)
 - [Scope](#scope)
 - [Presubmit](#presubmit)
@@ -169,14 +173,15 @@ PROM_ENABLED=true KIND_PROM_HOST_PORT=30091 make env-dev-kind
 
 ### Grafana Dashboard
 
-The upstream [Inference Gateway dashboard] covers EPP, inference pool, and vLLM metrics.
+The bundled [Inference Gateway dashboard] covers EPP metrics across the inference
+pool, inference objective, and flow control layers.
 
 Add a Prometheus datasource at `http://localhost:30090`, then import the JSON via
 **Dashboards > New > Import**. See the
 [Grafana installation docs](https://grafana.com/docs/grafana/latest/setup-grafana/installation/)
 for setup.
 
-[Inference Gateway dashboard]:https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/tools/dashboards/inference_gateway.json
+[Inference Gateway dashboard]:deploy/grafana/inference_gateway.json
 
 > [!NOTE]
 > For significant customization beyond the standard deployment, use the `deploy/components`
@@ -877,6 +882,87 @@ helm uninstall kgateway-crds -n kgateway-system
 For more details, see the Gateway API Inference Extension
 [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
 
+## Logging
+
+We use `logr.Logger` interface for logging everywhere.
+The logger instance is loaded from `context.Context` or passed around as an argument directly.
+This is aligned with contextual logging as explained in [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md).
+
+In other words, we explicitly don't use `klog` global logging calls.
+Using `klog` log value helpers like `klog.KObj` is just fine.
+
+### Change log verbosity
+
+We generally follow the [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md), which states "the practical default level is V(2). Developers and QE environments may wish to run at V(3) or V(4)".
+
+To configure logging verbosity, specify the `v` flag such as `--v=2`.
+
+If `--v` is not set explicitly, the default verbosity is V(2) (`DEFAULT`).
+### Add logs
+
+The [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md) have the following definitions:
+
+- `logger.V(0).Info` = `logger.Info` - Generally useful for this to **always** be visible to a cluster operator
+- `logger.V(1).Info` - A reasonable default log level if you don't want verbosity.
+- `logger.V(2).Info` - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems.
+- `logger.V(3).Info` - Extended information about changes
+- `logger.V(4).Info` - Debug level verbosity
+- `logger.V(5).Info` - Trace level verbosity
+
+We choose to simplify to the following 4 common levels.
+
+```go
+const (
+DEFAULT = 2
+VERBOSE = 3
+DEBUG = 4
+TRACE = 5
+)
+```
+
+The guidelines are written in the context of a k8s controller. Our [epp](pkg/epp/) does more things such as handling requests and scraping metrics, therefore we adapt the guidelines as follows:
+
+1. The server startup process and configuration.
+
+- `logger.Info` Logging at the `V(0)` verbosity level is generally welcome here as this is only logged once at startup, and provides useful info for debugging.
+
+2. Reconciler loops. The reconciler loops watch for CR changes such as the `InferenceObjective` CR. And given changes in these CRs significantly affect the behavior of the extension, we recommend using `V(DEFAULT)` verbosity level as default, and sparsely use higher verbosity levels.
+
+- `logger.V(DEFAULT)`
+- Default log level in the reconcilers.
+- Information about config (listening on X, watching Y)
+- Errors that repeat frequently that relate to conditions that can be corrected (e.g., inference model not initialized yet)
+- System state changing (adding/removing objects in the data store)
+- `logger.V(VERBOSE)` and above: Use your best judgement.
+
+3. Inference request handling. These requests are expected to be much higher volume than the control flow in the reconcilers and therefore we should be mindful of log spamming. We recommend using v=2 to log important info about a request, such as the HTTP response code, and higher verbosity levels for less important info.
+
+- `logger.V(DEFAULT)`
+- Logging the status code of an HTTP request
+- Important decision making such as picking the target model, target pod
+- `logger.V(VERBOSE)`
+- Detailed request scheduling algorithm operations, such as running the filtering logic
+- `logger.V(DEBUG)` and above: Use your best judgement.
+
+4. Metric scraping loops. These loops run at a very high frequency, and logs can be very spammy if not handled properly.
+
+- `logger.V(TRACE)`
+- Transient errors/warnings, such as failure to get response from a pod.
+- Important state changes, such as updating a metric.
+
+5. Misc
+1. Periodic (every 5s) debug loop which prints the current pods and metrics.
+- `logger.V(DEFAULT).Error` If the metrics are not fresh enough, which indicates an error occurred during the metric scraping loop.
+- `logger.V(DEBUG)`
+- This is very important to debug the request scheduling algorithm, and yet not spammy compared to the metric scraping loop logs.
+
+### Passing Logger Around
+
+You can pass around a `context.Context` that contains a logger or a `logr.Logger` instance directly.
+You need to make the call which one to use. Passing a `context.Context` is more standard, on the other hand you then need to call `log.FromContext` everywhere.
+
+As `logger.V` calls are cumulative, i.e. `logger.V(2).V(3)` results in `logger.V(5)`, a logger should be passed around with no verbosity level set so that `logger.V(DEFAULT)` actually uses `DEFAULT` verbosity level.
+
 ## Submitting Changes
 
 Read the [llm-d organization contributing guide](https://github.com/llm-d/llm-d/blob/main/CONTRIBUTING.md)
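
The "Passing Logger Around" text added above notes that `logger.V` calls are cumulative. The following stdlib-only Go sketch models just that behaviour to show why a logger should be handed on without a level already applied. `toyLogger` is a hypothetical stand-in written for this illustration; it is not `logr.Logger` and does not reproduce its API beyond the additive `V` and the `Enabled` check.

```go
package main

import "fmt"

// Verbosity constants as listed in the DEVELOPMENT.md hunk above.
const (
	DEFAULT = 2
	VERBOSE = 3
	DEBUG   = 4
	TRACE   = 5
)

// toyLogger is a hypothetical model of a leveled logger. It tracks only the
// accumulated verbosity and the configured threshold (the --v flag value).
type toyLogger struct {
	level     int
	threshold int
}

// V returns a copy whose level is raised by delta, so repeated calls add up.
func (l toyLogger) V(delta int) toyLogger {
	l.level += delta
	return l
}

// Enabled reports whether a message at the accumulated level would be emitted.
func (l toyLogger) Enabled() bool { return l.level <= l.threshold }

func main() {
	base := toyLogger{threshold: DEBUG} // as if the process ran with --v=4

	// Passing the base logger: V(DEFAULT) is level 2 and is emitted.
	fmt.Println(base.V(DEFAULT).Enabled())

	// Passing a logger that already had V(DEFAULT) applied: a callee's
	// V(VERBOSE) accumulates to level 5 and is suppressed at --v=4.
	passed := base.V(DEFAULT)
	fmt.Println(passed.V(VERBOSE).Enabled())
}
```

In this model a callee that intends to log at `VERBOSE` (3) ends up at 5 when it receives a pre-leveled logger, which matches the guidance to pass loggers around with no verbosity level set.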

cmd/epp/runner/runner.go

Lines changed: 7 additions & 1 deletion
@@ -72,6 +72,7 @@ import (
 srcmodels "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/datalayer/source/models"
 sourcenotifications "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/datalayer/source/notifications"
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/flowcontrol/fairness/globalstrict"
+programaware "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/flowcontrol/fairness/program-aware"
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/flowcontrol/fairness/roundrobin"
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/flowcontrol/ordering/edf"
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/flowcontrol/ordering/fcfs"
@@ -99,6 +100,7 @@ import (
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp"
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/scheduling/filter/bylabel"
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/scheduling/filter/prefixcacheaffinity"
+sessionaffinityfilter "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/scheduling/filter/sessionaffinity"
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/scheduling/filter/sloheadroomtier"
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/scheduling/picker/maxscore"
 "github.com/llm-d/llm-d-router/pkg/epp/framework/plugins/scheduling/picker/random"
@@ -217,7 +219,7 @@ func (r *Runner) Run(ctx context.Context) error {
 logutil.InitLogging(&opts.ZapOptions)
 
 if opts.Tracing {
-shutdown, err := tracing.InitTracing(ctx, setupLog, "llm-d-router/epp")
+shutdown, err := tracing.InitTracing(ctx, setupLog, "llm-d-epp")
 if err != nil {
 return fmt.Errorf("failed to init tracing %w", err)
 }
@@ -400,6 +402,7 @@ func (r *Runner) setup(ctx context.Context, cfg *rest.Config, opts *runserver.Op
 PriorityBandControlPlane: priorityBandControlPlane,
 GRPCMaxRecvMsgSize: opts.GRPCMaxRecvMsgSize,
 GRPCMaxSendMsgSize: opts.GRPCMaxSendMsgSize,
+EnableGRPCStreamMetrics: opts.EnableGRPCStreamMetrics,
 }
 
 if err := serverRunner.SetupWithManager(mgr); err != nil {
@@ -483,6 +486,7 @@ func (r *Runner) registerInTreePlugins() {
 fwkplugin.Register(bylabel.EncodeRoleType, bylabel.EncodeRoleFactory)
 fwkplugin.Register(bylabel.DecodeRoleType, bylabel.DecodeRoleFactory)
 fwkplugin.Register(bylabel.PrefillRoleType, bylabel.PrefillRoleFactory)
+fwkplugin.Register(sessionaffinityfilter.SessionAffinityType, sessionaffinityfilter.Factory)
 
 // dataparallel profile handler
 fwkplugin.Register(dataparallel.DataParallelProfileHandlerType, dataparallel.ProfileHandlerFactory)
@@ -522,6 +526,7 @@ func (r *Runner) registerInTreePlugins() {
 // Flow Control plugins
 fwkplugin.Register(globalstrict.GlobalStrictFairnessPolicyType, globalstrict.GlobalStrictFairnessPolicyFactory)
 fwkplugin.Register(roundrobin.RoundRobinFairnessPolicyType, roundrobin.RoundRobinFairnessPolicyFactory)
+fwkplugin.Register(programaware.ProgramAwarePluginType, programaware.ProgramAwarePluginFactory)
 fwkplugin.Register(fcfs.FCFSOrderingPolicyType, fcfs.FCFSOrderingPolicyFactory)
 fwkplugin.Register(edf.EDFOrderingPolicyType, edf.EDFOrderingPolicyFactory)
 fwkplugin.Register(slodeadline.SLODeadlineOrderingPolicyType, slodeadline.SLODeadlineOrderingPolicyFactory)
@@ -943,6 +948,7 @@ func (r *Runner) runWithFileDiscovery(ctx context.Context, opts *runserver.Optio
 SaturationDetector: eppConfig.SaturationDetector,
 GRPCMaxRecvMsgSize: opts.GRPCMaxRecvMsgSize,
 GRPCMaxSendMsgSize: opts.GRPCMaxSendMsgSize,
+EnableGRPCStreamMetrics: opts.EnableGRPCStreamMetrics,
 }
 
 r.customCollectors = append(r.customCollectors, collectors.NewInferencePoolMetricsCollector(ds))
