update scheduler to use latest GIE by nirrozenbaum · Pull Request #179 · llm-d/llm-d-inference-scheduler

nirrozenbaum · 2025-06-10T09:15:58Z

This PR updates llm-d-scheduler to the latest GIE design and extension points.
More specifically, this change removes the need to code two different schedulers (one for P, one for D) and instead uses the new design to code P/D logic through the extension points.

Some significant changes:

- Completely removed code that was duplicated from GIE upstream. This includes removing the internal package, removing the health.go file under the cmd package, and removing most of main.go, which now contains only the plugin instantiation and the code that runs GIE upstream with the llm-d plugins.
- As stated in the intro, P/D logic is now coded through extension points. More concretely, we now have two SchedulerProfiles (one for P and one for D). We no longer have our own scheduler.
- When PD is disabled, only the decode profile is configured, with a SingleProfileHandler. There is no need to check on every request whether PD is enabled; this is checked only at bootstrap.
- When PD is enabled, both profiles are used with PdProfileHandler, which runs D and P iteratively and conditionally (it runs P only if the non-cached prefix is longer than the threshold).
- Uses a new PreRequest extension point to indicate the selected Prefill endpoint in a header.
- Updated the Prefix scorer to use PostResponse instead of PostCycle. It is now called after the response is back and not before. Note that the response is still not being used; this should be addressed in a follow-up PR.
- Updated all imports and tests according to the latest changes.
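
The iterative, conditional profile selection described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the types `profile` and `result`, the function `pickProfiles`, and its integer length parameters are hypothetical stand-ins for the GIE framework types and for the prefix scorer lookup.

```go
package main

import "fmt"

// Hypothetical stand-ins for the GIE SchedulerProfile and profile run result types.
type profile struct{ name string }
type result struct{ endpoint string }

const (
	decode  = "decode"
	prefill = "prefill"
)

// pickProfiles returns the profiles to run in the next iteration.
// An empty map means "stop iterating".
// promptLen, cachedLen and threshold are illustrative integers (assumption);
// the real handler derives the cache hit from the prefix scorer.
func pickProfiles(profiles map[string]*profile, results map[string]*result,
	promptLen, cachedLen, threshold int) map[string]*profile {
	// First iteration: decode has not run yet, so run it.
	if _, ran := results[decode]; !ran {
		return map[string]*profile{decode: profiles[decode]}
	}
	// Stop if every configured profile has run, or if decode failed (nil result).
	if len(profiles) == len(results) || results[decode] == nil {
		return map[string]*profile{}
	}
	// Run prefill only when the non-cached part of the prompt exceeds the threshold.
	if promptLen-cachedLen > threshold {
		return map[string]*profile{prefill: profiles[prefill]}
	}
	return map[string]*profile{}
}

func main() {
	profiles := map[string]*profile{decode: {name: decode}, prefill: {name: prefill}}
	results := map[string]*result{}

	// Simulate the scheduler loop: keep asking until no profiles are returned.
	for {
		next := pickProfiles(profiles, results, 1000, 100, 500)
		if len(next) == 0 {
			break
		}
		for name := range next {
			fmt.Println("running profile:", name)
			results[name] = &result{endpoint: "pod-for-" + name}
		}
	}
}
```

With a 1000-token prompt, 100 cached tokens and a threshold of 500, the loop runs decode and then prefill; raising `cachedLen` to 900 would stop after decode.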

nirrozenbaum · 2025-06-17T13:45:22Z

fix #173
fix #174
fix #181

pkg/plugins/profile/pd-profile-handler.go

nirrozenbaum · 2025-06-18T09:40:07Z

xref kubernetes-sigs/gateway-api-inference-extension#1004

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

elevran

@nirrozenbaum thanks for the diligent work - this should take us much closer to latest GIE, removing considerable extra code.
Mostly style comments - some should be addressed here, others can go to a follow up (please open issues accordingly if using a follow up)

cmd/epp/main.go

elevran · 2025-06-21T08:34:17Z

cmd/epp/main.go

-
-	// Init HTTP server.
-	h, err := metricsHandlerWithAuthenticationAndAuthorization(cfg)
+	schedulerConfig, err := pd.CreatePDSchedulerConfig(ctx, pdConfig, prefixScorer)


nit: if this (CreatePDSchedulerConfig) is a new function, then the pd package name prefix means you can omit the PD in the function name. Ignore if this is an existing function name.

elevran · 2025-06-21T08:38:06Z

pkg/plugins/filter/by_labels.go

 	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/types"
 )

-// ByLabels filters out pods that do not match its label selector criteria


Q: what's the rationale for moving the type definition to L33 instead of keeping the original file location? Is this following some idiomatic Go guideline regarding ordering of types and functions in a file?

I'm not aware of any official/recommended Go guidelines about ordering of things in a file.

what I was trying to do here is just be consistent across all plugin files.
I put the type assertions first so it's clear, when one opens the file, which plugins are expected to be found in it.
then the "NewPlugin" function (and in the near future also the factory function), and then the struct definition and its functions.
free functions are always at the end.

I was unable to find a formal idiomatic recommendation. Nor is there consistency in the standard packages (I looked at a bunch of net/http files). My personal preference:

1. Package declaration and imports
2. Constants and variables
3. Type declarations (structs, interfaces)
4. Compile-time interface assertions
5. Methods with receivers (grouped by type)
6. Constructor functions (e.g., NewType)
7. Free-standing functions
8. Helper or unexported functions
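
As an illustration only, a file laid out in the order listed above might look like the sketch below. The `Filter` interface and the `NamePrefixFilter` type are hypothetical and exist just to show the ordering; they are not taken from llm-d or GIE.

```go
// 1. Package declaration and imports
package main

import (
	"fmt"
	"strings"
)

// 2. Constants and variables
const filterName = "name-prefix-filter"

// 3. Type declarations (structs, interfaces)

// Filter is a hypothetical plugin interface used only for this illustration.
type Filter interface {
	Name() string
	Filter(pods []string) []string
}

// NamePrefixFilter keeps pods whose name starts with a given prefix.
type NamePrefixFilter struct {
	prefix string
}

// 4. Compile-time interface assertions
var _ Filter = &NamePrefixFilter{}

// 5. Methods with receivers (grouped by type)

// Name returns the plugin name.
func (f *NamePrefixFilter) Name() string { return filterName }

// Filter returns the pods whose name matches the prefix.
func (f *NamePrefixFilter) Filter(pods []string) []string {
	return keep(pods, func(p string) bool { return strings.HasPrefix(p, f.prefix) })
}

// 6. Constructor functions

// NewNamePrefixFilter creates a filter for the given prefix.
func NewNamePrefixFilter(prefix string) *NamePrefixFilter {
	return &NamePrefixFilter{prefix: prefix}
}

// 7. Free-standing functions
func main() {
	f := NewNamePrefixFilter("decode-")
	fmt.Println(f.Name(), f.Filter([]string{"decode-0", "prefill-0", "decode-1"}))
}

// 8. Helper or unexported functions

// keep returns the items for which pred is true, preserving order.
func keep(items []string, pred func(string) bool) []string {
	out := make([]string, 0, len(items))
	for _, it := range items {
		if pred(it) {
			out = append(out, it)
		}
	}
	return out
}
```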

elevran · 2025-06-21T08:39:29Z

pkg/plugins/filter/passthrough.go

 	logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging"
 )

+// compile-time type assertion


Q: is there a reason for the reorder of type definition and assertion?

This seems to be a recurring change in the files, I'd appreciate understanding the rationale behind the reorder.

My expectation (since godoc, when used, will order everything according to order of appearance in the file) is to guide the reader to understand the content by using this order:

1. types
2. type interface conformance checks
3. functions with receivers
4. constructors
5. free floating functions

see my other comment.. this is also the order of things in GIE upstream plugins btw.

I would not reorder to match GIE, unless we know that GIE thought this out and has a rationale...
While I'm ok with it, it's just that it created a larger PR in this case...

pkg/plugins/filter/random.go

pkg/plugins/profile/pd-profile-handler.go

elevran · 2025-06-21T08:58:51Z

pkg/plugins/profile/pd-profile-handler.go

+	// when a profile run fails its result value is nil. we need to check decode result before continuing to prefill
+	// check if all configured profiles have been executed, or if decode failed, no need to run more profiles.
+	if len(profiles) == len(profileResults) || profileResults[decode] == nil {
+		return map[string]*framework.SchedulerProfile{}


Q (unrelated to llm-d): was there a conclusion to the discussion regarding passing an empty array and/or bool to terminate profile selection iterations in GIE? This uses an empty array only, but I seem to recall (I could be wrong) we were leaning to an explicit bool in GIE.

yes. the conclusion was that we cannot use a bool to tell whether an additional iteration should be called or not. the reason is that we need to inspect the profile run result in order to decide, and we don't know the answer before the profile is executed.
more concretely in P/D: when we return the D profile, we cannot return a bool saying whether the prefill profile should run or not, because it depends on the decode result (using the prefix scorer we should check the hit percentage).

I remember it slightly differently: an explicit bool to call again or not can be used.
If you know ahead of time (e.g., single profile configured), return that profile and set done=true. This saves the next call, which would return an empty array of profiles.
If unsure (e.g., we need to inspect the profile run result in order to decide), return done=false and get the callback afterwards to make a decision.
Returning the bool is in most cases one call less (e.g., return {"Decode", false}, receive a callback again and return {"Prefill", true} or {[], true}). Without the bool, an extra call returning {[]} is always required, even when both D and P have been executed (Decode, Prefill, []).
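
The call-count argument above can be checked with a small simulation. This is only a sketch under stated assumptions: the `reply` type and both counting functions are hypothetical and model the two termination protocols being discussed, not any actual GIE interface.

```go
package main

import "fmt"

// reply is a hypothetical handler response: the profiles to run next and,
// in the bool variant, whether the handler is done.
type reply struct {
	profiles []string
	done     bool
}

// callsEmptyOnly counts handler calls when only an empty profile list ends the loop.
func callsEmptyOnly(replies []reply) int {
	calls := 0
	for _, r := range replies {
		calls++
		if len(r.profiles) == 0 {
			break
		}
	}
	return calls
}

// callsWithDone counts handler calls when done=true also ends the loop.
func callsWithDone(replies []reply) int {
	calls := 0
	for _, r := range replies {
		calls++
		if r.done || len(r.profiles) == 0 {
			break
		}
	}
	return calls
}

func main() {
	// Decode, then Prefill, then (needed only without the bool) a final empty reply.
	both := []reply{{[]string{"Decode"}, false}, {[]string{"Prefill"}, true}, {nil, true}}
	// A single configured profile that is known up front.
	single := []reply{{[]string{"Decode"}, true}, {nil, true}}

	fmt.Println(callsEmptyOnly(both), callsWithDone(both))     // 3 2
	fmt.Println(callsEmptyOnly(single), callsWithDone(single)) // 2 1
}
```

When the handler only learns after the decode run that prefill is not needed, both protocols take two calls, which matches the "in most cases" qualifier.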

pkg/plugins/scorer/kvcache-aware.go

pkg/plugins/scorer/load_aware_scorer.go

pkg/scheduling/pd/scheduler.go

nirrozenbaum · 2025-06-22T07:53:34Z

@nirrozenbaum thanks for the diligent work - this should take us much closer to latest GIE, removing considerable extra code. Mostly style comments - some should be addressed here, others can go to a follow up (please open issues accordingly if using a follow up)

@elevran I started addressing the nit/structuring comments but it seems like a lot of changes. so haven't pushed those.
I really think we should first get a working version merged and then handle nits and styling in a follow up PR(s).

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

nirrozenbaum · 2025-06-22T09:46:27Z

opened a new issue to handle the comments: #192
if something is missing from the list feel free to add items to this issue

nirrozenbaum · 2025-06-22T10:05:01Z

adding the new design high level diagram to help understanding the code:

pkg/config/config.go

pkg/plugins/scorer/prefix_store.go

elevran

lgtm
2 more nits on imports (extra empty lines).
Everything else can be a follow up based on #192

kfirtoledo · 2025-06-22T13:12:50Z

cmd/epp/main.go

-		return err
-	}
+	// always initialize prefix scorer, which is used in the decision making of whether PD should be called or not.
+	prefixConfig := scorer.DefaultPrefixStoreConfig()


Shouldn't this be initialized only when PD or prefix is enabled, like in line 58?

prefixScorer is passed to the NewSchedulerConfig func.
the conditional check for how to initialize the SchedulerConfig is done inside that func.
it's cleaner to just initialize the scorer in main; if it's not used, it will be garbage collected. otherwise the conditional becomes not so readable.

kfirtoledo · 2025-06-22T13:15:12Z

cmd/epp/main.go

-
-	// Init HTTP server.
-	h, err := metricsHandlerWithAuthenticationAndAuthorization(cfg)
+	schedulerConfig, err := pd.CreatePDSchedulerConfig(ctx, pdConfig, prefixScorer)


Shouldn't we have some default schedulerConfig in case PD is not enabled?

we have inside the function a function to create a "decode only" scheduler config in case PD is disabled.
in that case we also use the SingleProfileHandler from GIE upstream.

I think it would be clearer if this were in main: when PD is disabled, we use the default from GIE.

kfirtoledo · 2025-06-22T13:20:42Z

pkg/scheduling/pd/scheduler.go

@@ -2,252 +2,130 @@ package pd

 import (


Shouldn't this code be part of pd-profile-handler.go?

mmm I don't think so.
SchedulerConfig contains ProfileHandler + SchedulerProfile(s) (one or more profiles).
in this file we create the SchedulerConfig, and as part of it we create the ProfileHandler.
the created ProfileHandler depends on the PD configuration: if PD is enabled we use PdProfileHandler; if only D is used, we use SingleProfileHandler from GIE upstream.
have a look here:

llm-d-inference-scheduler/pkg/scheduling/pd/scheduler.go

Line 45 in e4ac4bc

pdProfileHandler := profile.NewPdProfileHandler(pdConfig.PDThreshold, prefixScorer)

and here:

llm-d-inference-scheduler/pkg/scheduling/pd/scheduler.go

Line 62 in e4ac4bc

return scheduling.NewSchedulerConfig(gieprofile.NewSingleProfileHandler(), map[string]*framework.SchedulerProfile{
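
The bootstrap-time choice described in this comment could look roughly like the sketch below. Everything in it is a hypothetical stand-in written for illustration (the `profileHandler` interface, both handler structs, `schedulerConfig`, `pdConfig` and `newSchedulerConfig`); it does not reproduce the GIE `scheduling` or `framework` packages.

```go
package main

import "fmt"

// profileHandler is a hypothetical stand-in for the GIE ProfileHandler interface.
type profileHandler interface{ Name() string }

// singleProfileHandler stands in for the upstream handler used when only decode is configured.
type singleProfileHandler struct{}

func (singleProfileHandler) Name() string { return "single-profile-handler" }

// pdProfileHandler stands in for the llm-d handler that runs decode and, conditionally, prefill.
type pdProfileHandler struct{ threshold int }

func (pdProfileHandler) Name() string { return "pd-profile-handler" }

// schedulerConfig bundles one profile handler with one or more named profiles.
type schedulerConfig struct {
	handler  profileHandler
	profiles []string
}

// pdConfig is an illustrative configuration struct (assumption).
type pdConfig struct {
	enabled   bool
	threshold int
}

// newSchedulerConfig decides once, at bootstrap, which handler and profiles to use,
// so no per-request check for "is PD enabled" is needed.
func newSchedulerConfig(cfg pdConfig) schedulerConfig {
	if !cfg.enabled {
		return schedulerConfig{handler: singleProfileHandler{}, profiles: []string{"decode"}}
	}
	return schedulerConfig{
		handler:  pdProfileHandler{threshold: cfg.threshold},
		profiles: []string{"decode", "prefill"},
	}
}

func main() {
	for _, cfg := range []pdConfig{{enabled: false}, {enabled: true, threshold: 500}} {
		sc := newSchedulerConfig(cfg)
		fmt.Println(sc.handler.Name(), sc.profiles)
	}
}
```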

I meant, why do we have this file? In the architecture scheduler == profile? I expect what belongs to pd_profile will be there and if we have common functions, they will be under profile.go

in the previous code, we had a scheduler for P and a scheduler for D.
in the new design, we have a SchedulerProfile for P and a SchedulerProfile for D.
we do not instantiate a scheduler in llm-d, but only create a SchedulerConfig, and then let GIE handle running the profiles.
this might get a bit confusing. let's discuss this f2f. if you still think things are unclear, then we might need to restructure some things to make it clearer.

the idea is that a Scheduler is a compilation of all configured scheduling logic, including sophisticated scheduling (such as a scheduling algo based on intent, as with PD scheduling). A specific SchedulerProfile is one instance of a scheduling algorithm; PD requires multiple distinct scheduling algos, so multiple profiles. Hopefully that makes sense, but feedback is definitely appreciated if not.

Ok @nirrozenbaum and @kfswain, I understand the difference now, but I still think we should move
createSchedulerProfile and pluginsFromConfig to plugins/profile/profile.go.
And the pick and ProcessResults from PdProfileHandler should be part of the pd_scheduler.go

see the definition of ProfileHandler interface:
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/7df5d3dfdafa600a05f19d7147cecf68d25d2607/pkg/epp/scheduling/framework/plugins.go#L37-L50

there is a single interface that encapsulates the logic for picking profiles to run (iteratively) and at the end of all profiles execution to process the results.
This is a single interface with two functions.
PdProfileHandler implements this interface (this is a plugin with two extension points). therefore, this should remain in its own file.. hope it makes sense.

kfirtoledo · 2025-06-22T13:24:54Z

pkg/scheduling/pd/scheduler.go

+}

-	debugLog.Info("Scheduling to separate Prefill and Decode workers")
+func createSchedulerProfile(ctx context.Context, roleFilter framework.Filter, picker framework.Picker, configuredPlugins map[string]int,


createSchedulerProfile and pluginsFromConfig are more general functions that I think should be in a common place.

I agree about pluginsFromConfig, should be in a more common place.
don't you think createSchedulerProfile belongs under scheduler?

What is confusing is that we have both a profile and a scheduler (I think we should have just a profile).

we don't have scheduler anymore.
we have SchedulerConfig. the config contains profile handler and one or more profiles.
I see that the file is still called scheduler.go, maybe this is confusing and should be renamed to scheduler_config.go?

pkg/scheduling/pd/scheduler_test.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

kfirtoledo

All comments can be handled in the issue - #192

nirrozenbaum force-pushed the scheduler-profiles branch 2 times, most recently from a3fb049 to 462c78b Compare June 11, 2025 11:54

nirrozenbaum changed the title ~~[WIP] update scheduler to use SchedulerProfiles and new scheduler design - DO NOT MERGE~~ [WIP] update scheduler to use SchedulerProfiles and new main - DO NOT MERGE Jun 12, 2025

This was referenced Jun 12, 2025

Remove duplicate health check code #181

Closed

Change Prefill and Decode filters to be based on a common filter #188

Merged

nirrozenbaum force-pushed the scheduler-profiles branch from 6705bbb to 4f3dea9 Compare June 17, 2025 11:22

nirrozenbaum mentioned this pull request Jun 17, 2025

Remove internal packages and reuse from GIE when available #173

Closed

elevran mentioned this pull request Jun 17, 2025

updade scheduler code to use scheduler profiles instead of two scheduler instances #174

Closed

elevran added the do-not-merge/hold label Jun 17, 2025

kfswain reviewed Jun 17, 2025

pkg/plugins/profile/pd-profile-handler.go (outdated, resolved)

nirrozenbaum force-pushed the scheduler-profiles branch 2 times, most recently from e7aa400 to 10f1eb5 Compare June 18, 2025 21:23

nirrozenbaum changed the title ~~[WIP] update scheduler to use SchedulerProfiles and new main - DO NOT MERGE~~ update scheduler to use latest GIE Jun 19, 2025

updated to latest GIE version

4910571

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

nirrozenbaum force-pushed the scheduler-profiles branch from 12434a5 to 4910571 Compare June 19, 2025 20:56

remove internal package from Dockerfil

0aacb43

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

nirrozenbaum removed the do-not-merge/hold label Jun 19, 2025

nirrozenbaum requested review from ahg-g, elevran, kfirtoledo, kfswain, mayabar, shmuelk and vMaroon June 19, 2025 21:04

elevran requested changes Jun 21, 2025

nirrozenbaum added 2 commits June 22, 2025 11:00

imports order

f629bcc

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

imports

ee91455

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

nirrozenbaum requested a review from elevran June 22, 2025 08:04

nirrozenbaum mentioned this pull request Jun 22, 2025

reorganize pd related code under pd package #192

Closed

elevran reviewed Jun 22, 2025

pkg/config/config.go (outdated, resolved)

elevran reviewed Jun 22, 2025

pkg/plugins/scorer/prefix_store.go (resolved)

elevran previously approved these changes Jun 22, 2025

kfirtoledo reviewed Jun 22, 2025

Apply suggestions from code review

4d84886

Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

nirrozenbaum dismissed elevran’s stale review via 4d84886 June 22, 2025 13:33

elevran approved these changes Jun 22, 2025

kfirtoledo approved these changes Jun 22, 2025

nirrozenbaum merged commit 4551411 into llm-d:main Jun 22, 2025
2 checks passed

nirrozenbaum deleted the scheduler-profiles branch June 22, 2025 13:53

This was referenced Jun 22, 2025

fixed by labels link in the docs example markdown #194

Merged

removed unused passthrough and random filter/scorer #195

Merged

carlory mentioned this pull request Jul 2, 2025

Fix outdate debug info #216

Merged
