update scheduler to use latest GIE #179

Merged
nirrozenbaum merged 5 commits into llm-d:main from nirrozenbaum:scheduler-profiles
Jun 22, 2025

Conversation

Collaborator

@nirrozenbaum nirrozenbaum commented Jun 10, 2025

This PR updates llm-d-scheduler to the latest GIE design and extension points.
More specifically, this change removes the need to code two different schedulers (one for P, one for D) and instead uses the new design to code the P/D logic through the extension points.

Some significant changes:

  • Completely removed code that was duplicated from GIE upstream. This includes removing the internal package, removing the health.go file under the cmd package, and removing most of main.go, which now only instantiates the plugins and runs GIE upstream with the llm-d plugins.
  • As stated in the intro, P/D logic is now coded through extension points. More concretely, we now have two SchedulerProfiles (one for P and one for D). We no longer have our own scheduler.
  • When PD is disabled, only the decode profile is configured, with a SingleProfileHandler. There is no need to check on every request whether PD is enabled; this is checked only at bootstrap.
  • When PD is enabled, both profiles are used with a PdProfileHandler that runs D and P iteratively and conditionally (it runs P only if the non-cached prefix is longer than the threshold).
  • Uses a new PreRequest extension point to indicate the selected Prefill endpoint in a header.
  • Updated the Prefix scorer to use PostResponse instead of PostCycle. It is now called after the response is back and not before. Note that the response is still not being used; this should be addressed in a follow-up PR.
  • Updated all imports and tests according to the latest changes.
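
To make the conditional P/D flow in the list above concrete, here is a minimal Go sketch of a profile handler that picks decode first and adds prefill only when the non-cached prefix is longer than the threshold. It is an illustration under stated assumptions, not the code in this PR: `profile`, `runResult`, the `nonCachedLen` hook, and the simplified `Pick` signature are hypothetical stand-ins for the GIE types and the prefix scorer lookup.

```go
package main

import "fmt"

// Profile names used by this sketch (hypothetical constants).
const (
	decode  = "decode"
	prefill = "prefill"
)

// profile and runResult are simplified stand-ins for the GIE scheduling types.
type profile struct{ name string }

type runResult struct{ targetPod string }

// pdProfileHandler decides which profile(s) to run on each iteration.
type pdProfileHandler struct {
	// threshold is the non-cached prefix length above which prefill is run.
	threshold int
	// nonCachedLen is a hypothetical hook standing in for the prefix scorer lookup.
	nonCachedLen func(prompt, decodePod string) int
}

// Pick is called iteratively; an empty map means "no more profiles to run".
func (h *pdProfileHandler) Pick(prompt string, profiles map[string]*profile, results map[string]*runResult) map[string]*profile {
	// First iteration: nothing has run yet, so start with decode.
	if _, ran := results[decode]; !ran {
		return map[string]*profile{decode: profiles[decode]}
	}
	// A failed profile run leaves a nil result. Stop if decode failed or all profiles ran.
	if len(profiles) == len(results) || results[decode] == nil {
		return map[string]*profile{}
	}
	// Run prefill only if the non-cached prefix is longer than the threshold.
	if h.nonCachedLen(prompt, results[decode].targetPod) <= h.threshold {
		return map[string]*profile{}
	}
	return map[string]*profile{prefill: profiles[prefill]}
}

func main() {
	profiles := map[string]*profile{
		decode:  &profile{name: decode},
		prefill: &profile{name: prefill},
	}
	// Assumption for the demo: nothing is cached, so the whole prompt counts.
	h := &pdProfileHandler{threshold: 8, nonCachedLen: func(prompt, _ string) int { return len(prompt) }}

	results := map[string]*runResult{}
	for {
		next := h.Pick("a prompt longer than the threshold", profiles, results)
		if len(next) == 0 {
			break
		}
		for name := range next {
			fmt.Println("running profile:", name)
			results[name] = &runResult{targetPod: name + "-pod"}
		}
	}
}
```

The loop in `main` plays the role of the framework: it keeps calling `Pick` and running whatever comes back until the handler returns an empty map.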

@nirrozenbaum nirrozenbaum force-pushed the scheduler-profiles branch 2 times, most recently from a3fb049 to 462c78b Compare June 11, 2025 11:54
@nirrozenbaum nirrozenbaum changed the title [WIP] update scheduler to use SchedulerProfiles and new scheduler design - DO NOT MERGE [WIP] update scheduler to use SchedulerProfiles and new main - DO NOT MERGE Jun 12, 2025
@nirrozenbaum
Collaborator Author

fix #173
fix #174
fix #181

@elevran elevran added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 17, 2025
@nirrozenbaum nirrozenbaum force-pushed the scheduler-profiles branch 2 times, most recently from e7aa400 to 10f1eb5 Compare June 18, 2025 21:23
@nirrozenbaum nirrozenbaum changed the title [WIP] update scheduler to use SchedulerProfiles and new main - DO NOT MERGE update scheduler to use latest GIE Jun 19, 2025
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@nirrozenbaum nirrozenbaum removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 19, 2025
Collaborator

@elevran elevran left a comment

@nirrozenbaum thanks for the diligent work - this should take us much closer to latest GIE, removing considerable extra code.
Mostly style comments - some should be addressed here, others can go to a follow up (please open issues accordingly if using a follow up)


// Init HTTP server.
h, err := metricsHandlerWithAuthenticationAndAuthorization(cfg)
schedulerConfig, err := pd.CreatePDSchedulerConfig(ctx, pdConfig, prefixScorer)
Collaborator

nit: if this (CreatePDSchedulerConfig) is a new function, then the pd package name prefix means you can omit the PD in the function name. Ignore if this is an existing function name.

"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/types"
)

// ByLabels filters out pods that do not match its label selector criteria
Collaborator

Q: what's the rationale for moving the type definition to L33 instead of keeping the original file location? Is this following some idiomatic Go guideline regarding ordering of types and functions in a file?

Collaborator Author

I'm not aware of any official/recommended Go guidelines about ordering of things in a file.

What I was trying to do here is just be consistent across all plugin files.
I put the type assertions first so that when one opens the file, it's clear which plugins are expected to be found in it.
Then the "NewPlugin" function (and in the near future also a factory function), and then the struct definition and its functions.
Free functions are always at the end.

Collaborator

I was unable to find a formal idiomatic recommendation. Nor is there consistency in the standard packages (I looked at a bunch of net/http files). My personal preferences:

  • Package declaration and imports
  • Constants and variables
  • Type declarations (structs, interfaces)
  • Compile-time interface assertions
  • Methods with receivers (grouped by type)
  • Constructor functions (e.g., NewType)
  • Free-standing functions
  • Helper or unexported functions
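
As an illustration of the ordering listed above, here is a small self-contained Go file laid out in that sequence. All names (`Filter`, `byPrefix`, `NewByPrefix`, `FilterLabels`, `joinLabels`) are hypothetical and exist only for this sketch; they are not taken from llm-d or GIE.

```go
// Package declaration and imports.
package main

import (
	"fmt"
	"strings"
)

// Constants and variables.
const defaultSeparator = ","

// Type declarations (structs, interfaces).
type Filter interface {
	Keep(label string) bool
}

type byPrefix struct {
	prefix string
}

// Compile-time interface assertion: fails to build if *byPrefix stops implementing Filter.
var _ Filter = (*byPrefix)(nil)

// Methods with receivers (grouped by type).
func (f *byPrefix) Keep(label string) bool {
	return strings.HasPrefix(label, f.prefix)
}

// Constructor functions.
func NewByPrefix(prefix string) Filter {
	return &byPrefix{prefix: prefix}
}

// Free-standing functions.
func FilterLabels(f Filter, labels []string) []string {
	var kept []string
	for _, l := range labels {
		if f.Keep(l) {
			kept = append(kept, l)
		}
	}
	return kept
}

// Helper or unexported functions.
func joinLabels(labels []string) string {
	return strings.Join(labels, defaultSeparator)
}

func main() {
	kept := FilterLabels(NewByPrefix("role="), []string{"role=decode", "zone=a", "role=prefill"})
	fmt.Println(joinLabels(kept)) // prints "role=decode,role=prefill"
}
```

The ordering described earlier in this thread (assertions first, then the constructor, then the struct and its methods) would use the same declarations in a different sequence; the compiler accepts either.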

logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging"
)

// compile-time type assertion
Collaborator

Q: is there a reason for the reordering of the type definition and the assertion?

This seems to be a recurring change in the files, I'd appreciate understanding the rationale behind the reorder.

Collaborator

My expectation (since godoc, when used, orders everything according to order of appearance in the file) is to guide the reader to understand the content by using this order:

  • types
  • type interface conformance checks
  • functions with receivers
  • constructors
  • free-floating functions

Collaborator Author

see my other comment.. this is also the order of things in GIE upstream plugins btw.

Collaborator

I would not reorder to match GIE unless we know that GIE thought this out and has a rationale...
While I'm OK with it, it created a larger PR in this case...

// when a profile run fails its result value is nil. we need to check decode result before continuing to prefill
// check if all configured profiles have been executed, or if decode failed, no need to run more profiles.
if len(profiles) == len(profileResults) || profileResults[decode] == nil {
return map[string]*framework.SchedulerProfile{}
Collaborator

Q (unrelated to llm-d): was there a conclusion to the discussion regarding passing an empty array and/or bool to terminate profile selection iterations in GIE? This uses an empty array only, but I seem to recall (I could be wrong) we were leaning to an explicit bool in GIE.

Collaborator Author

Yes. The conclusion was that we cannot use a bool to determine whether an additional iteration should be called. The reason is that we need to inspect the profile run result in order to decide, and we don't know the answer before the profile is executed.
More concretely, in P/D: when we return the D profile, we cannot return a bool saying whether the prefill profile should be run, because that depends on the decode result (using the prefix scorer, we should check the hit percentage).

Collaborator

@elevran elevran Jun 22, 2025

I remember it slightly differently: an explicit bool indicating whether to call again can be used.
If you know ahead of time (e.g., a single profile is configured), return that profile and set done=true. This saves the next call, which would return an empty array of profiles.
If unsure (e.g., we need to inspect the profile run result in order to decide), return done=false and get the callback afterwards to make a decision.
Returning the bool is in most cases one call less (e.g., return {"Decode", false}, receive a callback again and return {"Prefill", true} or {[], true}). Without the bool, an extra call returning {[]} is always required, even when both D and P have been executed (Decode, Prefill, []).
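
The call-count argument above can be sketched in Go by driving both termination protocols for the case where decode and prefill both run. This is a toy model under stated assumptions: `pickEmpty`, `pickWithDone`, and the two driver loops are hypothetical names and do not correspond to the GIE API.

```go
package main

import "fmt"

// pickEmpty models the empty-result protocol: the handler is called
// until it returns no profiles.
func pickEmpty(executed []string) []string {
	switch len(executed) {
	case 0:
		return []string{"decode"}
	case 1:
		return []string{"prefill"}
	default:
		return nil // signals termination
	}
}

// pickWithDone models the explicit-bool variant: done=true means
// "do not call me again".
func pickWithDone(executed []string) (next []string, done bool) {
	if len(executed) == 0 {
		// The prefill decision needs the decode result, so ask to be called back.
		return []string{"decode"}, false
	}
	return []string{"prefill"}, true
}

// countCallsEmpty drives pickEmpty and counts handler invocations.
func countCallsEmpty() int {
	var executed []string
	calls := 0
	for {
		calls++
		next := pickEmpty(executed)
		if len(next) == 0 {
			return calls
		}
		executed = append(executed, next...)
	}
}

// countCallsWithDone drives pickWithDone and counts handler invocations.
func countCallsWithDone() int {
	var executed []string
	calls := 0
	for {
		calls++
		next, done := pickWithDone(executed)
		executed = append(executed, next...)
		if done {
			return calls
		}
	}
}

func main() {
	fmt.Println("empty-result protocol calls:", countCallsEmpty())     // 3: decode, prefill, []
	fmt.Println("explicit-bool protocol calls:", countCallsWithDone()) // 2: decode, prefill
}
```

In this model the explicit bool saves exactly the final call that would otherwise return the empty result.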

Collaborator Author

nirrozenbaum commented Jun 22, 2025

@nirrozenbaum thanks for the diligent work - this should take us much closer to latest GIE, removing considerable extra code. Mostly style comments - some should be addressed here, others can go to a follow up (please open issues accordingly if using a follow up)

@elevran I started addressing the nit/structuring comments, but it turned out to be a lot of changes, so I haven't pushed those.
I really think we should first get a working version merged and then handle nits and styling in follow-up PR(s).

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@nirrozenbaum
Collaborator Author

Opened a new issue to handle the comments: #192
If something is missing from the list, feel free to add items to this issue.

@nirrozenbaum
Collaborator Author

adding the new design high level diagram to help understanding the code:
[image: high-level diagram of the new design]

elevran previously approved these changes Jun 22, 2025
Collaborator

@elevran elevran left a comment

lgtm
2 more nits on imports (extra empty lines).
Everything else can be a follow-up based on #192

return err
}
// always initialize prefix scorer, which is used in the decision making of whether PD should be called or not.
prefixConfig := scorer.DefaultPrefixStoreConfig()
Collaborator

Shouldn't this be initialized only when PD or prefix is enabled, like in line 58?

Collaborator Author

prefixScorer is passed to the NewSchedulerConfig func.
The conditional check that decides how to initialize the SchedulerConfig is done inside that function.
It's cleaner to just initialize the scorer in main; if it's not used, it will be garbage collected. Otherwise the conditional becomes hard to read.


// Init HTTP server.
h, err := metricsHandlerWithAuthenticationAndAuthorization(cfg)
schedulerConfig, err := pd.CreatePDSchedulerConfig(ctx, pdConfig, prefixScorer)
Collaborator

Shouldn't we have some default schedulerConfig in case PD is not enabled?

Collaborator Author

Inside this function we have a function that creates a "decode only" scheduler config in case PD is disabled.
In that case we also use the SingleProfileHandler from GIE upstream.
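
A rough Go sketch of that bootstrap-time choice follows. It assumes simplified, hypothetical types (`pdConfig`, `schedulerConfig`, and two toy handlers) rather than the real GIE `SchedulerConfig` and handler types, and only shows that the PD check happens once, when the config is built.

```go
package main

import "fmt"

// profileHandler is a toy stand-in for the GIE profile handler plugin.
type profileHandler interface {
	Name() string
}

type singleProfileHandler struct{}

func (singleProfileHandler) Name() string { return "single-profile-handler" }

type pdProfileHandler struct{ threshold int }

func (pdProfileHandler) Name() string { return "pd-profile-handler" }

// schedulerConfig bundles one profile handler with one or more profile names.
type schedulerConfig struct {
	handler  profileHandler
	profiles []string
}

// pdConfig is a hypothetical configuration struct for this sketch.
type pdConfig struct {
	Enabled   bool
	Threshold int
}

// newSchedulerConfig checks the PD setting once, at bootstrap, and returns
// either a decode-only config or a config with both profiles.
func newSchedulerConfig(cfg pdConfig) schedulerConfig {
	if !cfg.Enabled {
		return schedulerConfig{
			handler:  singleProfileHandler{},
			profiles: []string{"decode"},
		}
	}
	return schedulerConfig{
		handler:  pdProfileHandler{threshold: cfg.Threshold},
		profiles: []string{"decode", "prefill"},
	}
}

func main() {
	for _, cfg := range []pdConfig{{Enabled: false}, {Enabled: true, Threshold: 10}} {
		sc := newSchedulerConfig(cfg)
		fmt.Println(sc.handler.Name(), sc.profiles)
	}
}
```

Because the branch lives in the constructor, per-request code never needs to ask whether PD is enabled.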

Collaborator

I think it would be clearer if this were in main: when PD is disabled, we use the default from GIE.

@@ -2,252 +2,130 @@ package pd

import (
Collaborator

Shouldn't this code be part of pd-profile-handler.go?

Collaborator Author

Mmm, I don't think so.
SchedulerConfig contains a ProfileHandler + SchedulerProfile(s) (one or more profiles).
In this file we create the SchedulerConfig, and as part of it we create the ProfileHandler.
The created ProfileHandler depends on the PD configuration: if PD is enabled we use PdProfileHandler; if only D is used, we use SingleProfileHandler from GIE upstream.
have a look here:

pdProfileHandler := profile.NewPdProfileHandler(pdConfig.PDThreshold, prefixScorer)

and here:

return scheduling.NewSchedulerConfig(gieprofile.NewSingleProfileHandler(), map[string]*framework.SchedulerProfile{

Collaborator

I meant, why do we have this file? In the architecture, scheduler == profile? I'd expect what belongs to pd_profile to be there, and if we have common functions, they would be under profile.go.

Collaborator Author

In the previous code, we had a scheduler for P and a scheduler for D.
In the new design, we have a SchedulerProfile for P and a SchedulerProfile for D.
We do not instantiate a scheduler in llm-d; we only create a SchedulerConfig and then let GIE handle running the profiles.
This might get a bit confusing. Let's discuss this f2f. If you still think things are unclear, then we might need to restructure some things to make them clearer.

Collaborator

@kfswain kfswain Jun 23, 2025

The idea is that a Scheduler is a compilation of all configured scheduling logic, including sophisticated scheduling (such as a scheduling algo based on intent, as with PD scheduling). A specific SchedulerProfile is one instance of a scheduling algorithm. PD requires multiple distinct scheduling algos, so multiple profiles. Hopefully that makes sense, but feedback is definitely appreciated if not.

Collaborator

@kfirtoledo kfirtoledo Jun 23, 2025

OK @nirrozenbaum and @kfswain, I understand the difference now, but I still think we should move
createSchedulerProfile and pluginsFromConfig to plugins/profile/profile.go.
And Pick and ProcessResults from PdProfileHandler should be part of pd_scheduler.go.

Collaborator Author

see the definition of ProfileHandler interface:
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/7df5d3dfdafa600a05f19d7147cecf68d25d2607/pkg/epp/scheduling/framework/plugins.go#L37-L50

There is a single interface that encapsulates the logic for picking profiles to run (iteratively) and, once all profiles have executed, for processing the results.
This is a single interface with two functions.
PdProfileHandler implements this interface (it is a plugin with two extension points). Therefore, it should remain in its own file. Hope that makes sense.

}

debugLog.Info("Scheduling to separate Prefill and Decode workers")
func createSchedulerProfile(ctx context.Context, roleFilter framework.Filter, picker framework.Picker, configuredPlugins map[string]int,
Collaborator

createSchedulerProfile and pluginsFromConfig are more general functions that I think should be in a common place.

Collaborator Author

I agree about pluginsFromConfig, should be in a more common place.
don't you think createSchedulerProfile belongs under scheduler?

Collaborator

What is confusing is that we have both a profile and a scheduler (I think we should have just profile).

Collaborator Author

We don't have a scheduler anymore.
We have a SchedulerConfig. The config contains a profile handler and one or more profiles.
I see that the file is still called scheduler.go; maybe this is confusing and it should be renamed to scheduler_config.go?

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Collaborator

@kfirtoledo kfirtoledo left a comment

All comments can be handled in the issue - #192

@nirrozenbaum nirrozenbaum merged commit 4551411 into llm-d:main Jun 22, 2025
2 checks passed
@nirrozenbaum nirrozenbaum deleted the scheduler-profiles branch June 22, 2025 13:53
@carlory carlory mentioned this pull request Jul 2, 2025
4 participants