update scheduler to use latest GIE#179
Conversation
a3fb049 to
462c78b
Compare
6705bbb to
4f3dea9
Compare
e7aa400 to
10f1eb5
Compare
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
12434a5 to
4910571
Compare
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
elevran
left a comment
There was a problem hiding this comment.
@nirrozenbaum thanks for the diligent work - this should take us much closer to latest GIE, removing considerable extra code.
Mostly style comments - some should be addressed here, others can go to a follow up (please open issues accordingly if using a follow up)
|
|
||
| // Init HTTP server. | ||
| h, err := metricsHandlerWithAuthenticationAndAuthorization(cfg) | ||
| schedulerConfig, err := pd.CreatePDSchedulerConfig(ctx, pdConfig, prefixScorer) |
There was a problem hiding this comment.
nit: if this (CreatePDSchedulerConfig) is a new function, then the pd package name prefix means you can omit the PD in the function name. Ignore if this is an existing function name.
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/types" | ||
| ) | ||
|
|
||
| // ByLabels filters out pods that do not match its label selector criteria |
There was a problem hiding this comment.
Q: what's the rationale for moving the type definition to L33 instead of keeping the original file location? Is this following some idiomatic Go guideline regarding ordering of types and functions in a file?
There was a problem hiding this comment.
I'm not aware of any official/recommended go guidelines about ordering of things in a file.
what I was trying to do here is just being consistent across all plugin files.
I put the type assertions first so it's clear when one opens the file, which plugins are expected to be found in it.
then the "NewPlugin" function (and in near future also factory function), and then struct defiition and it's functions.
free functions are always at the end.
There was a problem hiding this comment.
I was unable to find a formal idiomatic recommendation. Nor is there consistency in standard packages (looked at a bunch of net.http files). My personal preferences
- Package declaration and imports
- Constants and variables
- Type declarations (structs, interfaces)
- Compile-time interface assertions
- Methods with receivers (grouped by type)
- Constructor functions (e.g., NewType)
- Free-standing functions
- Helper or unexported functions
| logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging" | ||
| ) | ||
|
|
||
| // compile-time type assertion |
There was a problem hiding this comment.
Q: is a reason for the reorder of type definition and assertion?
This seems to be a recurring change in the files, I'd appreciate understanding the rationale behind the reorder.
There was a problem hiding this comment.
My expectations (since godoc when used will order everything according to order of appearance in the file) is to guide the reader to understand the content by using this order:
types
type interface conformance checks
functions with receivers
constructors
free floating functions
There was a problem hiding this comment.
see my other comment.. this is also the order of things in GIE upstream plugins btw.
There was a problem hiding this comment.
I would not reorder to match GIE, unless we know that the GIE thought this out and has rationale...
While I'm ok with it, it's just that it created a larger PR in this case...
| // when a profile run fails its result value is nil. we need to check decode result before continuing to prefill | ||
| // check if all configured profiles have been executed, or if decode failed, no need to run more profiles. | ||
| if len(profiles) == len(profileResults) || profileResults[decode] == nil { | ||
| return map[string]*framework.SchedulerProfile{} |
There was a problem hiding this comment.
Q (unrelated to llm-d): was there a conclusion to the discussion regarding passing an empty array and/or bool to terminate profile selection iterations in GIE? This uses an empty array only, but I seem to recall (I could be wrong) we were leaning to an explicit bool in GIE.
There was a problem hiding this comment.
yes. the conclusion was that we cannot use bool to understand whether additional iteration should be called or not. the reason is that we need to inspect the profile run result in order to decide and we don't know the answer before the profile is executed.
more concretely in P/D - when we return D profile, we cannot return bool if prefill profile should be run or not, cause it depends on the decode result (using prefix scorer we should check hit percentage..)
There was a problem hiding this comment.
I remember it slightly differently: an explicit bool to call again or not can be used.
If you know ahead of time (e.g., single profile configured) - return that profile and set done=true. This saves the next call which would return an empty array of profiles.
If unsure (e.g., we need to inspect the profile run result in order to decide) - then return done=false and get the callback after to make a decision.
Returning the bool is in most cases one call less (e.g., return {"Decode", false}, receive a call back again and return {"Prefill", true} or {[], true}. Without the bool, and extra call returning {[]} is always required - even when both D and P have been executed (Decode, Prefill, []).
@elevran I started addressing the nit/structuring comments but it seems like a lot of changes. so haven't pushed those. |
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
|
opened s new issue to handle the comments- #192 |
| return err | ||
| } | ||
| // always initialize prefix scorer, which is used in the decision making of whether PD should be called or not. | ||
| prefixConfig := scorer.DefaultPrefixStoreConfig() |
There was a problem hiding this comment.
It shouldn't be just in case, pd is enabled or prefix like in line 58?
There was a problem hiding this comment.
prefixScorer is passed to NewSchedulerConfig func.
we do a conditional check if to initialize the SchedulerConfig inside.
it's cleaner to just initialize the scorer in main, in case it's not used it will be garbage collected.. otherwise the conditional becomes not so readable.
|
|
||
| // Init HTTP server. | ||
| h, err := metricsHandlerWithAuthenticationAndAuthorization(cfg) | ||
| schedulerConfig, err := pd.CreatePDSchedulerConfig(ctx, pdConfig, prefixScorer) |
There was a problem hiding this comment.
We shouldn't have some default schedulerConfig in case pd is not enabled.
There was a problem hiding this comment.
we have inside the function a function to create a "decode only" scheduler config in case PD is disabled.
in that case we also use the SingleProfileHandler from GIE upstream.
There was a problem hiding this comment.
I think it will be clearer if it will be in the main pd disable we use the default fro GIE
| @@ -2,252 +2,130 @@ package pd | |||
|
|
|||
| import ( | |||
There was a problem hiding this comment.
This code shouldn't be part of pd-profile-handler.go?
There was a problem hiding this comment.
mmm I don't think so.
SchedulerConfig contains ProfileHandler + SchedulerProfile(s) (one or more profiles).
in this file we create the SchedulerConfig, and as part of it we create the ProfileHandler.
the created ProfileHandler depends on the PD configuration, if PD is enabled we use PDProfileHandler, if only D is used, we use SingleProfileHandler from GIE upstream.
have a look here:
and here:
There was a problem hiding this comment.
I meant, why do we have this file? In the architecture scheduler == profile? I expect what belongs to pd_profile will be there and if we have common functions, they will be under profile.go
There was a problem hiding this comment.
in previous code, we have scheduler for P and scheduler for D.
in the new design, we have SchedulerProfile for P and SchedulerProfile for D.
we do not instantiate a scheduler in llm-d, but only create a SchedulerConfig, and then let GIE handle the profiles run.
this might get a bit confusing. let's discuss this f2f. if you'd still think things are unclear then we might need to restructure some things to make it clearer.
There was a problem hiding this comment.
the idea is that a Scheduler is a compilation of all configured scheduling logic, including sophisticated scheduling (such as a scheduling algo based on intent, as is with PD scheduling). A specific SchedulingProfile is one instance of a scheduling algorithm, PD requires multiple distinct scheduling algos, so multiple profiles. Hopefully that makes sense, but feedback is definitely appreciated if not.
There was a problem hiding this comment.
Ok @nirrozenbaum and @kfswain, I understand the difference now, but I still think we should move
createSchedulerProfile and pluginsFromConfig to plugins/profile/profile.go.
And the pick and ProcessResults from PdProfileHandler should be part of the pd_scheduler.go
There was a problem hiding this comment.
see the definition of ProfileHandler interface:
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/7df5d3dfdafa600a05f19d7147cecf68d25d2607/pkg/epp/scheduling/framework/plugins.go#L37-L50
there is a single interface that encapsulates the logic for picking profiles to run (iteratively) and at the end of all profiles execution to process the results.
This is a single interface with two functions.
PdProfileHandler implements this interface (this is a plugin with two extension points). therefore, this should remain in its own file.. hope it makes sense.
| } | ||
|
|
||
| debugLog.Info("Scheduling to separate Prefill and Decode workers") | ||
| func createSchedulerProfile(ctx context.Context, roleFilter framework.Filter, picker framework.Picker, configuredPlugins map[string]int, |
There was a problem hiding this comment.
The createSchedulerProfile and pluginsFromConfig are more general functions that I think should be ina common place
There was a problem hiding this comment.
I agree about pluginsFromConfig, should be in a more common place.
don't you think createSchedulerProfile belongs under scheduler?
There was a problem hiding this comment.
What is confusing is that we have a profile and schedulaer ( I think we should have just profile)
There was a problem hiding this comment.
we don't have scheduler anymore.
we have SchedulerConfig. the config contains profile handler and one or more profiles.
I see that the file is still called scheduler.go, maybe this is confusing and should be renamed to scheduler_config.go?
Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
kfirtoledo
left a comment
There was a problem hiding this comment.
All comments can be handled in the issue - #192

This PR updates llm-d-scheduler with the latest GIE design and extension points.
More specifically, this change removes the need to code two different schedulers (one for P, one for D) and instead is now making use of the new design to code P/D logic through the extension points.
some significant changes:
health.gofile under cmd package, and removing most ofmain.gowhich now includes only the plugins instantiation and running GIE upstream with llm-d plugins.SingleProfileHandler. no need to check on every request if PD is enabled or not but only at bootstrap.