Software engineering
We have a choice to make about how to develop m3 while maintaining a working m2. I can see three obvious possible approaches.
- Use different branches. E.g., copy current m2 to branch milestone-2 and start developing m3 on branch main.
- Use different Go packages. E.g., leave pkg/controller/dual-pods/ as it is, holding m2, and copy it into new package pkg/controller/dual-pods-m3 and develop from there. Remember, m3 changes not only the controller but all the recipes and scripts that use it.
- Introduce an option into the existing controller. E.g., a command line flag --use-launcher that takes a Boolean value.
I currently favor approach 3, because I think it will be least disruptive. I expect there may be some ongoing maintenance for m2, and I hate maintaining two copies of (more or less) the same code. Also, I think the change from m2 to m3 can be fairly smooth in terms of the controller code.
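To make approach 3 concrete, here is a minimal sketch of such a flag using Go's standard flag package. The options type, the parseOptions helper, the flag-set name, and the default of false are assumptions for illustration; the real controller may wire its flags differently (for example through another flag library).

```go
package main

import "flag"

// options holds controller settings. useLauncher selects the m3 code path;
// the type and field names are hypothetical.
type options struct {
	useLauncher bool
}

// parseOptions parses args (without the program name) into options.
// The default of false (keep m2 behavior) is an assumption.
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("dual-pods-controller", flag.ContinueOnError)
	fs.BoolVar(&opts.useLauncher, "use-launcher", false,
		"create vLLM instances through a launcher (m3) instead of directly (m2)")
	err := fs.Parse(args)
	return opts, err
}
```

With the standard flag package, a Boolean value has to be written as --use-launcher=true or --use-launcher=false; the space-separated form is not accepted for Boolean flags.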
Changes to the dual-pods controller
Authoritative store of binding state
In m2 this is an annotation on the server-providing Pod. I think that this can remain.
Index into sleeping vLLM instances
In m2 this is an index maintained by the Pod informer. We can add an additional index into launched vLLM instances, in the controller's data structure. In nodeData, have a map from server-providing Pod name to *launcherData. Let launcherData have a map from nominal hash to last-used time. Use the nominal hash as the instance ID in the launcher. There is no need for a launcher to have multiple instances with the same nominal hash.
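One possible shape for this index, as a sketch only. The field names (launchers, instances) and the noteInstanceUsed helper are assumptions; only nodeData, launcherData, and the two maps come from the description above.

```go
package main

import "time"

// launcherData is what the controller remembers about one launcher.
type launcherData struct {
	// instances maps nominal hash to last-used time. The nominal hash
	// doubles as the instance ID in the launcher, so the map key itself
	// guarantees at most one instance per nominal hash.
	instances map[string]time.Time
}

// nodeData shows only the part relevant to launchers.
type nodeData struct {
	// launchers maps server-providing Pod name to that Pod's launcher data.
	launchers map[string]*launcherData
}

// noteInstanceUsed records that the instance with the given nominal hash,
// in the launcher of the named Pod, was used at time now. Missing map
// entries are created on demand. (Hypothetical helper.)
func (nd *nodeData) noteInstanceUsed(podName, nominalHash string, now time.Time) {
	if nd.launchers == nil {
		nd.launchers = map[string]*launcherData{}
	}
	ld := nd.launchers[podName]
	if ld == nil {
		ld = &launcherData{instances: map[string]time.Time{}}
		nd.launchers[podName] = ld
	}
	ld.instances[nominalHash] = now
}
```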
Other data/logic notes
Let a launcher-based server-providing Pod have an annotation that says it is such a thing. Let the controller's handler for Pod notifications queue a reference to the Pod when notified of such a thing.
Let launcherData have a boolean indicating whether the set of nominal hashes is accurate.
Let the controller use the extension of generic controller that knows whether the initial load of objects has been processed.
When syncing a launcher-based server-providing Pod, create and populate the launcherData if it is not already present. If it is present but not accurate, update it to become accurate.
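A sketch of that sync step follows. The launcher's API is not specified here, so the listInstances function type is a hypothetical stand-in for "ask the launcher which instances it has". Stamping newly discovered instances with the current time is also an assumption. The launcherData type is repeated so the sketch stands alone.

```go
package main

import "time"

type launcherData struct {
	instances map[string]time.Time // nominal hash -> last-used time
	accurate  bool                 // whether instances is known to be correct
}

// listInstances is a hypothetical stand-in for querying the launcher
// for the IDs (nominal hashes) of its current instances.
type listInstances func() ([]string, error)

// syncLauncherData creates and populates the launcherData for podName if
// it is absent, and refreshes it if it is present but not accurate.
// An error from list is returned unchanged so the caller can retry.
func syncLauncherData(launchers map[string]*launcherData, podName string,
	list listInstances, now time.Time) error {
	ld := launchers[podName]
	if ld != nil && ld.accurate {
		return nil // nothing to do
	}
	ids, err := list()
	if err != nil {
		return err
	}
	if ld == nil {
		ld = &launcherData{}
		launchers[podName] = ld
	}
	fresh := make(map[string]time.Time, len(ids))
	for _, id := range ids {
		if t, known := ld.instances[id]; known {
			fresh[id] = t // keep the remembered last-used time
		} else {
			fresh[id] = now // assumption: treat a newly seen instance as just used
		}
	}
	ld.instances = fresh
	ld.accurate = true
	return nil
}
```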
When it is time to create a new vLLM instance, use either the m2 code or the m3 code, as indicated by the option.
Before creating a launcher-based vLLM instance, check that every launcherData of the Pod has an accurate set of nominal hashes. If not then enqueue a reference to the Pod and consider this a transient failure.
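The pre-creation check might look roughly like this. The function name, the sentinel error, and the enqueue callback (standing in for the controller's work queue) are all hypothetical; which collection of launcherData to pass in is left to the caller.

```go
package main

import "errors"

// errInaccurate is a hypothetical sentinel marking the transient failure.
var errInaccurate = errors.New("launcher instance set is not yet accurate")

type launcherData struct {
	accurate bool // whether the set of nominal hashes is known to be correct
}

// ensureAccurate returns nil when every launcherData in lds is accurate.
// Otherwise it enqueues podRef for another sync and returns errInaccurate,
// which the caller should treat as a transient failure.
func ensureAccurate(lds map[string]*launcherData, podRef string,
	enqueue func(string)) error {
	for _, ld := range lds {
		if !ld.accurate {
			enqueue(podRef)
			return errInaccurate
		}
	}
	return nil
}
```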
Each launcher has a personality. Create a new instance using only a launcher with the right personality.
Create a new launcher when needed. Delete an old launcher when its set of instances is empty and it has not been used for two minutes.
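The deletion rule can be sketched as a small predicate. The per-launcher lastUsed field and the noteUsed helper are assumptions (the description above only mentions per-instance last-used times), and "not used for two minutes" is read here as an idle time of at least two minutes.

```go
package main

import "time"

// idleTimeout is the two-minute idle period named above.
const idleTimeout = 2 * time.Minute

type launcherData struct {
	instances map[string]time.Time // nominal hash -> last-used time
	lastUsed  time.Time            // assumption: last use of the launcher itself
}

// noteUsed records a use of the instance with the given nominal hash
// and, with it, a use of the launcher. (Hypothetical helper.)
func (ld *launcherData) noteUsed(nominalHash string, now time.Time) {
	if ld.instances == nil {
		ld.instances = map[string]time.Time{}
	}
	ld.instances[nominalHash] = now
	ld.lastUsed = now
}

// shouldDeleteLauncher reports whether the launcher has no instances
// and has been idle for at least idleTimeout as of now.
func shouldDeleteLauncher(ld *launcherData, now time.Time) bool {
	return len(ld.instances) == 0 && now.Sub(ld.lastUsed) >= idleTimeout
}
```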
When the call to create or delete a launched instance fails with a networking problem, this controller considers this a transient failure and marks the launcherData as being uncertain about the correctness of the map of instances.