
Outline of changes from milestone 2 to 3 #154

@MikeSpreitzer

Software engineering

We have a choice to make about how to develop m3 while maintaining a working m2. I can see three obvious possible approaches.

  1. Use different branches. E.g., copy current m2 to branch milestone-2 and start developing m3 on branch main.
  2. Use different Go packages. E.g., leave pkg/controller/dual-pods/ as it is, holding m2, and copy it into new package pkg/controller/dual-pods-m3 and develop from there. Remember, m3 changes not only the controller but all the recipes and scripts that use it.
  3. Introduce an option into the existing controller. E.g., a command line flag --use-launcher that takes a Boolean value.

I currently favor approach 3 because I think it will be the least disruptive. I expect there may be some ongoing maintenance for m2, and I hate maintaining two copies of (more or less) the same code. Also, I think the change from m2 to m3 can be fairly smooth in terms of the controller code.
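Approach 3 could start as a single Boolean flag. Below is a minimal sketch using Go's standard `flag` package, which accepts both `-use-launcher` and `--use-launcher`. The `controllerOptions` type and `parseOptions` function are hypothetical names; the real controller may wire its options differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// controllerOptions holds command-line settings for the dual-pods controller.
// Only the proposed --use-launcher switch is shown; other flags are omitted.
type controllerOptions struct {
	// UseLauncher selects the m3 (launcher-based) code path when true
	// and the existing m2 code path when false.
	UseLauncher bool
}

// parseOptions parses args (excluding the program name) into controllerOptions.
func parseOptions(args []string) (controllerOptions, error) {
	fs := flag.NewFlagSet("dual-pods-controller", flag.ContinueOnError)
	var opts controllerOptions
	fs.BoolVar(&opts.UseLauncher, "use-launcher", false,
		"create vLLM instances in launcher Pods (milestone 3) instead of the milestone 2 way")
	if err := fs.Parse(args); err != nil {
		return controllerOptions{}, err
	}
	return opts, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("use-launcher=%v\n", opts.UseLauncher)
}
```

Defaulting the flag to false keeps every existing m2 deployment unchanged until it opts in.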

Changes to the dual-pods controller

Authoritative store of binding state

In m2 this is an annotation on the server-providing Pod. I think that this can remain.

Index into sleeping vLLM instances

In m2 this is an index maintained by the Pod informer. For m3 we can add an additional index, into launched vLLM instances, kept in the controller's own data structure. In nodeData, have a map from server-providing Pod name to *launcherData. Let launcherData have a map from nominal hash to last-used time. Use the nominal hash as the instance ID in the launcher; there is no need for a launcher to have multiple instances with the same nominal hash.

Other data/logic notes

Let a launcher-based server-providing Pod have an annotation that says it is such a Pod. Let the controller's handler for Pod notifications enqueue a reference to the Pod when notified about such a Pod.
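The sketch below shows the handler side, assuming the annotation value is the string "true". The annotation key `example.com/launcher-based` is a placeholder (the outline does not name one), `pod` is a cut-down stand-in for a Kubernetes Pod, and a plain slice stands in for the controller's work queue.

```go
package main

import "fmt"

// launcherBasedAnnotation is a hypothetical annotation key marking a
// launcher-based server-providing Pod; the real key has not been chosen.
const launcherBasedAnnotation = "example.com/launcher-based"

// pod is a minimal stand-in for a Kubernetes Pod (only the fields used here).
type pod struct {
	Namespace, Name string
	Annotations     map[string]string
}

// podRef is what goes on the work queue.
type podRef struct{ Namespace, Name string }

// controller uses a plain slice as a stand-in for a work queue.
type controller struct{ queue []podRef }

// isLauncherBased reports whether p carries the launcher-based annotation.
func isLauncherBased(p *pod) bool {
	return p.Annotations[launcherBasedAnnotation] == "true"
}

// onPodNotification is invoked for Pod add, update, and delete notifications.
func (c *controller) onPodNotification(p *pod) {
	if isLauncherBased(p) {
		c.queue = append(c.queue, podRef{p.Namespace, p.Name})
	}
	// The existing m2 handling of Pod notifications would continue here.
}

func main() {
	c := &controller{}
	c.onPodNotification(&pod{Namespace: "ns", Name: "launcher-1",
		Annotations: map[string]string{launcherBasedAnnotation: "true"}})
	c.onPodNotification(&pod{Namespace: "ns", Name: "requester-1"})
	fmt.Println(len(c.queue)) // 1
}
```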

Let launcherData have a boolean indicating whether its set of nominal hashes is accurate.

Let the controller use the extension of the generic controller that knows whether the initial load of objects has been processed.

When syncing a launcher-based server-providing Pod, create and populate the launcherData if it is not already present. If it is present but not accurate, update it to become accurate.

When it is time to create a new vLLM instance, use either the m2 code or the m3 code, as indicated by the option.
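The switch point can stay small. In this sketch the two branch functions are hypothetical stand-ins that only report which path ran; `request` is an illustrative type, not the controller's real one.

```go
package main

import "fmt"

// request describes what a requester Pod needs; the field is illustrative.
type request struct{ NominalHash string }

type controller struct {
	useLauncher bool // value of the --use-launcher option
}

// createServerPod stands in for the existing m2 code path.
func (c *controller) createServerPod(r request) string { return "m2:" + r.NominalHash }

// createLaunchedInstance stands in for the new m3 code path.
func (c *controller) createLaunchedInstance(r request) string { return "m3:" + r.NominalHash }

// createVLLMInstance picks the m2 or m3 path according to the option.
func (c *controller) createVLLMInstance(r request) string {
	if c.useLauncher {
		return c.createLaunchedInstance(r)
	}
	return c.createServerPod(r)
}

func main() {
	fmt.Println((&controller{useLauncher: true}).createVLLMInstance(request{"abc"})) // m3:abc
}
```

Keeping the branch at this single point is what lets m2 and m3 share the rest of the controller.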

Before creating a launcher-based vLLM instance, check that every launcherData of the Pod has an accurate set of nominal hashes. If not, then enqueue a reference to the Pod and treat this as a transient failure.

Each launcher has a personality. Create a new instance using only a launcher with the right personality.

Create a new launcher when needed. Delete an old launcher when its set of instances is empty and it has not been used for two minutes.

When a call to create or delete a launched instance fails with a networking problem, the controller treats this as a transient failure and marks the launcherData as uncertain about the correctness of its map of instances (that is, not accurate).
