
[Feature]: Need better control over main memory usage #408

@MikeSpreitzer

Description


Feature Area

Other

Problem Statement

The `maxSleepingInstances` field of a LauncherConfig,

```go
MaxSleepingInstances int32 `json:"maxSleepingInstances,omitempty"`
```

and the design of LauncherPopulationPolicy allow the authors of these objects to limit the amount of main memory used by vllm instances. But two important gaps remain.

(1) These limits are expressed as a number of vllm instances, not an amount of main memory. The main memory used by a vllm instance can vary a lot from model to model and with other vllm parameters.

(2) These limits apply only within the scope of one LauncherConfig. If/when/while multiple LauncherConfig objects are present, nothing currently constrains the sum of main memory usage across all of them (other than, of course, the sum of what each LauncherConfig allows, which I suspect is not enough control).
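To make gap (1) concrete, here is a minimal sketch of why an instance-count limit does not bound memory. The model sizes and per-instance figures below are purely illustrative assumptions, not measurements from any real deployment:

```go
package main

import "fmt"

// totalMemoryGiB is a hypothetical helper: total main memory consumed by a
// LauncherConfig whose instances each use perInstanceGiB of main memory.
func totalMemoryGiB(perInstanceGiB float64, instances int32) float64 {
	return perInstanceGiB * float64(instances)
}

func main() {
	// The same maxSleepingInstances count applied to two configs...
	const maxSleepingInstances = 4

	// ...bounds very different amounts of main memory, because per-instance
	// usage depends on the model and other vllm parameters (assumed figures).
	small := totalMemoryGiB(8, maxSleepingInstances)   // e.g. a small model
	large := totalMemoryGiB(160, maxSleepingInstances) // e.g. a large model

	fmt.Printf("small-model config: %.0f GiB\n", small) // 32 GiB
	fmt.Printf("large-model config: %.0f GiB\n", large) // 640 GiB
}
```

The same count limit admits a 20x difference in memory footprint here, which is the core of gap (1); gap (2) compounds this, since each additional LauncherConfig adds its own independently-bounded total.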

Proposed Solution

TBD

Alternatives Considered

No response

Willingness to Contribute

Yes, I can submit a PR

Additional Context

No response

Metadata


Labels

enhancement (New feature or request)
