[Feature]: Need better control over main memory usage

### Feature Area

Other

### Problem Statement

The `maxSleepingInstances` field (https://github.com/llm-d-incubation/llm-d-fast-model-actuation/blob/da4a87a444a8289d7e9ecdaf6d8adbf7e4bcefd4/api/fma/v1alpha1/launcherconfig_types.go#L32) of a LauncherConfig and the design of LauncherPopulationPolicy let the authors of these objects limit the amount of main memory used by vLLM instances. However, two important gaps remain.

(1) These limits are expressed as a number of vLLM instances, not as an amount of main memory. The main memory used by a vLLM instance can vary a lot from model to model and with other vLLM parameters.

(2) These limits apply only within the scope of a single LauncherConfig. When multiple LauncherConfig objects are present, nothing currently constrains the total main memory usage across all of them, other than the sum of what each LauncherConfig individually allows, which I suspect is not enough control.
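
To make the two gaps concrete, here is a minimal sketch of the arithmetic. It is not taken from the repository: the `launcherConfig` struct is a hypothetical, simplified stand-in for a LauncherConfig, `bytesPerInstance` is not a real API field, and the memory figures are made-up illustrative numbers rather than measurements.

```go
package main

import "fmt"

// launcherConfig is a hypothetical, simplified stand-in for a LauncherConfig.
// It keeps only the count-based limit from this issue plus an illustrative
// per-instance memory estimate, which is NOT a field of the real API.
type launcherConfig struct {
	name                 string
	maxSleepingInstances int   // the count-based limit discussed above
	bytesPerInstance     int64 // assumed figure for illustration, not a measurement
}

const gib = int64(1) << 30

// worstCaseBytes returns the main memory one config could account for if
// every permitted instance used bytesPerInstance.
func worstCaseBytes(c launcherConfig) int64 {
	return int64(c.maxSleepingInstances) * c.bytesPerInstance
}

// totalWorstCaseBytes sums the per-config worst cases. This sum is the only
// cross-config bound that exists when each config is limited independently.
func totalWorstCaseBytes(cs []launcherConfig) int64 {
	var total int64
	for _, c := range cs {
		total += worstCaseBytes(c)
	}
	return total
}

func main() {
	// Same instance-count limit, very different memory footprint (gap 1).
	configs := []launcherConfig{
		{name: "small-model", maxSleepingInstances: 4, bytesPerInstance: 2 * gib},
		{name: "large-model", maxSleepingInstances: 4, bytesPerInstance: 40 * gib},
	}
	for _, c := range configs {
		fmt.Printf("%s: up to %d GiB\n", c.name, worstCaseBytes(c)/gib)
	}
	// Nothing caps this total other than the individual limits (gap 2).
	fmt.Printf("all configs: up to %d GiB\n", totalWorstCaseBytes(configs)/gib)
}
```

With these assumed numbers, the same limit of 4 instances corresponds to 8 GiB for one config and 160 GiB for the other, and the only bound on the node-wide total is their sum, 168 GiB.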

### Proposed Solution

TBD

### Alternatives Considered

_No response_

### Willingness to Contribute

Yes, I can submit a PR

### Additional Context

_No response_

[Feature]: Need better control over main memory usage #408
