
[Serve] Prewarm in multiplexed models #61649

@mbignotti


What happened + What you expected to happen

The multiplexed models example at doc/source/serve/tutorials/model_multiplexing_forecast/content/README.md mentions the _prewarm method as a way to pre-load multiplexed models. I tried to implement it in a simple example, but it seems to have no effect.

Is there an actual way to pre-load models in a multiplexed setting? Also, how would it work with multiple replicas? Would it be possible to pre-load different models in multiple replicas?
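
To make the question concrete, here is a minimal sketch of the behavior being asked about, written in plain Python without Ray. Replica, prewarm, and assign_round_robin are hypothetical names used only for illustration. They are not Ray Serve APIs, and the sketch makes no claim about how Serve actually caches or routes multiplexed models.

```python
import asyncio
from collections import OrderedDict


class Replica:
    """Hypothetical stand-in for one replica holding an LRU cache of models."""

    def __init__(self, max_models):
        self.max_models = max_models
        self.cache = OrderedDict()  # model_id -> loaded model
        self.load_count = 0  # number of (slow) loads performed

    async def get_model(self, model_id):
        if model_id in self.cache:
            self.cache.move_to_end(model_id)  # mark as most recently used
            return self.cache[model_id]
        await asyncio.sleep(0)  # placeholder for the expensive load
        self.load_count += 1
        self.cache[model_id] = f"model:{model_id}"
        if len(self.cache) > self.max_models:
            self.cache.popitem(last=False)  # evict the least recently used
        return self.cache[model_id]

    async def prewarm(self, model_ids):
        """Load the given models before any request arrives."""
        for model_id in model_ids:
            await self.get_model(model_id)


def assign_round_robin(model_ids, num_replicas):
    """Split model IDs so each replica pre-loads a different subset."""
    return [model_ids[i::num_replicas] for i in range(num_replicas)]


async def main():
    shards = assign_round_robin(["a", "b", "c", "d"], num_replicas=2)
    replicas = [Replica(max_models=2) for _ in shards]
    await asyncio.gather(*(r.prewarm(s) for r, s in zip(replicas, shards)))
    return replicas, shards


if __name__ == "__main__":
    replicas, shards = asyncio.run(main())
    for replica, shard in zip(replicas, shards):
        print(shard, "->", list(replica.cache))
```

In this sketch, a request for a pre-warmed model hits the cache and triggers no further load. That is the effect I expected _prewarm to have, ideally with each replica holding a different subset of models.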

Thanks a lot!

Versions / Dependencies

ray[serve] 2.54.0
Ubuntu 24.04 LTS

Reproduction script

Please find a full reproducible example attached. Simply run make run-docker-all from the root.

ray-example.zip

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Metadata


    Labels

    bug, community-backlog, docs, question, serve, stability, triage
