[Bug] Model is unloaded to save VRAM, but doesn't load again #86

@loryanstrant

Describe the bug

Approximately 15 minutes after it is last used, the model gets unloaded with the following message:

Idle timeout reached. Unloading OmniVoice model to free VRAM.

However, once this happens, the model does not load again when requested. Instead, it just displays this:

[Screenshot]

The only way to get it working again is to restart the container.

The backend process appears to still be running and consuming VRAM, though.
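
One way to confirm that observation is to ask `nvidia-smi` which processes currently hold GPU memory. The snippet below is only a sketch: the helper names `parse_compute_apps` and `list_gpu_processes` are made up for illustration, and it assumes `nvidia-smi` is on the `PATH` of the machine where it runs (for a Docker setup, probably the host rather than the container).

```python
import csv
import io
import subprocess
from typing import List, Tuple

# nvidia-smi query for compute processes; used_memory is reported in MiB
# when "nounits" is requested.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text: str) -> List[Tuple[int, str, str]]:
    """Parse the CSV output of QUERY into (pid, process_name, used_memory) rows.

    used_memory is kept as a string because some platforms report "[N/A]".
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, name, used = (f.strip() for f in fields)
        rows.append((int(pid), name, used))
    return rows


def list_gpu_processes() -> List[Tuple[int, str, str]]:
    """Run nvidia-smi and return the processes currently holding GPU memory."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `list_gpu_processes()` after the "Idle timeout reached" message appears should show whether the backend's PID is still listed with memory in use.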

To reproduce

Steps to reproduce the behavior:

  1. Use OmniVoice model for anything
  2. Wait 15+ minutes
  3. Attempt to use the model again

Expected behavior

Any of the following:
A) Load the model again when it is required
B) Provide an option to specify the timeout duration
C) Provide a way to disable the timeout and unload entirely (i.e. keep the model always warm and available)
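
To make the three options concrete, here is a minimal sketch of how they could fit together. It is not the project's actual code: `LazyModelHolder`, `loader`, `unloader`, and `idle_timeout_s` are hypothetical names, and the 900-second default is only taken from the roughly 15 minutes observed above.

```python
import threading
import time
from typing import Any, Callable, Optional


class LazyModelHolder:
    """Hypothetical holder that loads on demand and unloads when idle.

    - get() reloads the model if it was unloaded (option A).
    - idle_timeout_s sets the idle duration in seconds (option B).
    - idle_timeout_s=None disables unloading entirely (option C).
    """

    def __init__(
        self,
        loader: Callable[[], Any],
        unloader: Callable[[Any], None],
        idle_timeout_s: Optional[float] = 900.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._unloader = unloader
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._lock = threading.Lock()
        self._model: Any = None
        self._last_used = clock()

    def get(self) -> Any:
        """Return the model, loading it first if it is not in memory."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            self._last_used = self._clock()
            return self._model

    def unload_if_idle(self) -> bool:
        """Unload the model if the idle timeout has elapsed; return True if unloaded."""
        with self._lock:
            if self._model is None or self._idle_timeout_s is None:
                return False
            if self._clock() - self._last_used < self._idle_timeout_s:
                return False
            self._unloader(self._model)
            self._model = None
            return True
```

In this sketch a background timer would call `unload_if_idle()` periodically, while every request goes through `get()`, so an unloaded model is loaded again rather than left unavailable. The timeout could be read from a configuration value or environment variable at startup.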

Screenshots / Logs

There are no logs for loading the model, because the load never happens.

Environment

  • OS: Ubuntu 25.10
  • Install method: Docker (manual build due to loopback issue)
  • Version: v0.2.7
  • GPU: NVIDIA RTX 5080
  • RAM: 32GB
