Describe the bug
Approximately 15 minutes after it is last used, the model gets unloaded with the following message:
Idle timeout reached. Unloading OmniVoice model to free VRAM.
However, after this occurs, the model doesn't start up again when requested and just displays this:
The only way to get it going again is to restart the container.
The backend process appears to still be running and consuming VRAM, though.
To reproduce
Steps to reproduce the behavior:
- Use OmniVoice model for anything
- Wait 15+ minutes
- Attempt to use the model again
Expected behavior
Either:
A) Start the model when required
B) Give the option to specify duration of timeout
C) Give the ability to disable timeout & unload (i.e. keep model always warm & available)
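To make the three options concrete, here is a minimal sketch of how they could fit together. It is not the project's actual code: `IdleUnloadingModel`, `loader`, `idle_timeout`, and `unload_if_idle` are hypothetical names, and the 900-second default is only assumed from the roughly 15 minutes observed above.

```python
import threading
import time
from typing import Any, Callable, Optional


class IdleUnloadingModel:
    """Hypothetical wrapper: loads lazily, unloads after an idle period."""

    def __init__(self, loader: Callable[[], Any],
                 idle_timeout: Optional[float] = 900.0):
        self._loader = loader              # assumed: callable that builds the model
        self._idle_timeout = idle_timeout  # B) configurable; C) None or <= 0 disables unloading
        self._model: Any = None
        self._last_used = time.monotonic()
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """A) Reload on demand if the model was unloaded earlier."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            self._last_used = time.monotonic()
            return self._model

    def unload_if_idle(self, now: Optional[float] = None) -> bool:
        """Drop the model if it has been idle too long; return True if unloaded."""
        if self._idle_timeout is None or self._idle_timeout <= 0:
            return False  # keep the model warm indefinitely
        with self._lock:
            if self._model is None:
                return False
            now = time.monotonic() if now is None else now
            if now - self._last_used < self._idle_timeout:
                return False
            self._model = None  # release the reference so memory can be freed
            return True
```

In this sketch a periodic task would call `unload_if_idle()`, and every request would go through `get()`, so an unloaded model is simply loaded again on the next request instead of staying unavailable.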
Screenshots / Logs
No logs are produced for the model load, since the load never starts.
Environment
- OS: Ubuntu 25.10
- Install method: Docker (manual build due to loopback issue)
- Version: v0.2.7
- GPU: NVIDIA RTX 5080
- RAM: 32GB
Additional context