important / pinned models #1630

@ckuethe

Feature Description

Allow a model or models to be marked as important or pinned, and thus less likely to be evicted.

Use Case / Motivation

If qwen-coder-next (QCN) is doing a big refactoring with its full 256k context while I'm playing with less important text and image processing, I'd like to prevent QCN from being automatically evicted.

Perhaps there are a few small models cooperating on a task that should remain resident together.

Platform Relevance

All platforms

Additional Context

We're fairly comfortable with either binary flags or adjustable ranges to configure resource management: mlock(2) can be used to request that an address range not be swapped out. chattr(1) can be used to set an immutable bit on a file; that bit must be cleared before even the superuser can delete the file. vm.swappiness can be used to control how likely the kernel is to swap to disk. Network interfaces can have auto-connect priorities...

In the same way, a binary llmlock operation could lock the model so that lemonade wouldn't evict that runner and would instead fail to load other models if there isn't enough room. For more nuanced control, an importance or priority property could be set so that a model would be evicted only when a higher-priority model was requested, and only after all less important models had been unloaded first.
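To make the two ideas concrete, here is a rough sketch of how an eviction planner could combine a pinned flag with a priority value. Everything in it is hypothetical: `LoadedModel`, `plan_eviction`, the `pinned` and `priority` fields, and the sizes in GB are names invented for illustration, not lemonade's actual API or data model.

```python
from dataclasses import dataclass


@dataclass
class LoadedModel:
    # All fields are hypothetical, for illustration only.
    name: str
    size_gb: float
    priority: int = 0     # higher means more important
    pinned: bool = False  # the binary "llmlock" flag


def plan_eviction(loaded, request, capacity_gb):
    """Return the models to evict so `request` fits, or None if it cannot be loaded."""
    free = capacity_gb - sum(m.size_gb for m in loaded)
    if request.size_gb <= free:
        return []  # enough room already, nothing to evict

    # Only unpinned models that are strictly less important than the
    # requested model may be evicted, least important first.
    candidates = sorted(
        (m for m in loaded if not m.pinned and m.priority < request.priority),
        key=lambda m: m.priority,
    )

    victims = []
    for m in candidates:
        victims.append(m)
        free += m.size_gb
        if request.size_gb <= free:
            return victims

    # Even evicting every eligible model is not enough: refuse the load.
    return None
```

With this shape, a pinned model is never a candidate regardless of its priority, and a request that cannot fit fails instead of pushing out something more important. One consequence of the strict comparison is that models of equal priority never evict each other, so a real implementation would probably need a tie-breaker such as least-recently-used.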

Labels: enhancement (New feature or request)