
Ollama dynamic num_ctx #996

Open

Description

@sapphirepro

Describe the need of your request

Hello. First of all, thanks for a great plugin!

Now about the problem. As you can guess, normal code uses far more than 2K tokens. With Ollama it is possible to create a custom model with a larger predefined context, but there is a serious problem: if you make the context huge enough to fit any possible input, the model uses the CPU heavily because of the memory it has to reserve, and it becomes generally slow. However, the Ollama API allows setting the context size per request. Your plugin already calculates the token size of attached files, so please set the Ollama model's num_ctx value dynamically, and perhaps let the user define a maximum num_ctx in the preferences. This would solve the problem of the context being too small to handle a file, while at the same time not wasting CPU by reserving a much larger context than a specific task needs.

Normally it's handled via the Ollama API:

curl http://localhost:11434/api/generate -d '{ "model": "llama3.1", "prompt": "Why is the sky blue?", "options": { "num_ctx": 4096 } }'

So, for example, if the context is small, replies are generated quickly; if the context is huge, say 40K tokens, then only that specific request is slow, which is manageable to wait for.

Proposed solution

Add options to the API request, like:

curl http://localhost:11434/api/generate -d '{ "model": "llama3.1", "prompt": "Why is the sky blue?", "options": { "num_ctx": 4096 } }'

where num_ctx is the actual context size needed to work with the attached files.
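A minimal sketch of what the plugin could do, assuming a rough characters-per-token estimate and a user-configured ceiling (the function names, the ~4 chars/token heuristic, and the power-of-two rounding are all illustrative choices, not anything the plugin currently implements):

```python
import json
import urllib.request


def choose_num_ctx(prompt: str, max_ctx: int = 32768, min_ctx: int = 2048) -> int:
    """Pick a num_ctx for this request: estimate tokens (~4 chars/token),
    round up to the next power of two, and clamp to the user's ceiling."""
    est_tokens = max(1, len(prompt) // 4)
    ctx = min_ctx
    while ctx < est_tokens and ctx < max_ctx:
        ctx *= 2
    return min(ctx, max_ctx)


def generate(prompt: str, model: str = "llama3.1", max_ctx: int = 32768) -> str:
    """Call Ollama's /api/generate with a dynamically sized num_ctx."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": choose_num_ctx(prompt, max_ctx)},
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With this, a short prompt stays at the 2048-token floor and runs fast, while a 40K-token attachment bumps num_ctx up only for that one request, never past the user-defined maximum.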

Additional context

No response


Labels

enhancement (New feature or request)
