Description
Hello. First of all, thanks for the great plugin!

Now about the problem. As you can guess, normal code uses far more than 2K tokens. With Ollama it is possible to create a custom model with a larger predefined context, but there is a serious problem: if you make the context huge enough to fit any possible input, it reserves a lot of memory, loads the CPU heavily, and is generally slow. However, the Ollama API allows setting the context size manually on each request. Your plugin already calculates the token size of attached files, so please set the Ollama num_ctx value dynamically, and perhaps let the user define a maximum num_ctx in preferences. This would solve the problem of the context being too small to handle an attached file, while not wasting CPU by reserving a much larger context than a specific task needs.
Normally this is handled through the Ollama API:

```
curl http://localhost:11434/api/generate -d '{ "model": "llama3.1", "prompt": "Why is the sky blue?", "options": { "num_ctx": 4096 } }'
```
So, for example, a request with a small context would get a fast reply, and if the context is huge, say 40K tokens, it would be slow only for that specific request, which is a manageable wait.
Proposed solution
Add an options field to the API request, like:

```
curl http://localhost:11434/api/generate -d '{ "model": "llama3.1", "prompt": "Why is the sky blue?", "options": { "num_ctx": 4096 } }'
```
where num_ctx is the actual context size needed to work with the attached files. A sketch of the intended logic follows.
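A minimal sketch of what I mean, in Python against the real Ollama /api/generate endpoint. The names `estimate_tokens`, `DEFAULT_CTX`, and `MAX_CTX` are my assumptions for illustration, not anything the plugin actually exposes; the plugin would use its own token count for attached files instead of the character heuristic here.

```python
import json
import urllib.request

MAX_CTX = 40_960      # assumed user-configurable upper bound from preferences
DEFAULT_CTX = 2_048   # Ollama's default context window

def estimate_tokens(text: str) -> int:
    # Rough stand-in heuristic (~4 characters per token); the plugin
    # already computes a real token count for attached files.
    return len(text) // 4

def generate(model: str, prompt: str) -> str:
    # Clamp the context window: at least the default, at most the user's
    # cap, with some headroom reserved for the model's reply.
    needed = estimate_tokens(prompt) + 512
    num_ctx = min(max(needed, DEFAULT_CTX), MAX_CTX)

    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # per-request context size
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With this shape, a short prompt runs at the fast default context, and only a request carrying a large attached file pays for a large num_ctx, capped by the preference so it never exceeds what the user's hardware can handle.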
Additional context
No response