Any way to set larger context length when using Ollama models? #5254
-
You can create a new model in Ollama with a larger context window: https://ollama.readthedocs.io/en/modelfile/#format

Create a Modelfile, then create your model, derived from a model you already have (see the sketch below). You can then use llama3.1-8b-instruct-q4_K_M-8k like any other Ollama model.
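The original code blocks did not survive extraction; here is a minimal sketch, assuming a llama3.1:8b-instruct-q4_K_M base model and an 8192-token context:

```
# Modelfile
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER num_ctx 8192
```

```shell
# build the derived model under the new name
ollama create llama3.1-8b-instruct-q4_K_M-8k -f Modelfile
```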
-
Thanks for your response, @kb-. I apologize for not being more specific: I want to avoid defining a new Ollama model if possible, and instead set the context length (and ideally other options) at query/inference time. Ollama allows passing generation options with each request; Open WebUI, for example, takes advantage of this and allows you to set the context length per chat. I'm wondering if there is a way to pass query-time generation options through AutoGen.
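For reference, this is the kind of per-request override Ollama's native API supports; a minimal sketch against a local Ollama server (model name and prompt are placeholders):

```python
import requests

# Ollama's native chat endpoint accepts an "options" object per request,
# including num_ctx, which overrides the model's default context length.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1:8b-instruct-q4_K_M",
        "messages": [{"role": "user", "content": "Hello"}],
        "options": {"num_ctx": 8192},  # request-time context length
        "stream": False,
    },
)
print(response.json()["message"]["content"])
```

The open question is whether AutoGen exposes a way to thread this kind of option through to Ollama at query time.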
-
I have successfully set up autogen to use Ollama and create agents and teams. I am currently using Ollama as the inference engine behind a MagenticOneGroupChat team, and it is working well.
However, when I look at Ollama's resource usage, it is clear (based on RAM/vRAM usage) that it is only using the default context length of 2048, even though the relevant context extends far beyond that limit as the MagenticOne team continues to work and produce additional output. No error is thrown, but I have seen a couple of instances where the team attempts to perform tasks it has already completed further up in the team "chat", which (combined with the evidence from RAM/vRAM usage) leads me to believe that any context beyond 2048 tokens is being silently truncated.
Is there a way to set an explicit, increased context length when instantiating an OpenAIChatCompletionClient, Agent, or Team? (Relevant documentation for using Ollama with Autogen here)
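For context, this is roughly how the client is pointed at Ollama's OpenAI-compatible endpoint in this setup (a sketch based on the AutoGen 0.4 docs; the model name, model_info values, and api_key are illustrative, and whether a context-length option can be passed here is exactly the open question):

```python
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Points AutoGen's OpenAI-style client at Ollama's OpenAI-compatible endpoint.
# There is no obvious constructor argument here for the context window
# (num_ctx), which is what this question is asking about.
model_client = OpenAIChatCompletionClient(
    model="llama3.1:8b-instruct-q4_K_M",   # assumed local model name
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # placeholder; Ollama ignores it
    model_info={
        "vision": False,
        "function_calling": True,
        "json_output": True,
        "family": "unknown",
    },
)
```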