Hello, is it possible to use ag2 with a local LLM provider that implements the OpenAI API, such as Ollama or vLLM?

Replies: 1 comment
Yes, AG2 works great with local OpenAI-compatible providers. Here's how to set it up:

**vLLM Setup**

```python
from autogen import ConversableAgent

config_list = [{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed"  # vLLM doesn't require a key
}]

agent = ConversableAgent(
    name="local_agent",
    llm_config={"config_list": config_list}
)
```
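To sanity-check the wiring before building a full workflow, a single `generate_reply` call is enough. A minimal sketch, assuming the vLLM server above is already running on localhost:8000 (the prompt text is just an example):

```python
from autogen import ConversableAgent

config_list = [{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed"
}]

# human_input_mode="NEVER" keeps the agent from pausing for console input.
agent = ConversableAgent(
    name="local_agent",
    llm_config={"config_list": config_list},
    human_input_mode="NEVER"
)

# Ask the locally served model a question and print its reply.
reply = agent.generate_reply(
    messages=[{"role": "user", "content": "Summarize what vLLM does in one sentence."}]
)
print(reply)
```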
"model": "llama3.1",
"base_url": "http://localhost:11434/v1",
"api_key": "ollama"
}]Production Tips
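Either backend can also be verified outside AG2 first, since both expose a standard OpenAI chat-completions endpoint. A minimal sketch using the `openai` client, assuming Ollama is already serving llama3.1 on its default port:

```python
from openai import OpenAI

# Point the stock OpenAI client at the local Ollama endpoint.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```

If this call succeeds, the same `base_url`/`model` pair will work in the AG2 `config_list`.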
**Production Tips**

```bash
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9
```

I've been running AG2 + vLLM for agent workflows and it works well. The main gotcha is ensuring your model handles system prompts correctly - some quantized models struggle with AG2's structured prompts.
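One mitigation that has worked for me is keeping the agent's system message short and explicit instead of relying on long defaults. A sketch, not official AG2 guidance; `system_message` is a standard ConversableAgent parameter, and the wording below is only an example:

```python
from autogen import ConversableAgent

config_list = [{"model": "meta-llama/Llama-3.1-8B-Instruct",
                "base_url": "http://localhost:8000/v1", "api_key": "not-needed"}]

# A short, explicit system message is easier for small or quantized
# models to follow than a long structured prompt.
agent = ConversableAgent(
    name="local_agent",
    system_message="You are a concise assistant. Answer in plain text.",
    llm_config={"config_list": config_list},
    human_input_mode="NEVER"
)
```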