A collection of examples showing how to build hybrid agentic workflows using the Microsoft Agent Framework, combining local Small Language Models (SLMs) and cloud-based Large Language Models (LLMs).
These demos illustrate different collaboration patterns to optimize for latency, privacy, and cost without sacrificing performance on complex tasks.
| Pattern Name | Description | Paper | Key Concept |
|---|---|---|---|
| 💻 SLM-Default, LLM-Fallback | Route queries to a local SLM first, escalating to cloud only if the local model's output fails verification. | arXiv:2510.03847 | Cost & Latency Optimization |
| 💻 Predictive Router | Use a local router to classify queries as "weak" or "strong". Route simple tasks to local models and complex ones to the cloud. | arXiv:2406.18665 | Dynamic Routing |
| 💻 MAKER Protocol | Decompose complex tasks using a cloud-based "Planner" and execute atomic steps using a local "Voting Solver" with convergence checks. | arXiv:2511.09030 | Task Decomposition |
| 💻 MINIONS Protocol | Decompose extraction tasks into parallel jobs for local "minions" to process on document chunks, synthesizing results in the cloud. | arXiv:2502.15964 | Local-Remote Map-Reduce |
| 💻 Chain of Agents | Process long contexts by chaining local SLMs to sequentially build context before final synthesis in the cloud. | arXiv:2406.02818 | Sequential Bucket Brigade |
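To make the first row concrete, here is a minimal, framework-agnostic sketch of the SLM-default / LLM-fallback idea. It is not the demo code itself: `run_slm`, `run_llm`, and `verify` are hypothetical callables standing in for the local agent, the cloud agent, and the verification step.

```python
def answer(query: str, run_slm, run_llm, verify) -> tuple[str, str]:
    """SLM-default, LLM-fallback: try the cheap local model first and
    escalate to the cloud LLM only if verification of the draft fails."""
    draft = run_slm(query)          # fast, private, low-cost local attempt
    if verify(query, draft):        # e.g. a rubric check or self-consistency test
        return draft, "slm"
    return run_llm(query), "llm"    # escalate: the cloud model handles the hard case
```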
The SLM role is played by Phi-4-mini-instruct running locally. Two interchangeable local inference backends are supported, selected via the `LOCAL_BACKEND` environment variable:
| Backend | `LOCAL_BACKEND` value | Use case |
|---|---|---|
| MLX | `mlx` (default) | Apple Silicon (macOS) via `agent-framework-mlx` |
| Transformers | `transformers` | Cross-platform (CUDA, MPS, CPU) via HuggingFace Transformers |
Demos use short model names (e.g. `Phi-4-mini-instruct-4bit`) that are automatically resolved to the correct backend-specific model path. You can also pass a fully-qualified HuggingFace model ID, or override the path entirely with the `LOCAL_MODEL_PATH` environment variable.
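Conceptually, backend selection and model-name resolution behave like the sketch below. The alias table values and the helper function are illustrative only, not the repo's actual implementation:

```python
import os

# Hypothetical alias table: maps a short model name to a backend-specific model ID.
MODEL_ALIASES = {
    "mlx": {"Phi-4-mini-instruct-4bit": "mlx-community/Phi-4-mini-instruct-4bit"},
    "transformers": {"Phi-4-mini-instruct-4bit": "microsoft/Phi-4-mini-instruct"},
}

def resolve_model(short_name: str) -> tuple[str, str]:
    backend = os.getenv("LOCAL_BACKEND", "mlx")   # "mlx" (default) or "transformers"
    override = os.getenv("LOCAL_MODEL_PATH")      # an explicit override wins if set
    if override:
        return backend, override
    # Otherwise resolve the short name, or pass through a fully-qualified model ID.
    return backend, MODEL_ALIASES.get(backend, {}).get(short_name, short_name)
```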
- Python 3.11+
- Azure CLI logged in (`az login`)
- For the MLX backend: macOS with Apple Silicon
- For the Transformers backend: any platform with PyTorch support (CUDA, MPS, or CPU)
```bash
cd python
cp .env.example .env   # fill in your variables
pip install -r requirements.txt
```

```bash
# default (MLX backend)
python 01-slm-default-llm-fallback/demo.py

# use the Transformers backend instead
LOCAL_BACKEND=transformers python 01-slm-default-llm-fallback/demo.py
```

All five demos follow the same pattern:

```bash
python 01-slm-default-llm-fallback/demo.py
python 02-router-agent/demo.py
python 03-maker/demo.py
python 04-minions/demo.py
python 05-chain-of-agents/demo.py
```

| Variable | Description | Default |
|---|---|---|
| `AZURE_AI_PROJECT_ENDPOINT` | Azure AI Foundry project endpoint | |
| `AZURE_AI_MODEL_DEPLOYMENT_NAME` | Deployment name for the LLM role in Azure AI Foundry | |
| `LOCAL_BACKEND` | Local inference backend (`mlx` or `transformers`) | `mlx` |
| `LOCAL_MODEL_PATH` | Override the HuggingFace model ID or local path for the SLM | per-backend default |
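For reference, these variables can be read with plain `os.getenv` calls. The snippet below only illustrates the names and defaults from the table, not the repo's actual loading code:

```python
import os

endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]          # required
deployment = os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]   # required
local_backend = os.getenv("LOCAL_BACKEND", "mlx")           # "mlx" or "transformers"
local_model = os.getenv("LOCAL_MODEL_PATH")                 # optional per-backend override
```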
Uses `Microsoft.Agents.AI.Workflows` (RC4) and OllamaSharp for local inference. All five patterns are ported 1-to-1 from the Python originals.
The .NET port supports three interchangeable inference backends, selected independently for the SLM and LLM roles via environment variables:
| Backend | `SLM_BACKEND` / `LLM_BACKEND` value | Use case |
|---|---|---|
| Ollama | `ollama` (default for SLM) | Local inference via Ollama's native API |
| OpenAI-compatible | `openai-compatible` | Any server exposing `/v1/chat/completions` (LM Studio, vLLM, llama.cpp, …) |
| Azure AI Foundry | `azure-ai` (default for LLM) | Hosted models on Azure AI Foundry via bearer-token auth |
- .NET 10 SDK
- At least one of the backends above: a local Ollama install, an OpenAI-compatible server (LM Studio, vLLM, llama.cpp, …), or an Azure AI Foundry project
Each project reads its configuration from `Properties/launchSettings.json`. A template is provided:

```bash
cd dotnet/src/<project>
cp ../../launchSettings.json.example Properties/launchSettings.json
# edit Properties/launchSettings.json and fill in your values
```

`Properties/launchSettings.json` is gitignored, so your credentials stay local.
Open `dotnet/HybridAgentDemos.slnx` in Visual Studio / Rider, or run from the CLI:

```bash
dotnet run --project dotnet/src/01-SlmDefaultLlmFallback
dotnet run --project dotnet/src/02-RouterAgent
dotnet run --project dotnet/src/03-Maker
dotnet run --project dotnet/src/04-Minions
dotnet run --project dotnet/src/05-ChainOfAgents
```

All variables are set in `Properties/launchSettings.json` (see `dotnet/launchSettings.json.example`).
Role selection
| Variable | Values | Default |
|---|---|---|
| `SLM_BACKEND` | `ollama` \| `openai-compatible` \| `azure-ai` | `ollama` |
| `LLM_BACKEND` | `ollama` \| `openai-compatible` \| `azure-ai` | `azure-ai` |
Ollama backend
| Variable | Description | Example |
|---|---|---|
| `OLLAMA_ENDPOINT` | Ollama server URL | `http://localhost:11434` |
| `OLLAMA_SLM_MODEL` | Model name for the SLM role | `phi4-mini` |
| `OLLAMA_LLM_MODEL` | Model name for the LLM role | `llama3.1:8b` |
OpenAI-compatible backend
| Variable | Description | Example |
|---|---|---|
| `OPENAI_COMPATIBLE_ENDPOINT` | Server base URL (without `/v1`) | `http://localhost:1234` |
| `OPENAI_COMPATIBLE_SLM_MODEL` | Model name for the SLM role | `phi-4-mini-instruct` |
| `OPENAI_COMPATIBLE_LLM_MODEL` | Model name for the LLM role | `llama3.1:8b` |
Azure AI Foundry backend
| Variable | Description | Example |
|---|---|---|
| `AZURE_AI_FOUNDRY_ENDPOINT` | Azure AI Foundry OpenAI endpoint | `https://<resource>.ai.azure.com/openai/v1/` |
| `AZURE_AI_SLM_DEPLOYMENT_NAME` | Deployment name for the SLM role | `gpt-4o-mini` |
| `AZURE_AI_LLM_DEPLOYMENT_NAME` | Deployment name for the LLM role | `gpt-4.1` |