Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
---
name: microsoft-foundry-local
description: "Build AI applications with Foundry Local — a lightweight runtime that downloads, manages, and serves language models entirely on-device via an OpenAI-compatible API. No cloud, no API keys. Routes to specific skills for setup, chat, RAG, agents, whisper, custom models, and evaluation. WHEN: foundry local, on-device AI, local LLM, foundry local overview, what can foundry do, foundry local help, local inference, offline AI, private AI, no cloud AI, foundry capabilities."
license: MIT
metadata:
author: Microsoft
version: "1.0.0"
---

# Foundry Local — Skill Hub

Foundry Local is an on-device AI runtime that serves language models via an OpenAI-compatible API at `http://localhost:<port>/v1`. No cloud services, API keys, or Azure subscriptions required.

## Skill Routing

| Need | Skill | Triggers |
|------|-------|----------|
| Install CLI, start service, manage models | **setup** | install, CLI, service start/stop, model download, port discovery |
| Chat completions (streaming, multi-turn) | **chat** | chat, streaming, conversation history, OpenAI SDK |
| Retrieval-Augmented Generation | **rag** | RAG, knowledge base, context injection, document grounding |
| Single & multi-agent workflows | **agents** | agent, multi-agent, orchestration, Agent Framework |
| Audio transcription with Whisper | **whisper** | whisper, transcribe, speech-to-text, audio |
| Compile custom Hugging Face models | **custom-models** | custom model, ONNX, Model Builder, Hugging Face, quantize |
| Test & evaluate LLM output quality | **evaluation** | evaluate, golden dataset, LLM judge, prompt comparison |

## Quick Reference

- **API key**: Always `"not-required"`
- **Base URL**: Dynamic port — use SDK to discover: `manager.get_endpoint()`
- **Supported languages**: Python, JavaScript (Node.js), C# (.NET 9)
- **Key SDKs**: `foundry-local-sdk` (Python/JS), `Microsoft.AI.Foundry.Local` (C#)

## Common Starting Points

### Install Foundry Local
```bash
# Windows
winget install Microsoft.FoundryLocal

# macOS
brew install foundrylocal
```

### List available models
```bash
foundry model list
```

### Start a model
```bash
foundry model run phi-4-mini
```

### Connect with Python
```python
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager("phi-4-mini")
client = manager.get_openai_client()
```

### Connect with JavaScript
```javascript
import { FoundryLocalManager } from "foundry-local-sdk";

const manager = await FoundryLocalManager.start("phi-4-mini");
const client = manager.getOpenAIClient();
```

### Connect with C#
```csharp
using Microsoft.AI.Foundry.Local;
using OpenAI;

var manager = await FoundryLocalManager.StartServiceAsync();
var client = new OpenAIClient(new("not-required"),
new() { Endpoint = manager.Endpoint });
```

## Rules

1. Always use the SDK for endpoint discovery — never hard-code ports.
2. Set `api_key` to `"not-required"` — Foundry Local doesn't use API keys.
3. Route to the specific sub-skill for detailed patterns and troubleshooting.
4. All code runs entirely on-device — no network calls to cloud APIs.

## References

- [Foundry Local](https://learn.microsoft.com/en-us/azure/foundry-local/)
Original file line number Diff line number Diff line change
@@ -0,0 +1,283 @@
---
name: agents
description: "Build AI agents and multi-agent workflows with Foundry Local. Covers single agents with personas, multi-agent sequential pipelines, feedback loops, the Microsoft Agent Framework, and conversation history management. WHEN: foundry agent, AI agent local, multi-agent, agent orchestration, feedback loop, agent persona, system instructions, sequential pipeline, researcher writer editor, on-device agent, agent framework, FoundryLocalClient, AsAIAgent."
license: MIT
metadata:
author: Microsoft
version: "1.0.0"
---

# Foundry Local Agents & Multi-Agent Workflows

This skill provides patterns for building single agents and multi-agent workflows that run entirely on-device with Foundry Local.

## Triggers

Activate this skill when the user wants to:
- Create an AI agent with custom instructions and persona
- Build multi-agent pipelines (Researcher → Writer → Editor)
- Implement feedback loops between agents
- Use the Microsoft Agent Framework with Foundry Local
- Manage conversation history across agent interactions

## Rules

1. **Agents are stateless by default.** Multi-turn agents must explicitly maintain a `history` list.
2. **Use the Agent Framework when available** — it simplifies agent creation. Python uses `agent_framework_foundry_local`, C# uses `Microsoft.Agents.AI.OpenAI`.
3. **JavaScript has no high-level agent framework** — implement agents manually with OpenAI SDK + history management.
4. **Feedback loops need a retry limit** — prevent infinite loops with a max iteration count (typically 2-3).
5. For service setup, refer to **setup** skill.

---

## Single Agent — Using the Agent Framework

### Python (Recommended — Agent Framework)

```python
import asyncio
from agent_framework_foundry_local import FoundryLocalClient

async def main():
alias = "phi-4-mini"

# FoundryLocalClient handles service start, model download, and loading
client = FoundryLocalClient(model_id=alias)

# Create an agent with system instructions
agent = client.as_agent(
name="Joker",
instructions="You are good at telling jokes.",
)

# Non-streaming
result = await agent.run("Tell me a joke about a pirate.")
print(result)

# Streaming
async for chunk in agent.run("Tell me a joke about a programmer.", stream=True):
if chunk.text:
print(chunk.text, end="", flush=True)

asyncio.run(main())
```

### C# (Recommended — Agent Framework)

```csharp
using Microsoft.Agents.AI;

// After setting up manager, model, and OpenAI client (see setup)...
AIAgent joker = client
.GetChatClient(model.Id)
.AsAIAgent(
instructions: "You are good at telling jokes.",
name: "Joker"
);

// Non-streaming
var response = await joker.RunAsync("Tell me a joke about a pirate.");
Console.WriteLine(response);

// Streaming
await foreach (var chunk in joker.RunStreamingAsync("Tell me another joke."))
{
Console.Write(chunk.Text);
}
```

### JavaScript (Manual — No Agent Framework)

```javascript
class ChatAgent {
constructor(client, modelId, name, instructions) {
this.client = client;
this.modelId = modelId;
this.name = name;
this.history = [{ role: "system", content: instructions }];
}

async run(userMessage) {
this.history.push({ role: "user", content: userMessage });

const response = await this.client.chat.completions.create({
model: this.modelId,
messages: this.history,
temperature: 0.7,
max_tokens: 1024,
});

const reply = response.choices[0].message.content;
this.history.push({ role: "assistant", content: reply });
return reply;
}
}

// Usage
const joker = new ChatAgent(client, modelInfo.id, "Joker", "You are good at telling jokes.");
const joke = await joker.run("Tell me a joke about a pirate.");
```

---

## Multi-Agent Pipeline — Sequential Workflow

The canonical multi-agent pattern is a sequential pipeline where each agent's output feeds the next:

```
Topic → [Researcher] → Research Notes → [Writer] → Draft → [Editor] → Verdict
```

### Python

```python
import asyncio
from agent_framework_foundry_local import FoundryLocalClient

async def main():
client = FoundryLocalClient(model_id="phi-4-mini")

researcher = client.as_agent(
name="Researcher",
instructions=(
"You are a research assistant. When given a topic, provide a concise "
"collection of key facts as bullet points."
),
)

writer = client.as_agent(
name="Writer",
instructions=(
"You are a skilled blog writer. Using the research notes provided, "
"write a short, engaging blog post (3-4 paragraphs)."
),
)

editor = client.as_agent(
name="Editor",
instructions=(
"You are a senior editor. Review the blog post for clarity, grammar, "
"and factual consistency. Provide a verdict: ACCEPT or REVISE."
),
)

topic = "The history of renewable energy"

# Sequential pipeline
research = await researcher.run(f"Research this topic:\n{topic}")
draft = await writer.run(f"Write a blog post from these notes:\n\n{research}")
verdict = await editor.run(
f"Review this article.\n\nResearch notes:\n{research}\n\nArticle:\n{draft}"
)

asyncio.run(main())
```

### C#

```csharp
AIAgent researcher = chatClient.AsAIAgent(
name: "Researcher",
instructions: "You are a research assistant. Provide key facts as bullet points.");

AIAgent writer = chatClient.AsAIAgent(
name: "Writer",
instructions: "You are a skilled blog writer. Write a short blog post.");

AIAgent editor = chatClient.AsAIAgent(
name: "Editor",
instructions: "Review the blog post. Provide a verdict: ACCEPT or REVISE.");

var topic = "The history of renewable energy";

var research = await researcher.RunAsync($"Research this topic:\n{topic}");
var draft = await writer.RunAsync($"Write a blog post from these notes:\n\n{research}");
var verdict = await editor.RunAsync(
$"Review this article.\n\nResearch notes:\n{research}\n\nArticle:\n{draft}");
```

---

## Feedback Loop Pattern

Add a feedback loop where the Editor can reject the draft and trigger a rewrite:

```python
MAX_RETRIES = 2

for attempt in range(MAX_RETRIES + 1):
draft = await writer.run(f"Write a blog post from these notes:\n\n{research}")

verdict = await editor.run(
f"Review this article.\n\nResearch:\n{research}\n\nArticle:\n{draft}"
)

if "ACCEPT" in verdict.upper():
print("Article accepted!")
break
elif attempt < MAX_RETRIES:
print(f"Revising (attempt {attempt + 2})...")
research = await researcher.run(
f"The editor wants revisions:\n{verdict}\n\nOriginal topic:\n{topic}"
)
else:
print("Max retries reached — publishing best effort.")
```

---

## Agent Design Best Practices

| Practice | Rationale |
|----------|-----------|
| Give each agent a specific, focused persona | Broad instructions produce vague outputs |
| Include output format in instructions | "Organize as bullet points" or "Respond with ACCEPT or REVISE" |
| Pass context from previous agents explicitly | Agents don't share memory implicitly |
| Limit context passed between agents | Don't forward entire conversations — summarise |
| Set retry limits on feedback loops | Prevent infinite loops (2-3 retries is typical) |

---

## Production Pattern — Shared Configuration

For production apps (like the Zava Creative Writer), extract common configuration:

### Python (FastAPI service)
```python
# foundry_config.py — shared across all agents
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager()
manager.start_service()

ALIAS = "phi-4-mini"
manager.load_model(ALIAS)

MODEL_ID = manager.get_model_info(ALIAS).id
ENDPOINT = manager.endpoint
API_KEY = manager.api_key
```

```python
# Each agent module imports the shared config
from foundry_config import MODEL_ID, ENDPOINT, API_KEY
```

---

## Key Packages

| Language | Package | Purpose |
|----------|---------|---------|
| Python | `agent-framework-foundry-local` | High-level agent abstraction with streaming |
| C# | `Microsoft.Agents.AI.OpenAI` | `AsAIAgent()` extension method |
| JavaScript | — | No framework; use OpenAI SDK directly |

---

## Cross-References

- For service setup, see **setup**
- For basic chat patterns, see **chat**
- For grounding agents with local data, see **rag**
- For testing agent quality, see **evaluation**
Loading
Loading