v1.1.0 Foundry Local

@prathikr prathikr released this 05 May 20:34
598fc19

🚀 Foundry Local v1.1.0 Release Notes

We're excited to announce Foundry Local v1.1.0, packed with new capabilities for on-device AI! This release brings expanded platform support, new model types, and performance improvements across the board.


🆕 What's New

🎯 .NET netstandard2.0 / net8.0 Support

The C# SDK now targets both net8.0 and netstandard2.0, broadening compatibility to .NET Framework 4.6.1+, .NET Core 2.0+, Xamarin, Unity, and more. Ship on-device AI to virtually any .NET application!

<PackageReference Include="Microsoft.AI.Foundry.Local" Version="1.1.0" />

📖 C# SDK Documentation


🎙️ Live Audio Transcription

Real-time speech-to-text is here! Stream microphone audio directly to the SDK and receive transcription results as they arrive, with no cloud round-trips and no network latency. Built on the Nemotron ASR model with an OpenAI Realtime-compatible API surface.

Python
audio_client = model.get_audio_client()
session = audio_client.create_live_transcription_session()
session.settings.sample_rate = 16000
session.settings.channels = 1
session.settings.language = "en"

session.start()

# Push audio
session.append(pcm_bytes)

# Read results (typically on a background thread)
for result in session.get_stream():
    print(result.content[0].text)        # transcribed text
    print(result.is_final)               # True for final results

session.stop()

📖 Python live audio transcription sample
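The snippet above passes `pcm_bytes` to `session.append` without showing where the bytes come from. The following is a minimal sketch of one way to prepare them. It assumes the session expects 16-bit signed little-endian PCM, which these notes do not state, and the helper names `floats_to_pcm16` and `chunk_pcm` are hypothetical, not part of the SDK. A generated test tone stands in for microphone input.

```python
import math
import struct

SAMPLE_RATE = 16000   # matches session.settings.sample_rate above
CHANNELS = 1          # mono, matches session.settings.channels above
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed little-endian PCM


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm_bytes, chunk_ms=100):
    """Yield consecutive chunks of roughly chunk_ms milliseconds each."""
    frame_bytes = BYTES_PER_SAMPLE * CHANNELS
    chunk_bytes = SAMPLE_RATE * chunk_ms // 1000 * frame_bytes
    for start in range(0, len(pcm_bytes), chunk_bytes):
        yield pcm_bytes[start:start + chunk_bytes]


# One second of a 440 Hz test tone stands in for microphone input.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
pcm_bytes = floats_to_pcm16(tone)
chunks = list(chunk_pcm(pcm_bytes))
print(len(pcm_bytes), len(chunks), len(chunks[0]))
```

Each chunk could then be handed to `session.append` in turn. The 100 ms chunk size is an arbitrary choice for illustration, not a documented requirement.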

JavaScript
const audioClient = model.createAudioClient();
const session = audioClient.createLiveTranscriptionSession();
session.settings.sampleRate = 16000;
session.settings.channels = 1;
session.settings.language = 'en';

await session.start();

// Push audio
await session.append(pcmBytes);

// Read results
for await (const result of session.getStream()) {
    console.log(result.content[0].text);       // transcribed text
    console.log(result.is_final);              // true for final results
}

await session.stop();

📖 JavaScript live audio transcription sample

C#
var audioClient = await model.GetAudioClientAsync();
var session = audioClient.CreateLiveTranscriptionSession();
session.Settings.SampleRate = 16000;
session.Settings.Channels = 1;
session.Settings.Language = "en";

await session.StartAsync();

// Push audio
await session.AppendAsync(pcmBytes);

// Read results
await foreach (var result in session.GetStream())
{
    Console.WriteLine(result.Content[0].Text);       // transcribed text
    Console.WriteLine(result.IsFinal);               // true for final results
}

await session.StopAsync();

📖 C# live audio transcription sample

Rust
let audio_client = model.create_audio_client();
let session = audio_client.create_live_transcription_session();
session.start(None).await?;

// Push audio
session.append(&pcm_bytes).await?;

// Read results
let mut stream = session.get_stream().await?;
while let Some(result) = stream.next().await {
    let r = result?;
    if let Some(content) = r.content.first() {
        println!("{}", content.text);       // transcribed text
        println!("{}", r.is_final);         // true for final results
    }
}

session.stop().await?;

📖 Rust live audio transcription sample
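All four snippets above expose a final-result flag next to the transcribed text. The sketch below shows one way an application might turn such a stream into a running transcript. It assumes that interim results replace the previous interim text and that a final result commits it; these notes do not spell out that behavior. `FakeResult` and `build_transcript` are hypothetical stand-ins, not SDK types.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for a transcription result: text plus is_final."""
    text: str
    is_final: bool


def build_transcript(results):
    """Return (committed_text, pending_text) after consuming all results.

    Interim results overwrite the pending text; a final result is appended
    to the committed transcript and clears the pending text.
    """
    committed = []
    pending = ""
    for result in results:
        if result.is_final:
            committed.append(result.text.strip())
            pending = ""
        else:
            pending = result.text
    return " ".join(committed), pending


stream = [
    FakeResult("hel", False),
    FakeResult("hello wor", False),
    FakeResult("hello world", True),
    FakeResult("how are", False),
]
committed, pending = build_transcript(stream)
print(repr(committed), repr(pending))
```

Keeping committed and pending text separate lets a UI show in-progress words in a lighter style without ever rewriting text that has already been finalized.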


📐 Embeddings

Generate text embeddings entirely on-device for semantic search, RAG, clustering, and more. The new qwen3-0.6b-embedding model delivers high-quality vector representations in a compact footprint.

Python
model = manager.catalog.get_model("qwen3-0.6b-embedding")
model.download()
model.load()

client = model.get_embedding_client()

# Single embedding
response = client.generate_embedding("The quick brown fox jumps over the lazy dog")
embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")

# Batch embeddings
batch_response = client.generate_embeddings([
    "Machine learning is a subset of artificial intelligence",
    "The capital of France is Paris",
    "Rust is a systems programming language",
])

📖 Python embeddings sample
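Once vectors like those returned above are available, semantic search typically comes down to comparing them with cosine similarity. The sketch below illustrates that step in plain Python. The three-dimensional vectors are hand-written toy values standing in for real model output, and `cosine_similarity` and `rank` are hypothetical helpers, not part of the SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional vectors; real embeddings have far more dimensions.
docs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.0, 0.9],
]
query = [1.0, 0.2, 0.0]
order = rank(query, docs)
print(order)
```

In practice the document vectors would come from a batch embedding call and the query vector from a single embedding call, with the ranking logic unchanged.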

JavaScript
const model = await manager.catalog.getModel('qwen3-0.6b-embedding');
await model.download();
await model.load();

const embeddingClient = model.createEmbeddingClient();

// Single embedding
const response = await embeddingClient.generateEmbedding(
    'The quick brown fox jumps over the lazy dog'
);
console.log(`Dimensions: ${response.data[0].embedding.length}`);

// Batch embeddings
const batchResponse = await embeddingClient.generateEmbeddings([
    'Machine learning is a subset of artificial intelligence',
    'The capital of France is Paris',
    'Rust is a systems programming language'
]);

📖 JavaScript embeddings sample

C#
var model = await catalog.GetModelAsync("qwen3-0.6b-embedding");
await model.DownloadAsync();
await model.LoadAsync();

var embeddingClient = await model.GetEmbeddingClientAsync();

// Single embedding
var response = await embeddingClient.GenerateEmbeddingAsync(
    "The quick brown fox jumps over the lazy dog");
var embedding = response.Data[0].Embedding;
Console.WriteLine($"Dimensions: {embedding.Count}");

// Batch embeddings
var batchResponse = await embeddingClient.GenerateEmbeddingsAsync([
    "Machine learning is a subset of artificial intelligence",
    "The capital of France is Paris",
    "Rust is a systems programming language"
]);

📖 C# embeddings sample

Rust

📖 Rust embeddings sample


👁️ Qwen 3.5 Vision Language Model

Introducing Qwen 3.5 VL, a multimodal vision-language model that runs entirely on-device. Analyze images, understand visual content, and answer questions about what's in a picture, all without sending data to the cloud.

Python
model = manager.catalog.get_model("qwen3.5-vision")
model.download()
model.load()

📦 JavaScript SDK: Koffi Dependency Removed

The JavaScript SDK no longer depends on koffi for native interop. This results in a leaner dependency tree, faster installs, and fewer compatibility issues across platforms and Node.js versions.

  • ✅ Smaller node_modules: no more large native FFI dependency
  • ✅ Fewer platform quirks: a prebuilt N-API addon replaces the runtime FFI binding
  • ✅ Faster install times: less to download, nothing to compile

📖 JavaScript SDK Documentation


🖥️ WebGPU Plugin Execution Provider

The new WebGPU execution provider (EP) is delivered as a plug-in. It's not bundled with the core runtime, which keeps your base binary small. When WebGPU acceleration is needed, Foundry Local automatically downloads and registers the EP on the fly, so your users only pay the size cost if their hardware benefits from it.

This approach means:

  • ✅ Smaller default install: the core package stays lean (~20 MB)
  • ✅ On-demand download: the WebGPU EP is fetched and registered at runtime only when needed
  • ✅ Broader GPU coverage: unlocks hardware acceleration through the WebGPU Execution Provider for our cross-platform solution

To enable the WebGPU EP, make sure you have called the download-and-register function for EPs first. The snippets that follow show the call in each SDK.

Python
# Discover available EPs
eps = manager.discover_eps()
for ep in eps:
    print(f"  {ep.name} (registered: {ep.is_registered})")

# Download and register all EPs with progress
current_ep = ""

def on_progress(ep_name: str, percent: float) -> None:
    global current_ep
    if ep_name != current_ep:
        if current_ep:
            print()
        current_ep = ep_name
    print(f"\r  {ep_name}  {percent:5.1f}%", end="", flush=True)

result = manager.download_and_register_eps(progress_callback=on_progress)
print()
print(f"Success: {result.success}, Status: {result.status}")

JavaScript
// Discover available EPs
const eps = await manager.discoverEps();
for (const ep of eps) {
    console.log(`  ${ep.name} (registered: ${ep.isRegistered})`);
}

// Download and register all EPs with progress
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') {
            process.stdout.write('\n');
        }
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName}  ${percent.toFixed(1)}%`);
});
process.stdout.write('\n');

C#
// Discover available EPs
var eps = await mgr.DiscoverEpsAsync();
foreach (var ep in eps)
{
    Console.WriteLine($"  {ep.Name} (registered: {ep.IsRegistered})");
}

// Download and register all EPs with progress
string currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "")
        {
            Console.WriteLine();
        }
        currentEp = epName;
    }
    Console.Write($"\r  {epName}  {percent,6:F1}%");
});
Console.WriteLine();

Rust
// Discover available EPs
let eps = manager.discover_eps().await?;
for ep in &eps {
    println!("  {} (registered: {})", ep.name, ep.is_registered);
}

// Download and register all EPs with progress
use std::sync::{Arc, Mutex};

let current_ep = Arc::new(Mutex::new(String::new()));
let ep = Arc::clone(&current_ep);
manager.download_and_register_eps_with_progress(None, move |ep_name: &str, percent: f64| {
    let mut current = ep.lock().unwrap();
    if ep_name != current.as_str() {
        if !current.is_empty() {
            println!();
        }
        *current = ep_name.to_string();
    }
    print!("\r  {}  {:5.1}%", ep_name, percent);
}).await?;
println!();

📚 Resources

Resource | Link
📖 MSLearn Docs | learn.microsoft.com/en-us/azure/foundry-local/get-started
🐙 GitHub | github.com/microsoft/Foundry-Local
🧪 Samples | samples/

💙 Thank You

Thank you to our community for your feedback and contributions!