feat(models): add WebLLM model provider for on-device browser inference by jsamuel1 · Pull Request #1036 · strands-agents/sdk-typescript

jsamuel1 · 2026-05-09T06:11:47Z

Motivation

WebLLM runs quantized LLMs entirely in the browser via WebGPU, with model weights cached in IndexedDB/CacheStorage after the first download. Without a first-class provider, users building browser-based agents have to wire up @mlc-ai/web-llm themselves or reach for the community webllm-ai-provider via VercelModel (0.0.1, ~2 weekly downloads on npm).

This adds a WebLLMModel provider under @strands-agents/sdk/models/webllm so on-device, offline-capable agents are a one-import experience — matching how BedrockModel, AnthropicModel, etc. are shipped today.

Resolves strands-agents/harness-sdk#2481

Public API Changes

New subpath export @strands-agents/sdk/models/webllm with a WebLLMModel class and cache-management helpers.

import { Agent } from '@strands-agents/sdk'
import { WebLLMModel } from '@strands-agents/sdk/models/webllm'

const agent = new Agent({
  model: new WebLLMModel({
    modelId: 'Llama-3.1-8B-Instruct-q4f32_1-MLC',
    onProgress: (report) => console.log(report.text, report.progress),
  }),
})

const result = await agent.invoke('Hello!')

Cache helpers let apps pre-download from a settings UI, check what's cached, and evict models independently of an agent invocation:

import {
  downloadWebLLMModel,
  isWebLLMModelCached,
  deleteWebLLMModel,
  listWebLLMModels,
} from '@strands-agents/sdk/models/webllm'

if (!(await isWebLLMModelCached('Phi-3.5-mini-instruct-q4f16_1-MLC'))) {
  await downloadWebLLMModel({
    modelId: 'Phi-3.5-mini-instruct-q4f16_1-MLC',
    onProgress: (r) => updateProgressBar(r.progress, r.text),
  })
}

@mlc-ai/web-llm is declared as an optional peerDependency, so server-side users are unaffected. Attempting to use the provider outside a browser or without the peer installed raises a typed WebLLMUnavailableError.

Use Cases

Fully offline browser agents after the initial model download
Privacy-sensitive deployments where prompts/responses must not leave the device
Zero per-call cost — inference runs on user hardware
Demo/education apps with no cloud credentials required

Testing

strands-ts/src/models/webllm/__tests__/model.test.ts — unit tests for streaming/formatting/tool-use paths with a mocked MLCEngine
strands-ts/src/models/webllm/__tests__/cache.test.node.ts — Node-side environment guards and error surfaces
strands-ts/src/models/webllm/__tests__/browser.test.browser.ts — browser smoke test
strands-ts/test/packages/{esm-module,cjs-module} — subpath export resolution for the new ./models/webllm entry

All existing suites pass (2554 passed) alongside the new coverage.

Notes

Marked as draft until @mlc-ai/web-llm peer-dep wiring is sanity-checked in CI and the browser-integration job lights up end-to-end.
AGENTS.md directory map updated to reflect the new webllm/ module.

Adds a new WebLLMModel provider under @strands-agents/sdk/models/webllm that runs quantized LLMs entirely in the browser via WebGPU using @mlc-ai/web-llm. Models are cached in browser storage after the first download. Includes cache management helpers (downloadWebLLMModel, isWebLLMModelCached, deleteWebLLMModel, listWebLLMModels) so apps can pre-download models from a settings UI and report progress via an onProgress callback. @mlc-ai/web-llm is added as an optional peer dependency to keep it out of the default dependency graph for server-side users. Resolves #1035

github-actions · 2026-05-09T06:17:32Z

+ *
+ * @throws {@link WebLLMUnavailableError} when WebLLM cannot be loaded.
+ */
+export async function listWebLLMModels(appConfig?: AppConfig): Promise<WebLLMModelInfo[]> {


Issue: listWebLLMModels does not call assertBrowserEnvironment() unlike isWebLLMModelCached, deleteWebLLMModel, and downloadWebLLMModel. This is inconsistent — if the module can't be loaded in Node, it will throw WebLLMUnavailableError from loadWebLLMModule() anyway, but the error message won't be the clear "requires a browser" guidance.

Suggestion: Either add assertBrowserEnvironment() for consistency with the other helpers, or add a code comment explaining why listWebLLMModels intentionally skips the check (e.g., if it's designed to work in server-side contexts for listing available models without needing WebGPU).

github-actions · 2026-05-09T06:17:33Z

+  }
+
+  return events
+}


Issue: The mapChunkToEvents, extractUsage, and the streaming state management logic here are nearly line-for-line identical to mapChatChunkToEvents in src/models/openai/chat-adapter.ts. This creates a maintenance burden where fixes to one must be duplicated to the other.

Suggestion: Consider extracting the shared OpenAI-compatible chunk-to-event mapping into a shared utility (e.g. src/models/openai-compatible-streaming.ts) that both the OpenAI chat adapter and WebLLM can import. At minimum, leave a // NOTE: comment cross-referencing the OpenAI adapter so future maintainers know to keep them in sync.
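
To make the extraction suggestion concrete, here is a minimal sketch of what a shared, provider-agnostic mapper could look like. The chunk and event shapes are simplified, hypothetical stand-ins (tool calls and usage are omitted), and the names createStreamState, mapOpenAICompatibleChunk, and modelMessageStopEvent are assumptions for illustration, not the SDK's actual API.

```typescript
// Hypothetical, simplified shapes; the real SDK and OpenAI types carry more fields.
interface ChatChunkLike {
  choices: Array<{
    delta?: { role?: string; content?: string | null }
    finish_reason?: string | null
  }>
}

type MappedEvent =
  | { type: 'modelMessageStartEvent'; role: 'assistant' }
  | { type: 'modelContentBlockStartEvent' }
  | { type: 'modelContentBlockDeltaEvent'; text: string }
  | { type: 'modelContentBlockStopEvent' }
  | { type: 'modelMessageStopEvent'; stopReason: string }

interface StreamState {
  messageStarted: boolean
  textContentBlockStarted: boolean
}

function createStreamState(): StreamState {
  return { messageStarted: false, textContentBlockStarted: false }
}

// One mapper that both adapters could import, so a fix lands in a single place.
function mapOpenAICompatibleChunk(chunk: ChatChunkLike, state: StreamState): MappedEvent[] {
  const events: MappedEvent[] = []
  const choice = chunk.choices[0]
  if (!choice) return events

  const delta = choice.delta
  // Start the message on the first role or content delta, whichever arrives first.
  if (!state.messageStarted && (delta?.role || delta?.content)) {
    state.messageStarted = true
    events.push({ type: 'modelMessageStartEvent', role: 'assistant' })
  }

  if (delta?.content) {
    if (!state.textContentBlockStarted) {
      state.textContentBlockStarted = true
      events.push({ type: 'modelContentBlockStartEvent' })
    }
    events.push({ type: 'modelContentBlockDeltaEvent', text: delta.content })
  }

  if (choice.finish_reason) {
    // Close the open text block before signalling the end of the message.
    if (state.textContentBlockStarted) {
      state.textContentBlockStarted = false
      events.push({ type: 'modelContentBlockStopEvent' })
    }
    events.push({ type: 'modelMessageStopEvent', stopReason: choice.finish_reason })
  }

  return events
}
```

Keeping the state object separate from the mapper lets each provider own its stream lifecycle while sharing the mapping rules.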

github-actions · 2026-05-09T06:17:35Z

+    modelLib: record.model_lib,
+  }
+  if (record.vram_required_MB !== undefined) info.vramMB = record.vram_required_MB
+  if (typeof (record as unknown as { model_type?: string }).model_type === 'string') {


Issue: The model_type access uses a double cast through unknown ((record as unknown as { model_type?: string }).model_type), which is fragile and circumvents type safety.

Suggestion: Since ModelRecord comes from @mlc-ai/web-llm types, either:

Use an in check combined with a typeof guard, which narrows the type without a cast: if ('model_type' in record && typeof record.model_type === 'string')

Or extend the ModelRecord type locally if this field is expected but not yet typed upstream

The current double-cast could silently break if model_type is renamed or restructured.
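
A minimal sketch of the cast-free variant, assuming TypeScript 4.9 or later (where an in check on an undeclared property narrows it to unknown). ModelRecordLike and ModelInfoLike are hypothetical stand-ins for the upstream ModelRecord and the PR's WebLLMModelInfo, and toModelInfo is an illustrative name.

```typescript
// Hypothetical stand-in for the upstream ModelRecord type from @mlc-ai/web-llm.
interface ModelRecordLike {
  model_id: string
  model_lib: string
  vram_required_MB?: number
}

// Hypothetical stand-in for the PR's WebLLMModelInfo.
interface ModelInfoLike {
  modelId: string
  modelLib: string
  vramMB?: number
  modelType?: string
}

function toModelInfo(record: ModelRecordLike): ModelInfoLike {
  const info: ModelInfoLike = { modelId: record.model_id, modelLib: record.model_lib }
  if (record.vram_required_MB !== undefined) info.vramMB = record.vram_required_MB
  // The `in` check exposes `model_type` as `unknown`; `typeof` then narrows it to string.
  // No cast is involved, so a rename upstream simply makes this branch unreachable.
  if ('model_type' in record && typeof record.model_type === 'string') {
    info.modelType = record.model_type
  }
  return info
}
```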

github-actions · 2026-05-09T06:17:36Z

+
+      if (bufferedUsage) yield bufferedUsage
+      if (bufferedStop) yield bufferedStop
+    } catch (error) {


Issue: The stream method catches errors from the engine and re-throws via normalizeError(error), but if the error occurs during iteration of the async iterable (inside for await), the generator will be in a partially-yielded state. The consumer will see the error, but any buffered modelContentBlockStartEvent won't have a matching modelContentBlockStopEvent, which could leave the SDK's message accumulator in an inconsistent state.

Suggestion: Consider emitting content block stop events in the catch/finally block when state.textContentBlockStarted is true or activeToolCalls is non-empty, to ensure the stream is always well-formed even on errors.
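
As a rough illustration of that suggestion, the sketch below closes an open content block before re-throwing. The event shapes are simplified and streamText is a hypothetical name; the real stream method also tracks tool calls and buffered usage/stop events. The stop event is yielded from catch rather than finally, so a consumer that exits early is not handed extra events.

```typescript
// Simplified, hypothetical event shapes for illustration only.
type StreamEvent =
  | { type: 'modelContentBlockStartEvent' }
  | { type: 'modelContentBlockDeltaEvent'; text: string }
  | { type: 'modelContentBlockStopEvent' }

async function* streamText(chunks: AsyncIterable<string>): AsyncGenerator<StreamEvent> {
  let blockOpen = false
  try {
    for await (const text of chunks) {
      if (!blockOpen) {
        blockOpen = true
        yield { type: 'modelContentBlockStartEvent' }
      }
      yield { type: 'modelContentBlockDeltaEvent', text }
    }
    if (blockOpen) {
      blockOpen = false
      yield { type: 'modelContentBlockStopEvent' }
    }
  } catch (error) {
    // Emit the matching stop so every start the consumer saw is closed, then surface the error.
    if (blockOpen) {
      blockOpen = false
      yield { type: 'modelContentBlockStopEvent' }
    }
    throw error
  }
}
```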

github-actions · 2026-05-09T06:17:38Z

    "@aws-sdk/client-s3": "^3.943.0",
    "@google/genai": "^1.40.0",
    "@modelcontextprotocol/sdk": "^1.25.2",
+    "@mlc-ai/web-llm": "^0.2.79",


Issue: The peer dependency is specified as "^0.2.79". For a 0.x package, semver narrows the caret range to >=0.2.79 <0.3.0, so only newer 0.2.x releases are accepted. This is correctly conservative. However, @mlc-ai/web-llm has a history of breaking changes between 0.2.x releases.

Suggestion: Consider whether pinning to the exact version (0.2.79) would be safer. Note that ~0.2.79 resolves to the same >=0.2.79 <0.3.0 range here, so it would not tighten anything. Alternatively, document in the module TSDoc which web-llm API surface you depend on. If the intent is to support a range, add a note in the README listing the tested/verified version range (package.json cannot carry comments).
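
The range arithmetic behind this comment can be checked with a small hand-rolled sketch. It covers only plain x.y.z versions and is not a replacement for the semver package; caretUpperBound and tildeUpperBound are illustrative names. It shows that for 0.2.79 the caret and tilde ranges share the same exclusive upper bound, so only an exact pin is actually tighter.

```typescript
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version)
  if (!match) throw new Error(`not a plain x.y.z version: ${version}`)
  return [Number(match[1]), Number(match[2]), Number(match[3])]
}

// Exclusive upper bound of ^x.y.z: the caret keeps the left-most non-zero component fixed.
function caretUpperBound(version: string): string {
  const [major, minor, patch] = parseVersion(version)
  if (major > 0) return `${major + 1}.0.0`
  if (minor > 0) return `0.${minor + 1}.0`
  return `0.0.${patch + 1}`
}

// Exclusive upper bound of ~x.y.z: the tilde always keeps major and minor fixed.
function tildeUpperBound(version: string): string {
  const [major, minor] = parseVersion(version)
  return `${major}.${minor + 1}.0`
}
```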

github-actions · 2026-05-09T06:17:40Z

+    events.push({ type: 'modelMessageStartEvent', role: delta.role as 'user' | 'assistant' })
+  }
+
+  if (delta?.content && delta.content.length > 0) {


Issue: If the stream starts emitting content deltas without a preceding role delta (e.g. some engines skip the role chunk), no modelMessageStartEvent is ever emitted, but content block events are still produced. This would leave the SDK's stream consumer in an inconsistent state.

Suggestion: Add a guard that emits a synthetic modelMessageStartEvent with role: 'assistant' when content arrives before a role delta, similar to how the text content block start is auto-emitted:

if (delta?.content && delta.content.length > 0) {
  if (!state.messageStarted) {
    state.messageStarted = true
    events.push({ type: 'modelMessageStartEvent', role: 'assistant' })
  }
  // ...
}

github-actions · 2026-05-09T06:17:41Z

+    return this._enginePromise
+  }
+
+  private async _createEngine(): Promise<MLCEngineInterface> {


Issue: The _createEngine method calls assertBrowserEnvironment() synchronously, then loadWebLLMModule() which also surfaces an environment error. However, when _getEngine() is called, it caches the promise — if the first call fails (e.g., module not found), it correctly resets _enginePromise allowing retry. But assertBrowserEnvironment() will always throw synchronously in Node, meaning the retry logic is unreachable in that scenario. This is fine but worth noting that the catch reset on line 300 only helps for transient loadWebLLMModule failures, not environment failures.

No action required — just noting for clarity that the retry semantics only apply to module loading/engine init failures in a valid browser environment.
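
For readers unfamiliar with the pattern being discussed, a minimal sketch of promise caching with reset-on-failure follows. LazyResource is a hypothetical helper written for illustration; it is not the PR's _getEngine implementation.

```typescript
class LazyResource<T> {
  private _promise: Promise<T> | undefined
  private readonly _create: () => Promise<T>

  constructor(create: () => Promise<T>) {
    this._create = create
  }

  get(): Promise<T> {
    if (!this._promise) {
      // Cache the in-flight promise so concurrent callers share one creation attempt.
      this._promise = this._create().catch((error: unknown) => {
        // Drop the cached rejection so the next call can retry.
        this._promise = undefined
        throw error
      })
    }
    return this._promise
  }
}
```

A failure that is deterministic for the environment (such as a failed browser check) will reject again on every retry, so the reset only pays off for transient failures.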

github-actions · 2026-05-09T06:17:43Z

+ *
+ * @internal
+ */
+export function assertBrowserEnvironment(): void {


Issue: assertBrowserEnvironment() checks typeof window === 'undefined' to detect non-browser environments. However, some non-browser environments, such as jsdom-based test runners, define window without actually having WebGPU. Conversely, Web Workers (where WebGPU is available) don't have window.

Suggestion: Consider checking for typeof navigator !== 'undefined' && 'gpu' in navigator (or at minimum typeof globalThis.navigator !== 'undefined') as a more accurate browser+WebGPU heuristic, or simply let the CreateMLCEngine call surface WebLLM's own environment check (which it already does) and remove the preemptive check. The error message could also mention Web Workers as a valid environment.
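
A minimal sketch of the navigator-based heuristic, written against an injectable scope so it can be exercised outside a browser. hasWebGPU is a hypothetical helper name, and the narrowing relies on TypeScript 4.9 or later. The presence of navigator.gpu only indicates that the API is exposed; it does not guarantee that an adapter can be acquired.

```typescript
// Returns true when the given global scope exposes the WebGPU entry point.
// Works for both window scopes and worker scopes, since it never looks at `window`.
function hasWebGPU(scope: unknown = globalThis): boolean {
  if (typeof scope !== 'object' || scope === null) return false
  if (!('navigator' in scope)) return false
  const nav = scope.navigator
  return typeof nav === 'object' && nav !== null && 'gpu' in nav
}
```

Passing the scope as a parameter is a design choice for testability; production code would normally call hasWebGPU() with no argument.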

github-actions · 2026-05-09T06:17:44Z

Review Summary

Assessment: Comment (Draft PR - not blocking, providing feedback for iteration)

This is a well-structured addition that follows existing model provider patterns closely. The code is clean, well-documented, and thoroughly tested.

Review Categories

Code Duplication: The OpenAI-compatible streaming logic (mapChunkToEvents, state management, usage extraction) is nearly identical to openai/chat-adapter.ts. This is the most impactful improvement opportunity — extracting a shared utility would reduce future maintenance burden across both providers.
Robustness: Edge cases around missing role deltas and mid-stream errors could leave the SDK's stream consumer in an inconsistent state. Adding guards for well-formed event sequences would improve reliability.
Environment Detection: The assertBrowserEnvironment() check is overly simplistic (window detection) and would incorrectly reject valid environments (Web Workers) while accepting invalid ones (jsdom). Consider relying on WebLLM's own runtime checks.
API Review Process: This introduces a new public class with cache management helpers — per the API Bar Raising guidelines, it should carry the needs-api-review label for designated reviewer evaluation before merge.

Good work on the overall design — the cache helper separation, abort signal support, and consistent error class hierarchy are thoughtful touches.

strands-agent · 2026-06-02T19:21:37Z

This repository has been merged into the strands-agents/harness-sdk monorepo and will be archived shortly. All new development happens there.

If this PR is still relevant, please recreate it against the monorepo. The code now lives under strands-ts/. Full commit history was preserved, so your base should be findable.

Apologies for the disruption, and thank you for contributing!

jsamuel1 wants to merge 1 commit into strands-agents:main from jsamuel1:feat/webllm-model-provider
