feat: support add gemma4 e2b ondevice model by zhu-xiaowei · Pull Request #100 · aws-samples/sample-mobile-ai-assistant

zhu-xiaowei · 2026-06-11T08:25:04Z

Description

- Support downloading and using the Gemma 4 E2B on-device model
- Support agent prompts for Gemma 4 and add a factory-inspection demo agent
- Support image understanding

General Checklist

- The code changes have been fully tested
- Security oriented best practices and standards are followed (e.g. using input sanitization, principle of the least privilege, etc.)
- Documentation update for the change if required
- PR title conforms to conventional commit style
- If breaking change, documentation/changelog update with migration instructions

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT-0 license.

Add the LiteRT-LM framework via a local SPM package (a workaround for the unsafeFlags restriction in Xcode 26), with a native Swift module bridging to React Native. Supports streaming text generation, multi-turn conversation, system prompts, and background engine pre-loading on model selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Enable multimodal inference with visionBackend, accept image file paths from the RN layer (base64 images are written to temp files), and remove the unused litert.png placeholder icon.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
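Passing file paths instead of base64 strings across the bridge means the JS side first has to separate the image payload from its data-URI header and pick a file name. A minimal sketch of that step, assuming images arrive as data URIs or bare base64; the helper names, the MIME table, and the `litert_image_` prefix are hypothetical, and the actual file write (which needs a filesystem library) is left out.

```typescript
// Hypothetical result of splitting an image string before writing it to disk.
interface ImagePayload {
  extension: string; // e.g. "jpg", "png"
  base64: string; // raw base64 without the data-URI header
}

// Assumed mapping from MIME type to file extension.
const MIME_TO_EXT: Record<string, string> = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "image/webp": "webp",
};

// Split "data:image/png;base64,AAAA" into extension + payload.
// A string without a data-URI header is treated as bare base64.
function parseImageDataUri(input: string, fallbackExt = "jpg"): ImagePayload {
  const match = /^data:([^;,]+);base64,([\s\S]*)$/.exec(input);
  if (!match) {
    return { extension: fallbackExt, base64: input };
  }
  const mime = match[1].toLowerCase();
  return { extension: MIME_TO_EXT[mime] ?? fallbackExt, base64: match[2] };
}

// Hypothetical temp-file naming: timestamp plus index, so several images in
// one message (or two quick messages) never overwrite each other.
function tempImageName(index: number, ext: string, now: number = Date.now()): string {
  return `litert_image_${now}_${index}.${ext}`;
}
```

Only the resulting paths would then cross the bridge, which keeps large base64 strings out of the native call.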

- Download Gemma 4 E2B from HuggingFace with a progress bar and download speed
- Support background download and restore the progress display when the modal is reopened
- Fix a crash on JS reload by ordering conversation/engine cleanup in deinit
- Fix an NSNull error when no image paths are provided

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
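A progress bar with a speed readout comes down to two numbers per progress callback: the completed fraction and a bytes-per-second rate. The sketch below shows one way to derive them, assuming the downloader reports cumulative received and total bytes with a timestamp; the sample/view shapes and the exponential smoothing factor are illustrative choices, not the PR's code.

```typescript
// Hypothetical progress sample reported by the downloader.
interface ProgressSample {
  receivedBytes: number;
  totalBytes: number;
  timestampMs: number;
}

interface ProgressView {
  fraction: number; // 0..1, drives the progress bar width
  bytesPerSecond: number; // smoothed download speed
  label: string; // e.g. "50% at 5.0 MB/s"
}

// Format a byte rate using binary (1024-based) units.
function formatBytesPerSecond(bps: number): string {
  const units = ["B/s", "KB/s", "MB/s", "GB/s"];
  let value = bps;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return `${value.toFixed(1)} ${units[i]}`;
}

// Derive the bar fraction and an exponentially smoothed speed from the
// previous and current samples. `prevSpeed` is the last smoothed value
// (0 on the first sample) and `alpha` weights the newest measurement.
function computeProgress(
  prev: ProgressSample | null,
  curr: ProgressSample,
  prevSpeed = 0,
  alpha = 0.3,
): ProgressView {
  const fraction =
    curr.totalBytes > 0 ? Math.min(curr.receivedBytes / curr.totalBytes, 1) : 0;
  let speed = prevSpeed;
  if (prev !== null && curr.timestampMs > prev.timestampMs) {
    const instant =
      ((curr.receivedBytes - prev.receivedBytes) * 1000) /
      (curr.timestampMs - prev.timestampMs);
    speed = prevSpeed === 0 ? instant : alpha * instant + (1 - alpha) * prevSpeed;
  }
  const percent = Math.floor(fraction * 100);
  return {
    fraction,
    bytesPerSecond: speed,
    label: `${percent}% at ${formatBytesPerSecond(speed)}`,
  };
}
```

Smoothing keeps the speed label from jumping on every chunk; when the modal is reopened, the view can simply be recomputed from the latest sample of the still-running background download.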

Enables speculative decoding via ExperimentalFlags for significantly faster decoding on real devices using the Metal GPU backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… E2B

Demonstrate on-device agent capability via LiteRT-LM tool calling:

- RecordFindingTool for multi-dimension quality inspection (text/damage/alignment)
- The FactoryInspect system prompt drives the model to call the tool once per dimension
- A node-style timeline UI (InspectionNodeView) shows each tool-call result with green/red status dots and a connecting line, reusing the markdown renderer
- The final verdict streams in after all tool calls complete
- Allow sending image-only messages, so that the inspection prompt's name is used as the message text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
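The timeline turns each tool call into one node with a pass/fail dot. A sketch of that mapping, assuming each recordFinding call reaches the RN layer as a `{ stepName, passed, detail }` object; that argument shape, the colour values, and `overallVerdict` are hypothetical. In the PR the final verdict is streamed by the model itself, so `overallVerdict` is only a local cross-check.

```typescript
// Hypothetical argument shape of one recordFinding tool call once it reaches
// the React Native layer; the real native payload may differ.
interface RecordFindingArgs {
  stepName: string; // e.g. "text", "damage", "alignment"
  passed: boolean;
  detail: string; // markdown explanation produced by the model
}

type NodeStatus = "pass" | "fail";

interface InspectionNode {
  id: number;
  title: string;
  status: NodeStatus;
  dotColor: string; // green dot for pass, red dot for fail
  markdown: string; // rendered with the shared markdown renderer
}

// One timeline node per tool call, in the order the model made the calls.
function toInspectionNodes(calls: RecordFindingArgs[]): InspectionNode[] {
  return calls.map(
    (call, index): InspectionNode => ({
      id: index,
      title: call.stepName,
      status: call.passed ? "pass" : "fail",
      dotColor: call.passed ? "#22c55e" : "#ef4444",
      markdown: call.detail,
    }),
  );
}

// Local cross-check of the model's verdict: fail if any recorded step failed.
function overallVerdict(nodes: InspectionNode[]): NodeStatus {
  return nodes.every((n) => n.status === "pass") ? "pass" : "fail";
}
```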

Make the inspection agent fully configurable via system prompts instead of hardcoding the FactoryInspect scenario:

- Trigger agent mode by the prompt.isAgent flag instead of a fixed name
- Generic recordFinding tool (loadSkill indirection removed); stepName is defined by the prompt, so new scenarios need zero native code
- Drop the hardcoded 3-step assumption; stream the final summary after any tool call
- Add InvoiceCheck as a second example agent prompt
- Hide on-device agent prompts and the Gemma model on Android, and until the model is downloaded (isLiteRTModelReady flag, synced on startup)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
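The visibility rule reduces to one predicate: on-device features show only on iOS and only once the model is downloaded. A minimal sketch, reusing the isAgent and isLiteRTModelReady names from the commit message; the `SystemPrompt`/`ChatModel` shapes and the `isOnDevice` marker are assumptions for illustration.

```typescript
type Platform = "ios" | "android";

interface SystemPrompt {
  id: number;
  name: string;
  prompt: string;
  isAgent?: boolean; // agent mode is keyed off this flag, not a fixed name
}

interface ChatModel {
  modelId: string;
  isOnDevice?: boolean; // hypothetical marker for the LiteRT (Gemma) model
}

// On-device features are usable only on iOS after the model is downloaded.
function onDeviceAvailable(platform: Platform, isLiteRTModelReady: boolean): boolean {
  return platform === "ios" && isLiteRTModelReady;
}

// Ordinary prompts are always listed; agent prompts only when allowed.
function visiblePrompts(prompts: SystemPrompt[], platform: Platform, ready: boolean): SystemPrompt[] {
  const allowed = onDeviceAvailable(platform, ready);
  return prompts.filter((p) => !p.isAgent || allowed);
}

// Same rule for the model picker.
function visibleModels(models: ChatModel[], platform: Platform, ready: boolean): ChatModel[] {
  const allowed = onDeviceAvailable(platform, ready);
  return models.filter((m) => !m.isOnDevice || allowed);
}

// Agent mode is decided per prompt, so a new scenario only needs a new prompt.
function isAgentMode(prompt: SystemPrompt | null): boolean {
  return prompt?.isAgent === true;
}
```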

Stale cached FactoryInspect prompts lacked the isAgent flag, so the chat fell back to plain messaging (no tools, no node timeline). Built-in agent prompts are demos, so always refresh them to the latest code version on load instead of only adding them when missing. Also remove the InvoiceCheck example, keeping FactoryInspect as the single demo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
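The fix changes the load step from "add if missing" to "overwrite with the code version". A sketch of that merge, assuming built-in prompts are matched by name and that retired demos (such as InvoiceCheck) are passed in by name; both the matching key and the `retiredNames` parameter are assumptions, not the PR's implementation.

```typescript
interface StoredPrompt {
  id: number;
  name: string;
  prompt: string;
  isAgent?: boolean;
}

// Replace cached copies of built-in prompts with the current code version
// (so a stale copy missing isAgent is repaired), drop retired demos, keep
// user prompts untouched, and append built-ins that were never cached.
function refreshBuiltInAgentPrompts(
  cached: StoredPrompt[],
  builtIns: StoredPrompt[],
  retiredNames: string[] = [],
): StoredPrompt[] {
  const latestByName: Record<string, StoredPrompt> = {};
  for (const p of builtIns) latestByName[p.name] = p;

  const seen: Record<string, boolean> = {};
  const merged: StoredPrompt[] = [];
  for (const p of cached) {
    if (retiredNames.indexOf(p.name) >= 0) continue; // removed demo
    const latest = latestByName[p.name];
    if (latest) {
      seen[p.name] = true;
      merged.push({ ...latest, id: p.id }); // keep the stored id, refresh content
    } else {
      merged.push(p);
    }
  }
  for (const p of builtIns) {
    if (!seen[p.name]) merged.push(p);
  }
  return merged;
}
```

Matching by name means a user prompt that happens to share a built-in's name would also be overwritten; a dedicated built-in id would avoid that.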

Let users mark a custom system prompt as an on-device agent (iOS only; shown when the Gemma model is downloaded and not in voice/image mode). Enabling it shows a hint to attach an image and instruct the model to call the recordFinding tool. Custom agent prompts are also hidden until the model is downloaded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ses/Messages APIs

Route models served on the bedrock-mantle engine through their native APIs (OpenAI Responses for GPT-5.x, OpenAI Chat Completions for open-weight models, Anthropic Messages for Claude Fable 5/Mythos 5), falling back to the legacy Converse API for everything else. Works in both Bedrock API Key mode (Bearer token, client-direct) and SwiftChat Server mode (the server signs requests with SigV4 IAM). Model lists are merged dynamically per region from mantle GET /v1/models, so unavailable models simply do not appear.

Drop the standalone official OpenAI provider (api.openai.com); OpenAI models are now served on Bedrock. The OpenAI-Compatible custom endpoints feature is preserved under the OpenAI tab.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
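The routing rule and the per-region list merge can be expressed as two small functions. A sketch under stated assumptions: each model carries a hypothetical `servedOnMantle` flag and `family` tag, and the model ids from GET /v1/models have already been extracted from the response. How the PR actually classifies models is not shown here.

```typescript
type ApiRoute = "responses" | "chat-completions" | "messages" | "converse";

interface BedrockModel {
  id: string;
  servedOnMantle: boolean; // hypothetical: model is offered on bedrock-mantle
  family: "gpt-5" | "open-weight" | "claude" | "other"; // hypothetical tag
}

// Mantle models use their native API; everything else falls back to Converse.
function pickRoute(model: BedrockModel): ApiRoute {
  if (!model.servedOnMantle) return "converse";
  switch (model.family) {
    case "gpt-5":
      return "responses";
    case "open-weight":
      return "chat-completions";
    case "claude":
      return "messages";
    default:
      return "converse";
  }
}

// Keep only mantle candidates that this region's /v1/models actually lists,
// then add Converse models. Mantle entries win on duplicate ids so a model
// available both ways is called through its native API.
function mergeModelLists(
  converseModels: BedrockModel[],
  mantleCandidates: BedrockModel[],
  availableMantleIds: string[],
): BedrockModel[] {
  const available: Record<string, boolean> = {};
  for (const id of availableMantleIds) available[id] = true;

  const seen: Record<string, boolean> = {};
  const result: BedrockModel[] = [];
  for (const m of mantleCandidates) {
    if (available[m.id] && !seen[m.id]) {
      seen[m.id] = true;
      result.push(m);
    }
  }
  for (const m of converseModels) {
    if (!seen[m.id]) {
      seen[m.id] = true;
      result.push(m);
    }
  }
  return result;
}
```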

- Strip the cross-region profile prefix (us./eu./global.) for the mantle Messages route, which only accepts the bare foundation-model id (unlike Converse). Fixes Fable 5 returning empty responses.
- Mark the message complete on output-finished events (output_text.done / message_delta) instead of waiting for the terminal completed event, which mantle can delay by tens of seconds; keep reading so usage still lands.
- Surface non-SSE error envelopes (bare JSON with no data: line) and flush any trailing buffer at stream end.
- Add mantle.py to the Dockerfile COPY list (it was missing, which broke the Lambda).
- Grant bedrock-mantle IAM permissions to the Lambda execution role and the client access role in the CloudFormation template.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
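Three of these fixes are small stream-handling rules. The sketch below illustrates them, with assumptions: the event type strings follow the public OpenAI Responses (`response.output_text.done`) and Anthropic Messages (`message_delta`) streaming formats, a bare error body is assumed to be single-line JSON, and `SseAccumulator`, `isOutputFinished`, and `isErrorEnvelope` are hypothetical names rather than the PR's code.

```typescript
// Bare foundation-model id for the Messages route: drop "us.", "eu." or "global.".
function stripCrossRegionPrefix(modelId: string): string {
  return modelId.replace(/^(us|eu|global)\./, "");
}

type StreamEvent = Record<string, unknown>;

// Output is finished once text is done, even if the terminal event is late.
function isOutputFinished(ev: StreamEvent): boolean {
  return ev["type"] === "response.output_text.done" || ev["type"] === "message_delta";
}

// Assumed error shapes: {"type":"error",...} or {"error":{...}}.
function isErrorEnvelope(ev: StreamEvent): boolean {
  return ev["type"] === "error" || "error" in ev;
}

// Line-buffered SSE reader that also accepts bare JSON lines (no "data:").
class SseAccumulator {
  private buffer = "";

  // Feed a raw chunk; returns events from every complete line.
  push(chunk: string): StreamEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the incomplete last line
    return this.parseLines(lines);
  }

  // At stream end, parse whatever is left in the buffer.
  flush(): StreamEvent[] {
    const rest = this.buffer;
    this.buffer = "";
    return this.parseLines([rest]);
  }

  private parseLines(lines: string[]): StreamEvent[] {
    const events: StreamEvent[] = [];
    for (const raw of lines) {
      const line = raw.trim();
      if (line === "") continue;
      let payload: string | null = null;
      if (line.indexOf("data:") === 0) payload = line.slice(5).trim();
      else if (line.charAt(0) === "{") payload = line; // non-SSE envelope
      if (payload === null || payload === "[DONE]") continue;
      try {
        const value: unknown = JSON.parse(payload);
        if (value !== null && typeof value === "object") events.push(value as StreamEvent);
      } catch {
        // Skip lines that are not valid JSON.
      }
    }
    return events;
  }
}
```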

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the any types on renderer/tokenizer with MarkdownProps types, drop the unused isStreaming destructure, and move inline styles into the StyleSheet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

zhu-xiaowei and others added 13 commits May 25, 2026 09:58

feat: enable multi-token prediction (MTP) for GPU on real devices

0821ceb


fix: update npm audit issue

83c00da

chore: remove mantle stream debug logging

4ccec05


fix: resolve lint errors in InspectionNodeView

acbcdae


zhu-xiaowei merged commit de4203e into main Jun 12, 2026
6 checks passed

zhu-xiaowei deleted the gemma4 branch June 12, 2026 09:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: support add gemma4 e2b ondevice model#100

feat: support add gemma4 e2b ondevice model#100
zhu-xiaowei merged 13 commits into
mainfrom
gemma4

zhu-xiaowei commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

zhu-xiaowei commented Jun 11, 2026

Description

General Checklist

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant