
feat: support add gemma4 e2b ondevice model #100

Merged
zhu-xiaowei merged 13 commits into main from gemma4 on Jun 12, 2026

Conversation

@zhu-xiaowei

Contributor

Description

  1. Support downloading and using the Gemma 4 E2B on-device model
  2. Support agent prompts for Gemma 4 and add a factory-inspection demo agent
  3. Support image understanding

General Checklist

  • The code changes have been fully tested
  • Security oriented best practices and standards are followed (e.g. using input sanitization, principle of the least privilege, etc.)
  • Documentation update for the change if required
  • PR title conforms to conventional commit style
  • If breaking change, documentation/changelog update with migration instructions

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT-0 license.

zhu-xiaowei and others added 13 commits May 25, 2026 09:58
Add LiteRT-LM framework via local SPM package (workaround for unsafeFlags
restriction in Xcode 26) with native Swift module bridging to React Native.
Supports streaming text generation, multi-turn conversation, system prompts,
and background engine pre-loading on model selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enable multimodal inference with visionBackend, accept image file paths
from RN layer (base64 written to temp files), and remove unused litert.png
placeholder icon.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
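
The base64-to-temp-file handoff described in the commit above can be sketched as follows. This is a minimal illustration in Python rather than the app's actual React Native or Swift code; the function names `write_base64_image_to_temp` and `collect_image_paths` are hypothetical, and the data-URL handling is an assumption.

```python
import base64
import os
import tempfile


def write_base64_image_to_temp(image_b64: str, suffix: str = ".jpg") -> str:
    """Decode a base64 image payload, write it to a temp file, return the path."""
    # Assumption: the payload may carry a data-URL prefix such as
    # "data:image/jpeg;base64,...", which is dropped before decoding.
    if image_b64.startswith("data:") and "," in image_b64:
        image_b64 = image_b64.split(",", 1)[1]
    data = base64.b64decode(image_b64, validate=True)
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


def collect_image_paths(images_b64) -> list:
    """Return one temp-file path per image; a missing list means no images."""
    return [write_base64_image_to_temp(item) for item in (images_b64 or [])]
```

The native side would then receive plain file paths instead of large base64 strings. The caller remains responsible for deleting the temp files once inference finishes.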
- Download Gemma 4 E2B from HuggingFace with progress bar and speed
- Support background download and resume display on modal reopen
- Fix crash on JS reload by ordering conversation/engine cleanup in deinit
- Fix NSNull error when no image paths provided

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
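
The progress bar and speed readout mentioned above reduce to simple arithmetic on bytes received, total size, and elapsed time. The sketch below is illustrative only; the `DownloadProgress` class, its field names, and the label format are assumptions, not the app's code.

```python
from dataclasses import dataclass


@dataclass
class DownloadProgress:
    received_bytes: int
    total_bytes: int
    elapsed_seconds: float

    @property
    def percent(self) -> float:
        # Guard against an unknown (zero) total size.
        if self.total_bytes <= 0:
            return 0.0
        return min(100.0, 100.0 * self.received_bytes / self.total_bytes)

    @property
    def speed_mb_per_s(self) -> float:
        # Average speed since the download started, in MiB per second.
        if self.elapsed_seconds <= 0:
            return 0.0
        return self.received_bytes / self.elapsed_seconds / (1024 * 1024)

    def label(self) -> str:
        return f"{self.percent:.1f}% ({self.speed_mb_per_s:.2f} MB/s)"
```

Keeping this state outside the modal, for example in a long-lived store, is one way to let a reopened modal pick up the current progress of a background download.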
Enables speculative decoding via ExperimentalFlags for significantly
faster decode speed on real devices with Metal GPU backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… E2B

Demonstrate on-device agent capability via LiteRT-LM tool calling:
- RecordFindingTool for multi-dimension quality inspection (text/damage/alignment)
- FactoryInspect system prompt drives the model to call the tool per dimension
- Node-style timeline UI (InspectionNodeView) shows each tool call result with
  green/red status dots and a connecting line, reusing the markdown renderer
- Final verdict streams in after all tool calls complete
- Allow sending image-only messages, with the inspection prompt name used as the message text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make the inspection agent fully configurable via system prompts instead of
hardcoding the FactoryInspect scenario:
- Trigger agent mode by prompt.isAgent flag instead of a fixed name
- Generic recordFinding tool (loadSkill indirection removed); stepName is
  defined by the prompt, so new scenarios need zero native code
- Drop hardcoded 3-step assumption; stream final summary after any tool call
- Add InvoiceCheck as a second example agent prompt
- Hide on-device agent prompts and the Gemma model on Android, and on iOS until
  the model is downloaded (isLiteRTModelReady flag, synced on startup)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
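
The prompt-driven design above can be sketched as a flag check plus a single generic tool handler. This is a hedged Python illustration, not the app's Swift or TypeScript code: only `isAgent`, `recordFinding`, and `stepName` come from the commit message, while the `passed` and `detail` arguments and all class and function names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SystemPrompt:
    name: str
    text: str
    is_agent: bool = False  # mirrors the prompt.isAgent flag from the commit


@dataclass
class InspectionSession:
    findings: list = field(default_factory=list)

    def record_finding(self, step_name: str, passed: bool, detail: str) -> dict:
        # stepName is whatever the prompt told the model to use, so a new
        # scenario needs a new prompt rather than new tool code.
        finding = {"stepName": step_name, "passed": passed, "detail": detail}
        self.findings.append(finding)
        return finding


def handle_tool_call(prompt: SystemPrompt, session: InspectionSession, call: dict):
    """Dispatch one model tool call; returns None outside agent mode."""
    if not prompt.is_agent:
        return None  # plain chat: no tools are exposed
    if call.get("name") != "recordFinding":
        raise ValueError(f"unknown tool: {call.get('name')}")
    args = call.get("arguments", {})
    return session.record_finding(
        args["stepName"], bool(args["passed"]), args.get("detail", "")
    )
```

Because the handler never inspects the step name, the number of steps is open-ended, which matches the commit's removal of the hardcoded three-step assumption.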
Stale cached FactoryInspect prompts lacked the isAgent flag, so the chat
fell back to plain messaging (no tools, no node timeline). Built-in agent
prompts are demos, so always refresh them to the latest code version on
load instead of only adding when missing. Also remove the InvoiceCheck
example, keeping FactoryInspect as the single demo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
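
The "always refresh" behaviour described above can be sketched as a merge that overwrites cached built-in prompts instead of only adding missing ones. This is an illustration under assumptions: matching prompts by `name` and the dictionary shape are hypothetical, and the real app may key prompts differently.

```python
def merge_prompts(cached: list, built_in: list) -> list:
    """Return cached prompts with every built-in prompt replaced by its latest version."""
    latest = {prompt["name"]: prompt for prompt in built_in}
    merged = []
    seen = set()
    for prompt in cached:
        name = prompt["name"]
        if name in latest:
            # Stale cached copy of a built-in: take the current definition.
            merged.append(dict(latest[name]))
            seen.add(name)
        else:
            # User-defined prompt: keep untouched.
            merged.append(prompt)
    # Built-ins that were never cached are appended at the end.
    for name, prompt in latest.items():
        if name not in seen:
            merged.append(dict(prompt))
    return merged
```

An add-only merge would keep a cached FactoryInspect entry that lacks `isAgent`, which is the failure the commit describes.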
Let users mark a custom system prompt as an on-device agent (iOS only,
shown when the Gemma model is downloaded and not in voice/image mode).
Enabling it shows a hint to attach an image and instruct the model to call
the recordFinding tool. Custom agent prompts are also hidden until the
model is downloaded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ses/Messages APIs

Route models served on the bedrock-mantle engine through their native APIs
(OpenAI Responses for GPT-5.x, OpenAI Chat Completions for open-weight models,
Anthropic Messages for Claude Fable 5/Mythos 5), falling back to the legacy
Converse API for everything else. Works in both Bedrock API Key mode (Bearer
token, client-direct) and SwiftChat Server mode (server signs with SigV4 IAM).

Model lists are merged dynamically per region from mantle GET /v1/models, so
unavailable models simply do not appear. Drop the standalone official OpenAI
provider (api.openai.com); OpenAI models are now served on Bedrock. The
OpenAI-Compatible custom endpoints feature is preserved under the OpenAI tab.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
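
The routing rule above (native API for models served on mantle, Converse otherwise) can be sketched as a small selector. Everything concrete here is an assumption for illustration: the `Api` enum, the `choose_api` function, the substring checks, and the model ids used in the usage note are hypothetical and do not reflect real Bedrock identifiers or the project's code.

```python
from enum import Enum


class Api(Enum):
    RESPONSES = "openai-responses"
    CHAT_COMPLETIONS = "openai-chat-completions"
    MESSAGES = "anthropic-messages"
    CONVERSE = "bedrock-converse"


def choose_api(model_id: str, mantle_model_ids: set) -> Api:
    """Pick the API for a model, given the ids listed by mantle for this region."""
    if model_id not in mantle_model_ids:
        return Api.CONVERSE  # legacy fallback for everything not on mantle
    lowered = model_id.lower()
    if "claude" in lowered:
        return Api.MESSAGES
    if "gpt-5" in lowered:
        return Api.RESPONSES
    # Assumption: any other mantle-served model is an open-weight model.
    return Api.CHAT_COMPLETIONS
```

For example, with `mantle = {"openai.gpt-5-example", "anthropic.claude-example", "example.open-weight"}`, a call such as `choose_api("openai.gpt-5-example", mantle)` selects the Responses route, while an id absent from the set falls back to Converse. Passing the per-region model list in as a parameter mirrors the commit's point that unavailable models simply do not appear.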
- Strip the cross-region profile prefix (us./eu./global.) for the mantle
  Messages route, which only accepts the bare foundation-model id (unlike
  Converse). Fixes Fable 5 returning empty.
- Mark the message complete on output-finished (output_text.done /
  message_delta) instead of waiting for the terminal completed event, which
  mantle can delay tens of seconds; keep reading so usage still lands.
- Surface non-SSE error envelopes (bare JSON with no data: line) and flush any
  trailing buffer at stream end.
- Add mantle.py to the Dockerfile COPY list (was missing, broke the Lambda).
- Grant bedrock-mantle IAM permissions to the Lambda execution role and the
  client access role in the CloudFormation template.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
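
Three of the fixes above (prefix stripping, completing on output-finished, and surfacing bare-JSON errors with a trailing-buffer flush) can be sketched together. This is a simplified Python illustration, not the contents of mantle.py: the function names, the return shape, and the way event types are matched are assumptions.

```python
import json

PROFILE_PREFIXES = ("us.", "eu.", "global.")


def strip_profile_prefix(model_id: str) -> str:
    """Drop a cross-region profile prefix, leaving the bare model id."""
    for prefix in PROFILE_PREFIXES:
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id


def read_stream(chunks):
    """Consume text chunks of a stream; return (events, completed, error)."""
    buffer = ""
    events = []
    completed = False
    error = None

    def handle(line: str) -> None:
        nonlocal completed, error
        line = line.strip()
        if not line:
            return
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return
            event = json.loads(payload)
            events.append(event)
            event_type = event.get("type", "")
            # Mark complete as soon as the output is finished; later events
            # (such as usage) are still collected because reading continues.
            if event_type.endswith("output_text.done") or event_type == "message_delta":
                completed = True
        elif line.startswith("{"):
            # Bare JSON with no "data:" prefix is treated as an error envelope.
            error = json.loads(line)

    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            handle(line)
    handle(buffer)  # flush whatever is left when the stream ends
    return events, completed, error
```

Separating "completed" from "end of stream" lets a UI mark the message done immediately while the reader keeps draining the connection for the delayed terminal event.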
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace any types on renderer/tokenizer with MarkdownProps types, drop the
unused isStreaming destructure, and move inline styles into the StyleSheet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@zhu-xiaowei zhu-xiaowei merged commit de4203e into main Jun 12, 2026
6 checks passed
@zhu-xiaowei zhu-xiaowei deleted the gemma4 branch June 12, 2026 09:54
1 participant