feat: add Gemma 4 E2B on-device model #100
Merged
Add LiteRT-LM framework via local SPM package (workaround for unsafeFlags restriction in Xcode 26) with native Swift module bridging to React Native. Supports streaming text generation, multi-turn conversation, system prompts, and background engine pre-loading on model selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enable multimodal inference with visionBackend, accept image file paths from RN layer (base64 written to temp files), and remove unused litert.png placeholder icon. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Download Gemma 4 E2B from HuggingFace with progress bar and speed
- Support background download and resume display on modal reopen
- Fix crash on JS reload by ordering conversation/engine cleanup in deinit
- Fix NSNull error when no image paths provided
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enables speculative decoding via ExperimentalFlags for significantly faster decode speed on real devices with Metal GPU backend. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… E2B
Demonstrate on-device agent capability via LiteRT-LM tool calling:
- RecordFindingTool for multi-dimension quality inspection (text/damage/alignment)
- FactoryInspect system prompt drives the model to call the tool per dimension
- Node-style timeline UI (InspectionNodeView) shows each tool call result with green/red status dots and a connecting line, reusing the markdown renderer
- Final verdict streams in after all tool calls complete
- Allow sending image-only messages so the inspection prompt name fills the text
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make the inspection agent fully configurable via system prompts instead of hardcoding the FactoryInspect scenario:
- Trigger agent mode by prompt.isAgent flag instead of a fixed name
- Generic recordFinding tool (loadSkill indirection removed); stepName is defined by the prompt, so new scenarios need zero native code
- Drop hardcoded 3-step assumption; stream final summary after any tool call
- Add InvoiceCheck as a second example agent prompt
- Hide on-device agent prompts and the Gemma model on Android and until the model is downloaded (isLiteRTModelReady flag, synced on startup)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
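A rough sketch of the prompt-driven agent idea, written in Python for illustration only (the real tool lives in the native Swift module). `stepName`, `isAgent`, and `isLiteRTModelReady` come from the commit message; the `passed` and `detail` arguments, the class names, and the decision to gate agent mode on model readiness inside one helper are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    step_name: str  # defined by the system prompt, not by native code
    passed: bool    # hypothetical field: would drive a green/red status dot
    detail: str     # hypothetical field: free-text note from the model


@dataclass
class AgentSession:
    findings: list[Finding] = field(default_factory=list)

    def record_finding(self, args: dict) -> dict:
        """Handle one recordFinding tool call with an arbitrary stepName."""
        finding = Finding(
            step_name=str(args["stepName"]),
            passed=bool(args.get("passed", False)),
            detail=str(args.get("detail", "")),
        )
        self.findings.append(finding)
        return {"status": "recorded", "stepName": finding.step_name}


def use_agent_mode(prompt: dict, is_litert_model_ready: bool) -> bool:
    """Agent mode follows the prompt's isAgent flag, not a fixed prompt name."""
    return bool(prompt.get("isAgent")) and is_litert_model_ready
```

Because the handler never inspects the step name, a new scenario only needs a new system prompt that tells the model which steps to report.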
Stale cached FactoryInspect prompts lacked the isAgent flag, so the chat fell back to plain messaging (no tools, no node timeline). Built-in agent prompts are demos, so always refresh them to the latest code version on load instead of only adding when missing. Also remove the InvoiceCheck example, keeping FactoryInspect as the single demo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
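The difference between "add when missing" and "always refresh" can be sketched as below. This is an illustrative Python version, not the app's code: the `merge_prompts` helper, the dict shape, and matching prompts by `name` are assumptions.

```python
def merge_prompts(cached: list[dict], builtin_agents: list[dict]) -> list[dict]:
    """Return the cached prompts with built-in agent prompts refreshed.

    A cached prompt whose name matches a built-in is replaced by the current
    code version, so stale copies (for example one lacking isAgent) cannot
    survive. Built-ins absent from the cache are appended.
    """
    builtin_by_name = {p["name"]: p for p in builtin_agents}
    merged = []
    seen = set()
    for prompt in cached:
        name = prompt["name"]
        if name in builtin_by_name:
            merged.append(dict(builtin_by_name[name]))  # always take latest
            seen.add(name)
        else:
            merged.append(prompt)  # user prompts are left untouched
    merged.extend(dict(p) for p in builtin_agents if p["name"] not in seen)
    return merged
```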
Let users mark a custom system prompt as an on-device agent (iOS only, shown when the Gemma model is downloaded and not in voice/image mode). Enabling it shows a hint to attach an image and instruct the model to call the recordFinding tool. Custom agent prompts are also hidden until the model is downloaded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ses/Messages APIs
Route models served on the bedrock-mantle engine through their native APIs (OpenAI Responses for GPT-5.x, OpenAI Chat Completions for open-weight models, Anthropic Messages for Claude Fable 5/Mythos 5), falling back to the legacy Converse API for everything else. Works in both Bedrock API Key mode (Bearer token, client-direct) and SwiftChat Server mode (server signs with SigV4 IAM).
Model lists are merged dynamically per region from mantle GET /v1/models, so unavailable models simply do not appear.
Drop the standalone official OpenAI provider (api.openai.com); OpenAI models are now served on Bedrock. The OpenAI-Compatible custom endpoints feature is preserved under the OpenAI tab.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
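The routing rule described in this commit might look roughly like the sketch below. It is illustrative only: the `Route` enum, the `pick_route` helper, the `open_weight` set, and the id prefixes used to recognise model families are all assumptions, and the real id patterns may differ.

```python
from enum import Enum


class Route(Enum):
    RESPONSES = "openai-responses"
    CHAT_COMPLETIONS = "openai-chat-completions"
    MESSAGES = "anthropic-messages"
    CONVERSE = "converse"


def pick_route(model_id: str, mantle_models: set[str], open_weight: set[str]) -> Route:
    """Pick an API route for a model id.

    mantle_models: ids listed for the region (per the commit, from the
        mantle GET /v1/models call).
    open_weight: hypothetical set of open-weight model ids served on mantle.
    """
    if model_id not in mantle_models:
        return Route.CONVERSE  # legacy fallback for everything else
    if model_id.startswith("openai.gpt-5"):  # assumed id pattern
        return Route.RESPONSES
    if model_id in open_weight:
        return Route.CHAT_COMPLETIONS
    if model_id.startswith("anthropic."):  # assumed id pattern
        return Route.MESSAGES
    return Route.CONVERSE
```

Keying the first check on the per-region model list means a model that the region does not serve on mantle falls back to Converse without any per-model configuration.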
- Strip the cross-region profile prefix (us./eu./global.) for the mantle Messages route, which only accepts the bare foundation-model id (unlike Converse). Fixes Fable 5 returning empty.
- Mark the message complete on output-finished (output_text.done / message_delta) instead of waiting for the terminal completed event, which mantle can delay tens of seconds; keep reading so usage still lands.
- Surface non-SSE error envelopes (bare JSON with no data: line) and flush any trailing buffer at stream end.
- Add mantle.py to the Dockerfile COPY list (was missing, broke the Lambda).
- Grant bedrock-mantle IAM permissions to the Lambda execution role and the client access role in the CloudFormation template.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
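Two of the fixes above, prefix stripping and tolerant stream parsing, can be sketched as follows. This is a minimal illustration, not the contents of mantle.py: the function names are hypothetical, and treating any line that begins with `{` as an error envelope is an assumption.

```python
import json
from typing import Iterable, Iterator, Optional

PROFILE_PREFIXES = ("us.", "eu.", "global.")


def strip_profile_prefix(model_id: str) -> str:
    """Return the bare foundation-model id expected by the Messages route."""
    for prefix in PROFILE_PREFIXES:
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id


def _parse_line(line: str) -> Optional[dict]:
    """Decode one stream line, or return None if it carries no JSON payload."""
    line = line.strip()
    if not line:
        return None
    if line.startswith("data:"):
        line = line[len("data:"):].strip()
    elif not line.startswith("{"):
        return None  # event:/id:/comment lines are ignored in this sketch
    try:
        payload = json.loads(line)
    except json.JSONDecodeError:
        return None  # e.g. a "[DONE]" sentinel or malformed data
    return payload if isinstance(payload, dict) else None


def iter_events(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield JSON payloads from streamed text chunks.

    Handles "data:" lines, bare JSON lines (assumed to be error envelopes),
    lines split across chunks, and a final line without a trailing newline.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            event = _parse_line(line)
            if event is not None:
                yield event
    # Flush whatever is left once the stream ends.
    event = _parse_line(buffer)
    if event is not None:
        yield event
```

A caller could mark the message complete as soon as it sees an output-finished event and keep iterating so that a later usage event is still consumed.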
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace any types on renderer/tokenizer with MarkdownProps types, drop the unused isStreaming destructure, and move inline styles into the StyleSheet. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Description
General Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT-0 license.