Commit fcf2487: Fix context
1 parent 797644d
3 files changed: +49 -2 lines changed


examples/demo-compose-app/app/src/androidMain/kotlin/com/jetbrains/example/kotlin_agents_demo_app/MainActivity.kt

Lines changed: 2 additions & 2 deletions
@@ -17,11 +17,11 @@ class MainActivity : ComponentActivity() {
         KoinApp(
             executor = MultiLLMPromptExecutor(
                 AndroidLocalLLMProvider to AndroidLLocalLLMClient(
-                    TODO("How and where to get it?!"),
+                    this,
                     "/data/local/tmp/llm"
                 )
             ),
-            model = AndroidLocalModels.Chat.Llama
+            model = AndroidLocalModels.Chat.Gemma
         )
     }
 }
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
# Google QA

* How to supply models to the device and discover them?
* What is the status of tool-calling support in the models?
* Why do we need a delay, and how long should it be, in `delay(settings.closeSessionDelay)`?
* How to check whether the GPU is busy, how many models can run in parallel (and what tooling exists for that), how to query model availability and resources on the GPU, and can we share model state across apps?

# Local models

LiteRT (the rebranded TensorFlow Lite, a C++ runtime library) is Google's framework for on-device model inference: https://github.com/google-ai-edge/LiteRT
To use the framework, a model must first be converted to the LiteRT format.
The Hugging Face litert-community organization hosts already converted models: https://huggingface.co/litert-community
The problem is that the only model there that supports tool calling is Hammer: https://huggingface.co/litert-community/Hammer2.1-1.5b

Android has a library (MediaPipe) for working with LiteRT from Java, via JNI:
* https://github.com/google-ai-edge/mediapipe
* https://github.com/google-ai-edge/mediapipe/tree/master/mediapipe/tasks/java/com/google/mediapipe/tasks/genai/llminference

And documentation on how to use it:
* https://ai.google.dev/edge/mediapipe/solutions/guide
* https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/android
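
A minimal Kotlin sketch of that API, following the LLM Inference docs above. The model file name is an assumption; it matches the adb push script at the end of these notes:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Build the inference engine from a model file previously pushed to the device.
fun runLocalLlm(context: Context): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/Gemma3-1B-IT_multi-prefill-seq_q4_ekv2048.task")
        .setMaxTokens(512)
        .build()
    val llm = LlmInference.createFromOptions(context, options)
    val response = llm.generateResponse("Summarize LiteRT in one sentence.")
    llm.close() // release the CPU/GPU resources held by the runtime
    return response
}
```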

For tool calling, the AI Edge local agents extension is used:
* https://github.com/google-ai-edge/ai-edge-apis/tree/main/local_agents
* https://ai.google.dev/edge/mediapipe/solutions/genai/function_calling/android

# Model

Gemma is Google's on-device model: https://huggingface.co/litert-community/Gemma3-1B-IT

The file names follow the pattern `Gemma3-1B-(IT)_(multi-prefill-seq)_(seq128)_(f32|q4|q8)_(ekv1280|ekv2048|ekv4096)_(block32|block128).task` (a parsing sketch follows this list):
* Gemma → Model family (Gemma).
* 3 → Likely the version (v3).
* 1B → Parameter count, roughly 1 billion.
* IT → Instruction-Tuned (fine-tuned for following instructions).
* multi-prefill-seq → Likely optimized for multiple prefill sequences in parallel.
* seq128 → Maximum sequence length (128 tokens).
* Quantization:
  * f32 → 32-bit floating point (full precision).
  * q4 → Quantized to 4-bit integers.
  * q8 → Quantized to 8-bit integers.
  * block32, block128 → Quantization block size.
* ekv1280, ekv2048, ekv4096 → Likely the effective KV cache size (number of key/value slots in the attention cache).
* .task → Model or runtime task file (probably used by the training/inference pipeline).
* .litertlm → LiteRT Language Model format (optimized for the LiteRT runtime; smaller and faster).
* Chipset identifiers:
  * mtXXXX → MediaTek processors.
  * smXXXX → Qualcomm Snapdragon processors.
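
A hypothetical Kotlin helper (not part of the repo) that splits such a file name into the components listed above:

```kotlin
// Hypothetical example: parse a litert-community model file name.
data class ModelFileName(
    val family: String,      // e.g. "Gemma3"
    val params: String,      // e.g. "1B"
    val variant: String,     // e.g. "IT" (instruction-tuned)
    val tags: List<String>,  // e.g. ["multi-prefill-seq", "q4", "ekv2048"]
    val extension: String,   // "task" or "litertlm"
)

fun parseModelFileName(name: String): ModelFileName? {
    val match = Regex("""^([A-Za-z]+\d*)-(\d+B)-([A-Za-z]+)_(.+)\.(task|litertlm)$""")
        .matchEntire(name) ?: return null
    val (family, params, variant, tags, ext) = match.destructured
    return ModelFileName(family, params, variant, tags.split('_'), ext)
}

fun main() {
    // Prints: ModelFileName(family=Gemma3, params=1B, variant=IT, tags=[multi-prefill-seq, q4, ekv2048], extension=task)
    println(parseModelFileName("Gemma3-1B-IT_multi-prefill-seq_q4_ekv2048.task"))
}
```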
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
adb shell rm -r /data/local/tmp/llm/
adb shell mkdir -p /data/local/tmp/llm/
export MODEL_NAME="Gemma3-1B-IT_multi-prefill-seq_q4_ekv2048"
adb push ${PWD}/${MODEL_NAME}.task /data/local/tmp/llm/${MODEL_NAME}.task
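
A small companion sketch on the app side (an assumption, not part of the commit): check that a model pushed by the script above is actually present before constructing the local LLM client, so a missing push fails fast with a clear message.

```kotlin
import java.io.File

// Look for a pushed model under the directory targeted by the adb script above.
fun findPushedModel(dir: String = "/data/local/tmp/llm"): File? =
    File(dir)
        .listFiles { f -> f.extension in setOf("task", "litertlm") }
        ?.maxByOrNull { it.lastModified() } // prefer the most recently pushed file

fun requirePushedModel(): File =
    findPushedModel()
        ?: error("No .task/.litertlm model found in /data/local/tmp/llm; run the adb push script first.")
```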
