Releases: kantv-ai/kantv
kantv-1.6.10
Changes
All Features
- TV playback
- TV playback + TV recording
- local playback of recorded video files
- TV playback + AI-subtitle (powered by whisper.cpp)
- TV playback + TV recording + AI-subtitle
- ASR inference (through the internal customized whisper.cpp) benchmark with the ggml backend & Hexagon-cDSP backend
- LLM inference (through the internal customized llama.cpp) benchmark with the ggml backend & Hexagon-cDSP backend
- multi-modal (image-2-text) LLM inference (through the internal customized llama.cpp) benchmark with the ggml backend & Hexagon-cDSP backend
- text-2-image (through the internal customized stable-diffusion.cpp) inference with the ggml backend
- 2D graphics benchmark
- video encode benchmark
- video encode benchmark that creates a code-generated video file
- local playback of code-generated video files
- download LLM models (in local dev envs)
- edit tv.xml (i.e., customize tv.xml for personal needs or R&D activity)
- realtime video recognition via MTMD in upstream llama.cpp + SmolVLM2-256M with the default ggml backend
Prebuilt Android APK for Qualcomm Snapdragon based phones and non-Qualcomm phones
- download the prebuilt Android APK from https://github.com/kantv-ai/kantv/actions/runs/15226969178
- unzip the downloaded file
- install the corresponding APK on an Android phone with or without a Qualcomm mobile SoC
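The three steps above can be sketched as a small shell script; note that `kantv-apk.zip` and the APK filenames are placeholders, not the real artifact names from the Actions run:

```shell
#!/bin/sh
# Hypothetical sketch of the install steps; "kantv-apk.zip" is a placeholder
# for the real artifact name downloaded from the GitHub Actions run.
set -eu
ZIP=kantv-apk.zip
if [ -f "$ZIP" ] && command -v adb >/dev/null 2>&1; then
    unzip -o "$ZIP" -d kantv-apk
    # -r replaces any existing install; pick the Hexagon-enabled APK on a
    # Snapdragon phone and the plain ggml APK otherwise
    adb install -r kantv-apk/*.apk
else
    echo "download $ZIP from the Actions run and install adb first"
fi
```

`adb install` requires USB debugging enabled on the phone; alternatively, copy the APK over and install it from a file manager.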
Dev envs
internal customized/tailored QNN SDK and Hexagon SDK, plus the official Android NDK; all of these can be downloaded and set up automatically by a hand-written script in this project. Details can be found in how-to-build.md.
official Android Studio Jellyfish (2023.3.1, April 30, 2024); Android Studio can be skipped by AI researchers/experts (vim + vscode are more lightweight tools). Details can be found in how-to-build.md.
QNN SDK is v2.34.0.250424, Hexagon SDK is v6.2.0.1, Android NDK is android-ndk-r28.
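For reference, a manual Android NDK setup might look like the sketch below. The download URL follows Google's usual repository naming and is an assumption; in practice the project's own script (see how-to-build.md) automates this step:

```shell
#!/bin/sh
# Hedged sketch of a manual Android NDK r28 setup. The URL follows Google's
# usual archive naming convention and is an assumption; prefer the project's
# automated setup described in how-to-build.md.
set -eu
NDK_VER=android-ndk-r28
NDK_URL="https://dl.google.com/android/repository/${NDK_VER}-linux.zip"
# set DO_DOWNLOAD=1 to actually fetch the large archive
if [ "${DO_DOWNLOAD:-0}" = 1 ]; then
    curl -fLO "$NDK_URL"
    unzip -q "${NDK_VER}-linux.zip"
fi
# later build steps typically expect an env var pointing at the NDK root
ANDROID_NDK="$(pwd)/${NDK_VER}"
export ANDROID_NDK
echo "ANDROID_NDK=${ANDROID_NDK}"
```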
Running on Android phone
- Android 7.0 (2016.08) through Android 15 (2024.10) and higher with ANY mainstream arm64 mobile SoC should be supported; issue reports are welcome.
- An Android smartphone equipped with one of the Qualcomm mobile SoCs below (Snapdragon 8 Gen 3 and 8 Elite are highly recommended) is required to verify/run the ggml-hexagon backend on an Android phone:
Snapdragon 8 Gen 1
Snapdragon 8 Gen 1+
Snapdragon 8 Gen 2
Snapdragon 8 Gen 3(verified)
Snapdragon 8 Elite(verified)
- An Android smartphone equipped with ANY mainstream high-end mobile SoC is highly recommended for the realtime AI-subtitle feature; otherwise unexpected behavior may occur
AI models
default ASR model is the whisper model ggml-tiny.en-q8_0.bin, built into the APK.
default LLM model is Gemma3-4B, which has the best overall performance on my Snapdragon 8 Gen 3 and 8 Elite phones.
default Multimodal model is SmolVLM2-256M.
default Text2Image model is sd-v1.4.
verified LLM models: Qwen1.5-1.8B, Qwen2.5-3B, Qwen3-4B, Qwen3-8B, Qwen3-14B, Gemma3-4B, Gemma3-12B
verified Multimodal models: SmolVLM2-256M, SmolVLM-500M
verified Text2Image models: sd-v1.4
all of these AI models can be downloaded manually in "LLM Setting" (direct access to https://huggingface.co/ is required to download AI models in the APK).
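On a dev machine, the manual download amounts to fetching a ggml/GGUF file from huggingface.co and pushing it to the phone. A hedged sketch follows; the HuggingFace repo path and the on-device directory are assumptions, not necessarily what the APK itself uses:

```shell
#!/bin/sh
# Hedged sketch of a manual model download; the HuggingFace repo path and
# the /sdcard/kantv/ target directory are assumptions.
set -eu
MODEL_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en-q8_0.bin"
MODEL_FILE="${MODEL_URL##*/}"
# set DO_DOWNLOAD=1 to actually fetch and push the model
if [ "${DO_DOWNLOAD:-0}" = 1 ]; then
    curl -fL -o "$MODEL_FILE" "$MODEL_URL"
    adb push "$MODEL_FILE" /sdcard/kantv/
fi
echo "model file: $MODEL_FILE"
```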

Todo
- an automated CT approach should be introduced in this project to validate every PR/release (could AI be used for this purpose?)
- solve the deadlock in ndkcamera: nihui/ncnn-android-scrfd#16
- remove redundant and unnecessary permissions from the APK (the send/receive SMS permissions are not actually used or needed) to strictly follow the principle of "minimum permissions" and the EU's GDPR
- sync with upstream llama.cpp & whisper.cpp
- integrate the new MTMD (with audio support) in upstream llama.cpp
- implement a dedicated feature (download AI models directly in the APK) for users/developers from China
- others can be found in the roadmap
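For the permissions item above, one common Android technique (a sketch, assuming the SMS permissions are merged in from a dependency's manifest) is to strip them at manifest-merge time with `tools:node="remove"`:

```xml
<!-- AndroidManifest.xml: hypothetical fix that force-removes SMS permissions
     a library manifest might merge into the APK -->
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          xmlns:tools="http://schemas.android.com/tools">
    <uses-permission android:name="android.permission.SEND_SMS"
                     tools:node="remove" />
    <uses-permission android:name="android.permission.RECEIVE_SMS"
                     tools:node="remove" />
</manifest>
```

After a rebuild, `aapt dump permissions <apk>` or Android Studio's merged-manifest view can confirm the permissions are gone.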
kantv-1.6.9
Overview
- text-2-image to automatically generate a poster for the user's customized online-TV program
- integrate stable-diffusion.cpp for text-2-image on Android phone
- make stable-diffusion inference run (results not yet correct) on Hexagon cDSP with ggml-hexagon
- enable another playback engine: a customized Google ExoPlayer 2.15.1
- decouple UI and data; refine the UI and logic for downloading LLM models in the APK
- refine JNI and try to improve the stability of llava inference and stable-diffusion inference in some corner cases
- refine docs
- refine ggml-hexagon.cpp and sync to the PR in upstream llama.cpp community
- improve stability
Status
the test model is Gemma3-4B.
the following features were validated on Android phones equipped with Snapdragon 8 Gen 3 and Snapdragon 8 Elite:
- TV playback
- TV playback + TV recording
- local playback with recorded video file
- TV playback + AI-subtitle
- TV playback + TV recording + AI-subtitle
- ASR inference (through whisper.cpp) benchmark with the ggml backend & cDSP backend
- LLM inference (through llama.cpp) benchmark with the ggml backend & cDSP backend
- multi-modal (image-2-text) LLM inference (through llama.cpp) benchmark with the ggml backend & cDSP backend
- text-2-image inference with the ggml backend
- 2D graphic benchmark
- video encode benchmark
- video encode benchmark and create code-generated video file
- local playback with code-generated video file
- download LLM models (in local dev envs)
- edit tv.xml (i.e., customize tv.xml for personal needs or R&D activity)
Dev envs
QNN SDK is v2.33.0.250327, Hexagon SDK is v6.2.0.1, Android NDK is android-ndk-r28.
QNN SDK and Android NDK can be downloaded automatically through build-run-android.sh, Hexagon SDK must be obtained with a Qualcomm Developer Account.
Running on Android phone
- Android 5.1 through Android 15 or higher may be supported.
- An Android smartphone equipped with one of the Qualcomm mobile SoCs below (Snapdragon 8 Gen 3 and Snapdragon 8 Elite are highly recommended) is required to verify/run the ggml-hexagon backend on an Android phone:
Snapdragon 8 Gen 1
Snapdragon 8 Gen 1+
Snapdragon 8 Gen 2
Snapdragon 8 Gen 3
Snapdragon 8 Elite
- An Android smartphone equipped with ANY mainstream high-end mobile SoC is highly recommended for the realtime AI-subtitle feature; otherwise unexpected behavior may occur
Todo
- an automated CT approach should be introduced in this project to validate every PR/release (could AI be used for this purpose?)
- there is an unsolved issue in the UI (a corner case can crash the APP)
- this is a debug build rather than a release build (a workaround for a known issue)
- others can be found at roadmap
kantv-1.6.8
Overview
- sync ggml-hexagon implementation from https://github.com/zhouwg/ggml-hexagon
- remove the dependency on QNN runtime libs in the APK and reduce the APK size significantly
- make the HWACCEL_CDSP approach work as expected on Android phones equipped with Qualcomm high-end mobile SoCs
- maintain only one version of ggml-hexagon.cpp and make the workflow clearer and easier
- refine the entire project
- refine codes in JNI and UI
- sync llama.cpp with upstream llama.cpp project
- upgrade QNN SDK to v2.33.0.250327
- upgrade Android NDK to android-ndk-r28
- fix a very long-standing issue of "2D graphic benchmark does not work properly on Android phone": #163
- fix a stability issue of "AI-subtitle doesn't work" which was introduced in #281; this was unacceptable
- try Qwen3-4B, Qwen3-8B, DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, Gemma3-4B, Gemma3-12B on Android phone
- make DeepSeek-R1-Distill-Qwen-1.5B work fine on Android phone
- support multi-modal (image-2-text) inference through Gemma3 on Android phone
- add LLM settings on Android phone
- add downloading of LLM models in the APK on Android phone (this feature needs direct access to HuggingFace; otherwise it is unavailable)
- make tv.xml editable by APK users who don't have a tech background
- refine and simplify UI code
- improve stability
Status
the test model is Gemma3-4B.
the following features were validated on Android phones equipped with Snapdragon 8 Gen 3 and Snapdragon 8 Elite:
- TV playback
- TV playback + TV recording
- local playback with recorded video file
- TV playback + AI-subtitle
- TV playback + TV recording + AI-subtitle
- ASR inference (through whisper.cpp) benchmark with the ggml backend & cDSP backend
- LLM inference (through llama.cpp) benchmark with the ggml backend & cDSP backend
- multi-modal (image-2-text) LLM inference (through llama.cpp) benchmark with the ggml backend & cDSP backend
- 2D graphic benchmark
- video encode benchmark
- video encode benchmark and create code-generated video file
- local playback with code-generated video file
- download LLM models (in local dev envs)
- edit tv.xml (i.e., customize tv.xml for personal needs or R&D activity)
Dev envs
QNN SDK is v2.33.0.250327, Hexagon SDK is v6.2.0.1, Android NDK is android-ndk-r28.
QNN SDK and Android NDK can be downloaded automatically through build-run-android.sh, Hexagon SDK must be obtained with a Qualcomm Developer Account.
Running on Android phone
- Android 5.1 through Android 15 or higher may be supported.
- An Android smartphone equipped with one of the Qualcomm mobile SoCs below (Snapdragon 8 Gen 3 and Snapdragon 8 Elite are highly recommended) is required to verify/run the ggml-hexagon backend on an Android phone:
Snapdragon 8 Gen 1
Snapdragon 8 Gen 1+
Snapdragon 8 Gen 2
Snapdragon 8 Gen 3
Snapdragon 8 Elite
- An Android smartphone equipped with ANY mainstream high-end mobile SoC is highly recommended for the realtime AI-subtitle feature; otherwise unexpected behavior may occur
Todo
- an automated CT approach should be introduced in this project to validate every PR/release (could AI be used for this purpose?)
- there is an unsolved issue in the UI (a corner case can crash the APP)
- this is a debug build rather than a release build (a workaround for a known issue)
- realtime text-2-image inference on Android phone: #301
- others can be found in the roadmap