Description
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
Yes
OS Platform and Distribution
Android 15
MediaPipe Tasks SDK version
0.10.21
Task name (e.g. Image classification, Gesture recognition etc.)
Face landmark detection
Programming Language and version (e.g. C++, Python, Java)
Kotlin
Describe the actual behavior
The face landmarker model takes about 30-70 ms to run on a Pixel 9 Pro
Describe the expected behaviour
I would expect the model to run in real time (on desktop, using Wasm, it takes 15-20 ms)
Standalone code/steps you may have used to try to get what you need
I tried the sample code from https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/face_landmarker/android
Other info / Complete Logs
To check how fast the model can run, I used the TFLite Model Benchmark Tool.
I unzipped the face_landmarker.task and benchmarked both the face_detector.tflite and the face_landmarks_detector.tflite models:
~ $ adb shell am start -S \
-n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
--es args '"--graph=/data/local/tmp/face_detector.tflite \
--use_gpu=true"'
~ $ adb shell am start -S \
-n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
--es args '"--graph=/data/local/tmp/face_landmarks_detector.tflite \
--use_gpu=true"'
~ $ adb shell am start -S \
-n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
--es args '"--graph=/data/local/tmp/face_detector.tflite \
--use_gpu=false"'
~ $ adb shell am start -S \
-n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
--es args '"--graph=/data/local/tmp/face_landmarks_detector.tflite \
--use_gpu=false"'
Results:
adb logcat | grep "Inference timings in us"
02-28 23:15:03.904 30233 30233 I tflite : Inference timings in us: Init: 2770755, First inference: 19349, Warmup (avg): 4108.97, Inference (avg): 3611.28
02-28 23:15:22.704 30277 30277 I tflite : Inference timings in us: Init: 4607458, First inference: 31031, Warmup (avg): 12077.7, Inference (avg): 11406.
02-28 23:15:43.189 30337 30337 I tflite : Inference timings in us: Init: 16248, First inference: 10541, Warmup (avg): 11986.1, Inference (avg): 15703.2
02-28 23:15:53.231 30379 30379 I tflite : Inference timings in us: Init: 86527, First inference: 59826, Warmup (avg): 60634.6, Inference (avg): 72700.4
So, on GPU:
- face detector: 3611.28 µs
- face landmarker: 11406.9 µs
- total = 15018.18 µs
And on CPU:
- face detector: 15703.2 µs
- face landmarker: 72700.4 µs
- total = 88403.6 µs
In my app, I am initializing the FaceLandmarker like this:
val baseOptions = BaseOptions.builder()
.setModelAssetPath("face_landmarker.task")
.setDelegate(Delegate.GPU)
.build()
val options = FaceLandmarker.FaceLandmarkerOptions.builder()
.setBaseOptions(baseOptions)
.setMinFaceDetectionConfidence(0.5f)
.setMinTrackingConfidence(0.5f)
.setMinFacePresenceConfidence(0.5f)
.setNumFaces(1)
.setOutputFacialTransformationMatrixes(false)
.setOutputFaceBlendshapes(false)
.setRunningMode(RunningMode.VIDEO)
.build()
val faceLandmarker = FaceLandmarker.createFromOptions(context, options)
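One thing I noticed in the benchmark logs above is that the first inference is much slower than the steady-state average (19-31 ms vs. 3.6-11.4 ms on GPU), so a warm-up call right after creation might be worth excluding from any measurements. A rough sketch of what I mean (the 640×480 placeholder bitmap is purely illustrative, not from my actual pipeline):
// Warm-up sketch: run one throwaway inference right after creation so that
// delegate initialization and the slow first inference are not counted in the
// per-frame timings. The blank bitmap is only a placeholder.
val warmupBitmap = Bitmap.createBitmap(640, 480, Bitmap.Config.ARGB_8888)
val warmupImage = BitmapImageBuilder(warmupBitmap).build()
faceLandmarker.detectForVideo(warmupImage, SystemClock.uptimeMillis())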
And then using it like this:
val bitmap = image.toBitmap() // where image is an `ImageProxy` from the camera
val mpImage = BitmapImageBuilder(bitmap).build()
val timestampMs = SystemClock.uptimeMillis()
val result = faceLandmarker.detectForVideo(mpImage, timestampMs)
val detectTimeMs = SystemClock.uptimeMillis() - timestampMs
I would expect to see a bit more than 15 ms because the mpImage
is 640×480 and needs to be resized to 192×192 and 256×256 for the detector and the landmarker, respectively.
However, the gap between the TFLite Model Benchmark Tool (~15 ms) and the actual app (30-70 ms) seems too large.
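To narrow down where the extra time goes, I could also time the ImageProxy-to-Bitmap conversion and MPImage wrapping separately from the detectForVideo call itself, roughly like this (the log tag is arbitrary):
// Rough timing-breakdown sketch: measure the conversion and the inference
// separately to see which part accounts for the 30-70 ms per frame.
val t0 = SystemClock.uptimeMillis()
val bitmap = image.toBitmap()                     // ImageProxy -> Bitmap
val mpImage = BitmapImageBuilder(bitmap).build()
val t1 = SystemClock.uptimeMillis()
val result = faceLandmarker.detectForVideo(mpImage, t1)
val t2 = SystemClock.uptimeMillis()
Log.d("FaceLandmarkerTiming", "convert=${t1 - t0} ms, detect=${t2 - t1} ms")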
Am I initializing the model properly? Is there something I am missing that's hampering performance?
Thanks in advance.