Description
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
Yes
OS Platform and Distribution
Android 15
MediaPipe Tasks SDK version
0.10.21
Task name (e.g. Image classification, Gesture recognition etc.)
Face landmark detection
Programming Language and version (e.g. C++, Python, Java)
Kotlin
Describe the actual behavior
The face landmarker model takes about 30-70 ms to run on a Pixel 9 Pro
Describe the expected behaviour
I would expect the model to run in real time (on desktop, using Wasm, it takes 15-20 ms)
Standalone code/steps you may have used to try to get what you need
I tried the sample code from https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/face_landmarker/android
Other info / Complete Logs
To check how fast the model can run, I used the TFLite Model Benchmark Tool.
I unzipped the face_landmarker.task and benchmarked both the face_detector.tflite and the face_landmarks_detector.tflite models:
~ $ adb shell am start -S \
-n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
--es args '"--graph=/data/local/tmp/face_detector.tflite \
--use_gpu=true"'
~ $ adb shell am start -S \
-n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
--es args '"--graph=/data/local/tmp/face_landmarks_detector.tflite \
--use_gpu=true"'
~ $ adb shell am start -S \
-n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
--es args '"--graph=/data/local/tmp/face_detector.tflite \
--use_gpu=false"'
~ $ adb shell am start -S \
-n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
--es args '"--graph=/data/local/tmp/face_landmarks_detector.tflite \
--use_gpu=false"'
Results:
adb logcat | grep "Inference timings in us"
02-28 23:15:03.904 30233 30233 I tflite : Inference timings in us: Init: 2770755, First inference: 19349, Warmup (avg): 4108.97, Inference (avg): 3611.28
02-28 23:15:22.704 30277 30277 I tflite : Inference timings in us: Init: 4607458, First inference: 31031, Warmup (avg): 12077.7, Inference (avg): 11406.
02-28 23:15:43.189 30337 30337 I tflite : Inference timings in us: Init: 16248, First inference: 10541, Warmup (avg): 11986.1, Inference (avg): 15703.2
02-28 23:15:53.231 30379 30379 I tflite : Inference timings in us: Init: 86527, First inference: 59826, Warmup (avg): 60634.6, Inference (avg): 72700.4
So, on GPU:
- face detector: 3611.28 µs
- face landmarker: 11406.9 µs
- total = 15018.18 µs
And on CPU:
- face detector: 15703.2 µs
- face landmarker: 72700.4 µs
- total = 88403.6 µs
In my app, I am initializing the FaceLandmarker like this:
val baseOptions = BaseOptions.builder()
.setModelAssetPath("face_landmarker.task")
.setDelegate(Delegate.GPU)
.build()
val options = FaceLandmarker.FaceLandmarkerOptions.builder()
.setBaseOptions(baseOptions)
.setMinFaceDetectionConfidence(0.5f)
.setMinTrackingConfidence(0.5f)
.setMinFacePresenceConfidence(0.5f)
.setNumFaces(1)
.setOutputFacialTransformationMatrixes(false)
.setOutputFaceBlendshapes(false)
.setRunningMode(RunningMode.VIDEO)
.build()
val faceLandmarker = FaceLandmarker.createFromOptions(context, options)
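One thing I noticed in the benchmark logs above is that the first inference is much slower than the steady-state average (19-31 ms vs. 3.6-11.4 ms on GPU), so a warm-up call right after creation might be worth excluding from any measurements. A rough sketch of what I mean (the 640×480 placeholder bitmap is purely illustrative, not from my actual pipeline):
// Warm-up sketch: run one throwaway inference right after creation so that
// delegate initialization and the slow first inference are not counted in the
// per-frame timings. The blank bitmap is only a placeholder.
val warmupBitmap = Bitmap.createBitmap(640, 480, Bitmap.Config.ARGB_8888)
val warmupImage = BitmapImageBuilder(warmupBitmap).build()
faceLandmarker.detectForVideo(warmupImage, SystemClock.uptimeMillis())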
And then using it like this:
val bitmap = image.toBitmap() // where image is an `ImageProxy` from the camera
val mpImage = BitmapImageBuilder(bitmap).build()
val timestampMs = SystemClock.uptimeMillis()
val result = faceLandmarker.detectForVideo(mpImage, timestampMs)
val detectTimeMs = SystemClock.uptimeMillis() - timestampMs
I would expect to see a bit more than 15 ms because the mpImage
is 640×480 and needs to be resized to 192×192 and 256×256 for the detector and the landmarker, respectively.
However, the gap between the TFLite Model Benchmark Tool (~15 ms) and the actual app (30-70 ms) seems too large.
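To narrow down where the extra time goes, I could also time the ImageProxy-to-Bitmap conversion and MPImage wrapping separately from the detectForVideo call itself, roughly like this (the log tag is arbitrary):
// Rough timing-breakdown sketch: measure the conversion and the inference
// separately to see which part accounts for the 30-70 ms per frame.
val t0 = SystemClock.uptimeMillis()
val bitmap = image.toBitmap()                     // ImageProxy -> Bitmap
val mpImage = BitmapImageBuilder(bitmap).build()
val t1 = SystemClock.uptimeMillis()
val result = faceLandmarker.detectForVideo(mpImage, t1)
val t2 = SystemClock.uptimeMillis()
Log.d("FaceLandmarkerTiming", "convert=${t1 - t0} ms, detect=${t2 - t1} ms")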
Am I initializing the model properly? Is there something I am missing that's hampering performance?
Thanks in advance.