
🐛 [Android] Possible issue with MLKit and Image/ImageProxy rotation #3490

Open
@eebirke

What's happening?

Hi,

I've been trying to debug why MLKit's TextRecognition returns text in the wrong order (last-to-first) in one landscape orientation (device rotated 90 degrees clockwise), while it works as expected in portrait and in the opposite landscape orientation. (Tested directly in the example app with sample code from MLKit; originally tested with the text-recognition plugin.)

The strange thing is that the rotation argument of InputImage.fromMediaImage(image, rotation) appears to have no effect: whichever value I pass, the MLKit results are unchanged.

If I enable analyzer.setOutputRotationEnabled(true), the image is rotated as expected and MLKit's results are the same for every orientation again. This is obviously not a satisfactory solution, given that frame.orientation is the intended way to "fix" the rotation. It does, however, show that passing in a correctly rotated ImageProxy works as intended, and yet no value for rotation has any effect.

I am wondering whether this is mostly an MLKit issue, or some weird interaction with the alpha versions of CameraX that this library uses. Either way it seems relevant, since the built-in code scanner also uses InputImage.fromMediaImage.

Any help would be appreciated, I have spent a lot of time trying to figure out why this happens.

I can see some other issues about similar device-orientation problems, but I couldn't find an exact match.

Reproducible Code

Modify the ExampleKotlinFrameProcessorPlugin.kt in the example app like so:

package com.mrousavy.camera.example

import android.util.Log
import androidx.annotation.OptIn
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import com.mrousavy.camera.core.types.Orientation
import com.mrousavy.camera.frameprocessors.Frame
import com.mrousavy.camera.frameprocessors.FrameProcessorPlugin
import com.mrousavy.camera.frameprocessors.VisionCameraProxy

class ExampleKotlinFrameProcessorPlugin(proxy: VisionCameraProxy, options: Map<String, Any>?): FrameProcessorPlugin() {
    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    init {
        Log.d("ExampleKotlinPlugin", "ExampleKotlinFrameProcessorPlugin initialized with options: " + options?.toString())
    }

    // Might have swapped landscape degrees
    fun orientationToDegrees(orientation: Orientation): Int {
        return when(orientation) {
            Orientation.PORTRAIT -> 0
            Orientation.LANDSCAPE_RIGHT -> 90
            Orientation.PORTRAIT_UPSIDE_DOWN -> 180
            Orientation.LANDSCAPE_LEFT -> 270
        }
    }

    @OptIn(androidx.camera.core.ExperimentalGetImage::class)
    override fun callback(frame: Frame, params: Map<String, Any>?): Any? {
        if (params == null) {
            return null
        }

        val image = frame.image

        val mediaImage = frame.imageProxy.image
        if (mediaImage != null) {
            frame.incrementRefCount()
            // Even with manual 0, 90, 180 or 270 degrees output is always the same
            val inputImage = InputImage.fromMediaImage(mediaImage, orientationToDegrees(frame.orientation))
            recognizer.process(inputImage)
                .addOnSuccessListener { visionText ->
                    Log.d("TextRecognizer", "Result ${visionText.text}")
                    frame.decrementRefCount()
                }
                .addOnFailureListener { e ->
                    Log.d("TextRecognizer", "Failure $e")
                    frame.decrementRefCount()
                }
        }

        Log.d(
            "ExampleKotlinPlugin",
            image.width.toString() + " x " + image.height + " Image with format #" + image.format + ". Logging " + params.size + " parameters:"
        )

        for (key in params.keys) {
            val value = params[key]
            Log.d("ExampleKotlinPlugin", "  -> " + if (value == null) "(null)" else value.toString() + " (" + value.javaClass.name + ")")
        }

        return hashMapOf<String, Any>(
            "example_str" to "KotlinTest",
            "example_bool" to false,
            "example_double" to 6.7,
            "example_array" to arrayListOf<Any>(
                "Good bye",
                false,
                21.37
            )
        )
    }
}
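
Regarding the "Might have swapped landscape degrees" comment above: for comparison, here is a minimal sketch of the rotation-compensation arithmetic as I understand it from the MLKit Android guide, which derives the rotation value from the camera sensor orientation and the display rotation. The function name and parameters are hypothetical and not part of VisionCamera or MLKit, and I have not verified how frame.orientation relates to these inputs, so treat it only as a reference for what value fromMediaImage would normally expect.

```kotlin
// Hypothetical helper, not part of VisionCamera or MLKit.
// sensorOrientationDegrees: the camera's sensor orientation (commonly 90 or 270 on phones).
// deviceRotationDegrees: display rotation from the natural orientation (0, 90, 180, 270).
// Returns the clockwise rotation, in degrees, needed to make the image upright.
fun rotationForMlKit(
    sensorOrientationDegrees: Int,
    deviceRotationDegrees: Int,
    isFrontFacing: Boolean
): Int {
    val valid = setOf(0, 90, 180, 270)
    require(sensorOrientationDegrees in valid && deviceRotationDegrees in valid) {
        "Both angles must be one of 0, 90, 180 or 270"
    }
    return if (isFrontFacing) {
        // Front cameras are mirrored, so the two angles add up.
        (sensorOrientationDegrees + deviceRotationDegrees) % 360
    } else {
        // Back cameras: subtract the display rotation, keeping the result non-negative.
        (sensorOrientationDegrees - deviceRotationDegrees + 360) % 360
    }
}
```

With a back camera whose sensor orientation is 90, this gives 90 in portrait and 0 or 180 in the two landscape orientations, so under these assumptions the mapping in orientationToDegrees would only match if frame.orientation is already relative to the sensor.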

Add the relevant dependencies to the dependencies block of app/build.gradle (also tested with the Google Play version of MLKit text-recognition):

    implementation 'com.google.mlkit:text-recognition:16.0.1'
    implementation 'androidx.camera:camera-core:1.5.0-alpha03'

Set targetFPS to 1 in CameraPage.tsx, since TextRecognition is resource intensive; everything else was kept the same as in the example app.

Relevant log output

Correct order example:

Text: Abc
123
Test


Inverted order:

Text: Test
123
Abc
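
To illustrate why the two outputs above look like a rotation problem, here is a small self-contained sketch. It assumes (this is only an assumption, not MLKit's documented behaviour) that lines are reported top-to-bottom by their position in the image the recognizer actually sees. The Line class and both functions are hypothetical stand-ins, not MLKit types.

```kotlin
// Hypothetical stand-in for a recognized line: its text plus the centre of its bounding box.
data class Line(val text: String, val cx: Int, val cy: Int)

// Rotate a point clockwise by a multiple of 90 degrees inside a width x height image.
fun rotate(line: Line, degrees: Int, width: Int, height: Int): Line = when (degrees) {
    0 -> line
    90 -> line.copy(cx = height - 1 - line.cy, cy = line.cx)
    180 -> line.copy(cx = width - 1 - line.cx, cy = height - 1 - line.cy)
    270 -> line.copy(cx = line.cy, cy = width - 1 - line.cx)
    else -> throw IllegalArgumentException("degrees must be 0, 90, 180 or 270")
}

// Assumed reading order: top-to-bottom, then left-to-right.
fun readingOrder(lines: List<Line>): List<String> =
    lines.sortedWith(compareBy({ it.cy }, { it.cx })).map { it.text }
```

For three lines stacked vertically, sorting the unrotated positions gives Abc, 123, Test, while sorting the same positions after a 180 degree rotation gives Test, 123, Abc. That matches the inverted log, which is why I suspect the rotation value is not being applied. Logging the bounding boxes of the text blocks next to the text would be one way to check this on a device.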

Camera Device

{
  "formats": [],
  "sensorOrientation": "landscape-left",
  "hardwareLevel": "full",
  "maxZoom": 4,
  "minZoom": 1,
  "maxExposure": 12,
  "supportsLowLightBoost": false,
  "neutralZoom": 1,
  "physicalDevices": [
    "wide-angle-camera"
  ],
  "supportsFocus": true,
  "supportsRawCapture": false,
  "isMultiCam": false,
  "minFocusDistance": 10,
  "minExposure": -12,
  "name": "0 (BACK) androidx.camera.camera2",
  "hasFlash": true,
  "hasTorch": true,
  "position": "back",
  "id": "0"
}

Device

Pixel 9 Pro Android 15, Asus Zenfone 8 Android 13

VisionCamera Version

4.6.4

Can you reproduce this issue in the VisionCamera Example app?

Yes, I can reproduce the same issue in the Example app here

Additional information
