What's happening?
Hi,
I've been trying to debug why MLKit's TextRecognition returns text in the wrong order (last-to-first) in one landscape orientation (rotated 90 degrees clockwise), while it works as expected in portrait and the opposite landscape orientation. (Tested directly in the example app with MLKit's sample code; originally tested with the text-recognition plugin.)
The strange thing is that, for whatever reason, with `InputImage.fromMediaImage(image, rotation)` any rotation value results in no change to MLKit's results.
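For reference, this is the documented MLKit-with-CameraX pattern I am comparing against – a minimal sketch, where `toInputImage` is just a helper name I made up for this example:

```kotlin
import androidx.annotation.OptIn
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage

// Sketch of the MLKit + CameraX pattern: the rotation argument is meant to tell the recognizer
// how the buffer is oriented relative to upright, so different values should change the output.
@OptIn(ExperimentalGetImage::class)
fun toInputImage(imageProxy: ImageProxy): InputImage? {
  val mediaImage = imageProxy.image ?: return null
  return InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)
}
```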
If I enable `analyzer.setOutputRotationEnabled(true)`, the image is rotated as expected and MLKit's results are identical across orientations again – this is obviously not a satisfactory solution, given that orientation is intended to be "fixed" via `frame.orientation`. It does, however, show that feeding in a correctly rotated ImageProxy works as intended – and still, any value for the rotation parameter does nothing.
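For context, the flag I mean lives on the analysis use case's builder – a minimal sketch, assuming the standard CameraX `ImageAnalysis.Builder` API (where exactly VisionCamera builds its use case internally is not shown here):

```kotlin
import androidx.camera.core.ImageAnalysis

// Hypothetical illustration: with this flag enabled, CameraX rotates the buffers before handing
// them to the analyzer, so the ImageProxy the frame processor receives is already upright.
val imageAnalysis = ImageAnalysis.Builder()
  .setOutputImageRotationEnabled(true)
  .build()
```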
I am wondering whether this is mostly an MLKit issue, or some odd interaction with the alpha versions of CameraX that this library uses. Either way it seems relevant, since the built-in code scanner also uses `InputImage.fromMediaImage`.
Any help would be appreciated; I have spent a lot of time trying to figure out why this happens.
I can see some other issues related to device orientation, but I couldn't find an exact match.
Reproduceable Code
Modify `ExampleKotlinFrameProcessorPlugin.kt` in the example app like so:
```kotlin
package com.mrousavy.camera.example

import android.util.Log
import androidx.annotation.OptIn
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import com.mrousavy.camera.core.types.Orientation
import com.mrousavy.camera.frameprocessors.Frame
import com.mrousavy.camera.frameprocessors.FrameProcessorPlugin
import com.mrousavy.camera.frameprocessors.VisionCameraProxy

class ExampleKotlinFrameProcessorPlugin(proxy: VisionCameraProxy, options: Map<String, Any>?) : FrameProcessorPlugin() {
  val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

  init {
    Log.d("ExampleKotlinPlugin", "ExampleKotlinFrameProcessorPlugin initialized with options: " + options?.toString())
  }

  // Might have swapped landscape degrees
  fun orientationToDegrees(orientation: Orientation): Int {
    return when (orientation) {
      Orientation.PORTRAIT -> 0
      Orientation.LANDSCAPE_RIGHT -> 90
      Orientation.PORTRAIT_UPSIDE_DOWN -> 180
      Orientation.LANDSCAPE_LEFT -> 270
    }
  }

  @OptIn(androidx.camera.core.ExperimentalGetImage::class)
  override fun callback(frame: Frame, params: Map<String, Any>?): Any? {
    if (params == null) {
      return null
    }

    val image = frame.image
    val mediaImage = frame.imageProxy.image
    if (mediaImage != null) {
      frame.incrementRefCount()
      // Even with manual 0, 90, 180 or 270 degrees output is always the same
      val inputImage = InputImage.fromMediaImage(mediaImage, orientationToDegrees(frame.orientation))
      recognizer.process(inputImage)
        .addOnSuccessListener { visionText ->
          Log.d("TextRecognizer", "Result ${visionText.text}")
          frame.decrementRefCount()
        }
        .addOnFailureListener { e ->
          Log.d("TextRecognizer", "Failure $e")
          frame.decrementRefCount()
        }
    }

    Log.d(
      "ExampleKotlinPlugin",
      image.width.toString() + " x " + image.height + " Image with format #" + image.format + ". Logging " + params.size + " parameters:"
    )

    for (key in params.keys) {
      val value = params[key]
      Log.d("ExampleKotlinPlugin", " -> " + if (value == null) "(null)" else value.toString() + " (" + value.javaClass.name + ")")
    }

    return hashMapOf<String, Any>(
      "example_str" to "KotlinTest",
      "example_bool" to false,
      "example_double" to 6.7,
      "example_array" to arrayListOf<Any>(
        "Good bye",
        false,
        21.37
      )
    )
  }
}
```
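In case it helps narrow things down, this is the kind of extra logging I would drop into the callback – a sketch only (not part of the repro above), comparing VisionCamera's reported orientation with the rotation CameraX attaches to the same buffer:

```kotlin
// Hypothetical debug logging: if the two rotation sources disagree for a given device
// orientation, that would point at where the rotation information gets lost.
Log.d(
  "ExampleKotlinPlugin",
  "frame.orientation=${frame.orientation} (${orientationToDegrees(frame.orientation)} deg), " +
    "imageInfo.rotationDegrees=${frame.imageProxy.imageInfo.rotationDegrees}"
)
```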
Add the relevant dependencies to the `dependencies` block of `app/build.gradle` (also tested with the Google Play Services version of MLKit text-recognition):
```groovy
implementation 'com.google.mlkit:text-recognition:16.0.1'
implementation 'androidx.camera:camera-core:1.5.0-alpha03'
```
Set `targetFPS` to 1 in `CameraPage.tsx`, since TextRecognition is resource intensive – otherwise everything is kept the same as in the example app.
Relevant log output
Correct order example:

```
Text: Abc
123
Test
```

Inverted order:

```
Text: Test
123
Abc
```
Camera Device
```json
{
  "formats": [],
  "sensorOrientation": "landscape-left",
  "hardwareLevel": "full",
  "maxZoom": 4,
  "minZoom": 1,
  "maxExposure": 12,
  "supportsLowLightBoost": false,
  "neutralZoom": 1,
  "physicalDevices": [
    "wide-angle-camera"
  ],
  "supportsFocus": true,
  "supportsRawCapture": false,
  "isMultiCam": false,
  "minFocusDistance": 10,
  "minExposure": -12,
  "name": "0 (BACK) androidx.camera.camera2",
  "hasFlash": true,
  "hasTorch": true,
  "position": "back",
  "id": "0"
}
```
Device
Pixel 9 Pro (Android 15), Asus Zenfone 8 (Android 13)
VisionCamera Version
4.6.4
Can you reproduce this issue in the VisionCamera Example app?
Yes, I can reproduce the same issue in the Example app here
Additional information
- I am using Expo
- I have enabled Frame Processors (react-native-worklets-core)
- I have read the Troubleshooting Guide
- I agree to follow this project's Code of Conduct
- I searched for similar issues in this repository and found none.