Description
I have a model that runs correctly on CPU but produces incorrect results on GPU. The model is float32 (quantization would be the next step). I've adjusted it to remove operators unsupported on GPU and to reduce 5D tensors to 4D, and the CPU results remain identical after these changes, so I'm confident the model itself is fine.
I am using LiteRT 2.1.0 with the Kotlin API, whose documentation says I shouldn't have to handle the buffers any differently than on CPU, even though it creates OpenClBufferPacked buffers rather than HostMemory ones.
I was originally running the same code for CPU and GPU (differing only in the accelerator option), but I've since adjusted it to eliminate any possible source of issues: forcing precision to FP32, doing everything from model creation to reading the results in a single function on a single thread, and changing how I write to the buffers. The model should support dynamic input, but I'm now padding with zeros to be safe. Throughout all these tests, the outputs are deterministic for a given input on both CPU and GPU. If I make a one-character change to the code below to select the CPU accelerator, I get the correct result; GPU consistently gives near-zero (but not zero) outputs.
```kotlin
private val gpuThread = Executors.newSingleThreadExecutor()

fun runOnGpu(audio: FloatArray, cb: (FloatArray) -> Unit) {
    gpuThread.execute {
        val opts = CompiledModel.Options(toAccelerator(AcceleratorEnum.GPU)).apply {
            gpuOptions = CompiledModel.GpuOptions(
                precision = CompiledModel.GpuOptions.Precision.FP32
            )
        }
        val model = CompiledModel.create(
            context.assets,
            "$name.tflite",
            opts,
            null
        )
        val inputs = model.createInputBuffers()
        val outputs = model.createOutputBuffers()
        inputs[0].writeFloat(prep(audio)) // must be exact length
        model.run(inputs, outputs)
        val logits = outputs[0].readFloat()
        // close everything, since we recreate per run
        outputs.forEach { it.close() }
        inputs.forEach { it.close() }
        model.close()
        cb(logits)
    }
}
```
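For completeness, this is how I quantify the divergence between the two backends. `maxAbsDiff` is a small helper of my own (not part of the LiteRT API) that compares the `readFloat()` results from a CPU run and a GPU run element-wise:

```kotlin
import kotlin.math.abs

// Hypothetical helper (not from LiteRT): returns the maximum absolute
// element-wise difference between two output tensors, e.g. the FloatArrays
// returned by readFloat() on a CPU run and a GPU run of the same input.
fun maxAbsDiff(cpu: FloatArray, gpu: FloatArray): Float {
    require(cpu.size == gpu.size) { "output sizes differ: ${cpu.size} vs ${gpu.size}" }
    var max = 0f
    for (i in cpu.indices) {
        val d = abs(cpu[i] - gpu[i])
        if (d > max) max = d
    }
    return max
}
```

With identical logits this returns 0.0; in my case the GPU logits are near zero, so the difference is roughly the magnitude of the CPU logits themselves.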
So I suspect a GPU backend bug, unless there is something undocumented (or at least something I've not found) about buffer input/output.
The .tflite file in question is 467 MB, but I can upload it somewhere if that would help.
The model contains these operators: Conv2D, FullyConnected, Transpose, Batch Matrix Multiply, GELU, ADD, DEPTHWISE_CONV_2D, Multiply, Softmax, Subtract, Mean, Slice, Squared Difference, Logistic, Pad, Reciprocal Square Root, RESHAPE, Sum. Do any of them have known issues on the GPU backend?