Description
I have a model that runs correctly on CPU but produces incorrect results on GPU. The model is float32 (quantization would be the next step). I've adjusted it to remove operators unsupported on GPU and to reduce 5D tensors to 4D, and the CPU results remain identical after these changes, so I'm confident the model itself is fine.
I am using LiteRT 2.1.0 with the Kotlin API, whose documentation says I shouldn't have to handle the buffers any differently than on CPU, even though it creates OpenClBufferPacked buffers rather than HostMemory ones.
I was originally running the same code for CPU and GPU (differing only in the accelerator option), but I've since adjusted it to eliminate any possible source of issues: forcing precision to FP32, doing everything from model creation to reading the results in a single function on a single thread, and changing how I write to the buffers. The model should support dynamic input, but I'm now padding with zeros to be safe. Throughout all these tests, the outputs are deterministic for a given input on both CPU and GPU. If I make a one-character change to the code below to select the CPU accelerator, I get the correct result; GPU consistently gives near-zero (but not zero) outputs.
```kotlin
private val gpuThread = Executors.newSingleThreadExecutor()

fun runOnGpu(audio: FloatArray, cb: (FloatArray) -> Unit) {
    gpuThread.execute {
        val opts = CompiledModel.Options(toAccelerator(AcceleratorEnum.GPU)).apply {
            gpuOptions = CompiledModel.GpuOptions(
                precision = CompiledModel.GpuOptions.Precision.FP32
            )
        }
        val model = CompiledModel.create(
            context.assets,
            "$name.tflite",
            opts,
            null
        )
        val inputs = model.createInputBuffers()
        val outputs = model.createOutputBuffers()
        inputs[0].writeFloat(prep(audio)) // must be exact length
        model.run(inputs, outputs)
        val logits = outputs[0].readFloat()
        // close everything, since we recreate per run
        outputs.forEach { it.close() }
        inputs.forEach { it.close() }
        model.close()
        cb(logits)
    }
}
```
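For completeness, this is how I quantify the divergence between the two backends. `maxAbsDiff` is a small helper of my own (not part of the LiteRT API) that compares the `readFloat()` results from a CPU run and a GPU run element-wise:

```kotlin
import kotlin.math.abs

// Hypothetical helper (not from LiteRT): returns the maximum absolute
// element-wise difference between two output tensors, e.g. the FloatArrays
// returned by readFloat() on a CPU run and a GPU run of the same input.
fun maxAbsDiff(cpu: FloatArray, gpu: FloatArray): Float {
    require(cpu.size == gpu.size) { "output sizes differ: ${cpu.size} vs ${gpu.size}" }
    var max = 0f
    for (i in cpu.indices) {
        val d = abs(cpu[i] - gpu[i])
        if (d > max) max = d
    }
    return max
}
```

With identical logits this returns 0.0; in my case the GPU logits are near zero, so the difference is roughly the magnitude of the CPU logits themselves.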
So I suspect a GPU backend bug, unless there is something undocumented (or at least something I've not found) about buffer input/output.
The .tflite file in question is 467 MB, but I can upload it somewhere if that would help.
The model contains these operators: Conv2D, FullyConnected, Transpose, Batch Matrix Multiply, GELU, ADD, DEPTHWISE_CONV_2D, Multiply, Softmax, Subtract, Mean, Slice, Squared Difference, Logistic, Pad, Reciprocal Square Root, RESHAPE, Sum. Do any of them have known issues on the GPU backend?