Description
The TFLite Micro softmax kernel implementation crashes when the input tensor has a non-zero zero_point value. According to the TensorFlow Lite Quantization Specification, the softmax input should accept any zero_point in the range [-128, 127] with no restrictions. However, the current implementation implicitly assumes zero_point = 0, which can cause crashes or incorrect behavior.
Current Behavior
When a quantized model has a softmax input with zero_point != 0, the kernel may crash with the following assertion failure:
```
TFLITE_CHECK(0 <= exponent && exponent <= 31) failed
```

This happens in `tensorflow/lite/kernels/internal/reference/softmax.h` at line 117:

```cpp
const int exponent = num_bits_over_unit + 31 - (sizeof(OutputT) * 8);
TFLITE_CHECK(0 <= exponent && exponent <= 31);  // ← Crash here when exponent = 32
```

Root Cause Analysis
- Missing zero_point check: In `softmax_common.cc`, the int8 input path does not validate `input->params.zero_point`:

```cpp
// Current code for int8 input - no zero_point check!
if (input->type == kTfLiteInt8) {
  // TF_LITE_ENSURE_EQ(context, input->params.zero_point, 0);  // ← Missing!
  ...
}
```

- Algorithm assumes symmetric quantization: The softmax implementation in `reference/softmax.h` calculates `input_diff` directly in quantized space without considering `zero_point`:

```cpp
int32_t input_diff = static_cast<int32_t>(input_data[i * depth + c]) - max_in_row;
```

This works correctly only when `zero_point = 0`. When `zero_point != 0`, the `sum_of_exps` value can exceed the expected range, causing `num_bits_over_unit` to be abnormally high (e.g., 9 instead of ≤ 8), which leads to `exponent = 32` and the crash.
- diff_min calculation: The `diff_min` parameter is calculated under the assumption of symmetric input quantization (`zero_point = 0`), which may not match the actual input distribution when `zero_point != 0`.
Expected Behavior
According to the TFLite Quantization Spec, softmax should accept:
```
SOFTMAX
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
    (no restriction on zero_point)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
    restriction: (scale, zero_point) = (1.0 / 256.0, -128)
```
The input zero_point has no restriction and should work with any value in [-128, 127].
Comparison with Standard TFLite
The standard TensorFlow Lite (Python/C++) can run the same model without crashes, suggesting it has additional handling for non-zero zero_point values that TFLite Micro lacks.
Possible Solutions
- Add explicit check and early failure (minimal fix):

```cpp
if (input->type == kTfLiteInt8) {
  TF_LITE_ENSURE_EQ(context, input->params.zero_point, 0);
  ...
}
```

- Properly handle non-zero zero_point (complete fix):
  - Modify the softmax algorithm to correctly handle asymmetric input quantization
  - This would involve adjusting how `input_diff` is calculated and how `diff_min` is computed
- Document the limitation: At minimum, add a comment documenting that only `zero_point = 0` is supported for int8 inputs.
Steps to Reproduce
- Create or obtain a quantized TFLite model where the softmax layer input has `zero_point != 0` (this can happen depending on calibration data distribution)
- Load and run the model with TFLite Micro
- Observe the crash when inference reaches the softmax layer
Environment
- TensorFlow Lite Micro version: (your version, e.g., latest main branch)
- Target platform: (e.g., x86_64, ARM Cortex-M, etc.)
- Build system: Make/CMake
Additional Context
This issue was discovered when running a production model where the TFLite Converter generated zero_point = -2 for the softmax input tensor. The model works correctly with standard TFLite but crashes on TFLite Micro.