Softmax kernel crashes when input zero_point != 0 (violates TFLite quantization spec) #3499

@liuyx-baller

Description

The TFLite Micro softmax kernel crashes when its int8 input tensor has a non-zero zero_point. According to the TensorFlow Lite Quantization Specification, the softmax input places no restriction on zero_point, so any value in [-128, 127] should be accepted. The current implementation appears to assume zero_point = 0, which can lead to an assertion failure or incorrect results.

Current Behavior

When a quantized model has a softmax input with zero_point != 0, the kernel may crash with the following assertion failure:
TFLITE_CHECK(0 <= exponent && exponent <= 31) failed

This happens in tensorflow/lite/kernels/internal/reference/softmax.h at line 117:

const int exponent = num_bits_over_unit + 31 - (sizeof(OutputT) * 8);
TFLITE_CHECK(0 <= exponent && exponent <= 31);  // ← Crash here when exponent = 32
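
The arithmetic behind the failing check can be reproduced in isolation. The sketch below is not the kernel code: `SoftmaxOutputExponent` and `ExponentPassesCheck` are hypothetical helpers that only mirror the quoted expression and condition, to show why num_bits_over_unit = 9 trips the assertion for an int8 output.

```cpp
#include <cstdint>

// Hypothetical helper (not a TFLite Micro API): mirrors the expression
// quoted from reference/softmax.h for a given output type.
template <typename OutputT>
constexpr int SoftmaxOutputExponent(int num_bits_over_unit) {
  return num_bits_over_unit + 31 - static_cast<int>(sizeof(OutputT) * 8);
}

// Hypothetical helper: the condition that TFLITE_CHECK enforces.
template <typename OutputT>
constexpr bool ExponentPassesCheck(int num_bits_over_unit) {
  return 0 <= SoftmaxOutputExponent<OutputT>(num_bits_over_unit) &&
         SoftmaxOutputExponent<OutputT>(num_bits_over_unit) <= 31;
}

// For int8 output, sizeof(OutputT) * 8 == 8, so exponent = num_bits_over_unit + 23.
static_assert(SoftmaxOutputExponent<int8_t>(8) == 31, "largest passing value");
static_assert(SoftmaxOutputExponent<int8_t>(9) == 32, "one bit more fails");
static_assert(ExponentPassesCheck<int8_t>(8), "8 is within range");
static_assert(!ExponentPassesCheck<int8_t>(9), "9 is out of range");
```

For an int8 output the check therefore passes only while num_bits_over_unit stays at or below 8.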

Root Cause Analysis

  1. Missing zero_point check: In softmax_common.cc, the int8 input path does not validate input->params.zero_point:
// Current code for int8 input: no zero_point check
if (input->type == kTfLiteInt8) {
    // TF_LITE_ENSURE_EQ(context, input->params.zero_point, 0);  // ← Missing!
    ...
}
  2. Algorithm assumes symmetric quantization: The softmax implementation in reference/softmax.h computes input_diff directly in the quantized domain, without referring to zero_point:
int32_t input_diff = static_cast<int32_t>(input_data[i * depth + c]) - max_in_row;

This is presumed to be correct only when zero_point = 0. When zero_point != 0, sum_of_exps can exceed the expected range, making num_bits_over_unit abnormally high (e.g., 9 instead of ≤ 8), which yields exponent = 32 and triggers the crash.

  3. diff_min calculation: The diff_min parameter appears to be computed on the assumption of symmetric input quantization (zero_point = 0), so it may not match the actual input distribution when zero_point != 0.
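
The claim about input_diff can be checked in isolation. The sketch below assumes the usual affine mapping real = scale * (q - zero_point); `QuantizedDiff` and `ZeroPointAdjustedDiff` are hypothetical helpers, not TFLite Micro functions. Under that mapping the zero point is subtracted from both operands and cancels, so the two helpers agree for any zero_point. If a crash still correlates with zero_point, the cause may lie in the parameters derived from the input scale (such as diff_min) rather than in this subtraction itself.

```cpp
#include <cstdint>

// Hypothetical helper: the difference as the quoted reference code computes it,
// directly on quantized values.
constexpr int32_t QuantizedDiff(int8_t value, int8_t max_in_row) {
  return static_cast<int32_t>(value) - static_cast<int32_t>(max_in_row);
}

// Hypothetical helper: the same difference after removing the zero point from
// both operands first (assumed affine mapping, scale factored out).
constexpr int32_t ZeroPointAdjustedDiff(int8_t value, int8_t max_in_row,
                                        int32_t zero_point) {
  return (static_cast<int32_t>(value) - zero_point) -
         (static_cast<int32_t>(max_in_row) - zero_point);
}

// The zero point cancels, so both forms give the same result.
static_assert(QuantizedDiff(10, 127) == ZeroPointAdjustedDiff(10, 127, 0), "");
static_assert(QuantizedDiff(10, 127) == ZeroPointAdjustedDiff(10, 127, -2), "");
static_assert(QuantizedDiff(-128, 127) == ZeroPointAdjustedDiff(-128, 127, -128), "");
```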

Expected Behavior

According to the TFLite Quantization Spec, softmax should accept:
SOFTMAX
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
    (no restriction on zero_point)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
    restriction: (scale, zero_point) = (1.0 / 256.0, -128)

The input zero_point has no restriction and should work with any value in [-128, 127].
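
To make the output restriction concrete: with (scale, zero_point) = (1.0 / 256.0, -128), the int8 output range maps onto [0, 255/256]. The sketch below assumes the usual affine mapping real = scale * (q - zero_point); `DequantizeSoftmaxOutput` and the two constants are hypothetical names for illustration, not TFLite Micro APIs.

```cpp
#include <cstdint>

// Output quantization parameters fixed by the spec excerpt above.
constexpr float kSoftmaxOutputScale = 1.0f / 256.0f;
constexpr int32_t kSoftmaxOutputZeroPoint = -128;

// Hypothetical helper: maps a quantized int8 softmax output back to a real
// value, assuming real = scale * (q - zero_point).
constexpr float DequantizeSoftmaxOutput(int8_t q) {
  return kSoftmaxOutputScale *
         static_cast<float>(static_cast<int32_t>(q) - kSoftmaxOutputZeroPoint);
}
```

With these parameters, q = -128 corresponds to 0.0, q = 0 to 0.5, and q = 127 to 255/256.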

Comparison with Standard TFLite

The standard TensorFlow Lite (Python/C++) can run the same model without crashes, suggesting it has additional handling for non-zero zero_point values that TFLite Micro lacks.

Possible Solutions

  1. Add explicit check and early failure (minimal fix):
if (input->type == kTfLiteInt8) {
    TF_LITE_ENSURE_EQ(context, input->params.zero_point, 0);
    ...
}
  2. Properly handle non-zero zero_point (complete fix):

    • Modify the softmax algorithm to correctly handle asymmetric input quantization
    • This would involve adjusting how input_diff is calculated and how diff_min is computed
  3. Document the limitation: At minimum, add a comment documenting that only zero_point = 0 is supported for int8 inputs.

Steps to Reproduce

  1. Create or obtain a quantized TFLite model where the softmax layer input has zero_point != 0 (this can happen depending on calibration data distribution)
  2. Load and run the model with TFLite Micro
  3. Observe the crash when inference reaches the softmax layer

Environment

  • TensorFlow Lite Micro version: (your version, e.g., latest main branch)
  • Target platform: (e.g., x86_64, ARM Cortex-M, etc.)
  • Build system: Make/CMake

Additional Context

This issue was discovered when running a production model where the TFLite Converter generated zero_point = -2 for the softmax input tensor. The model works correctly with standard TFLite but crashes on TFLite Micro.
