Description
The TFLite Micro softmax kernel implementation crashes when the input tensor has a non-zero zero_point value. According to the TensorFlow Lite Quantization Specification, the softmax input should accept any zero_point in the range [-128, 127] with no restrictions. However, the current implementation implicitly assumes zero_point = 0, which can cause crashes or incorrect behavior.
Current Behavior
When a quantized model has a softmax input with zero_point != 0, the kernel may crash with the following assertion failure:
```
TFLITE_CHECK(0 <= exponent && exponent <= 31) failed
```

This happens in `tensorflow/lite/kernels/internal/reference/softmax.h` at line 117:

```cpp
const int exponent = num_bits_over_unit + 31 - (sizeof(OutputT) * 8);
TFLITE_CHECK(0 <= exponent && exponent <= 31);  // ← Crash here when exponent = 32
```

Root Cause Analysis
- Missing zero_point check: In `softmax_common.cc`, the int8 input path does not validate `input->params.zero_point`:

```cpp
// Current code for int8 input - no zero_point check!
if (input->type == kTfLiteInt8) {
  // TF_LITE_ENSURE_EQ(context, input->params.zero_point, 0);  // ← Missing!
  ...
}
```

- Algorithm assumes symmetric quantization: The softmax implementation in `reference/softmax.h` calculates `input_diff` directly in quantized space without considering `zero_point`:

```cpp
int32_t input_diff = static_cast<int32_t>(input_data[i * depth + c]) - max_in_row;
```

This works correctly only when `zero_point = 0`. When `zero_point != 0`, the `sum_of_exps` value can exceed the expected range, causing `num_bits_over_unit` to be abnormally high (e.g., 9 instead of ≤ 8), which leads to `exponent = 32` and the crash.
- diff_min calculation: The `diff_min` parameter is calculated under the assumption of symmetric input quantization (`zero_point = 0`), which may not match the actual input distribution when `zero_point != 0`.
Expected Behavior
According to the TFLite Quantization Spec, softmax should accept:
```
SOFTMAX
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
    (no restriction on zero_point)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
    restriction: (scale, zero_point) = (1.0 / 256.0, -128)
```
The input zero_point has no restriction and should work with any value in [-128, 127].
Comparison with Standard TFLite
The standard TensorFlow Lite (Python/C++) can run the same model without crashes, suggesting it has additional handling for non-zero zero_point values that TFLite Micro lacks.
Possible Solutions
- Add explicit check and early failure (minimal fix):

```cpp
if (input->type == kTfLiteInt8) {
  TF_LITE_ENSURE_EQ(context, input->params.zero_point, 0);
  ...
}
```

- Properly handle non-zero zero_point (complete fix):
  - Modify the softmax algorithm to correctly handle asymmetric input quantization
  - This would involve adjusting how `input_diff` is calculated and how `diff_min` is computed
- Document the limitation: At minimum, add a comment documenting that only `zero_point = 0` is supported for int8 inputs.
Steps to Reproduce
- Create or obtain a quantized TFLite model where the softmax layer input has `zero_point != 0` (this can happen depending on calibration data distribution)
- Load and run the model with TFLite Micro
- Observe the crash when inference reaches the softmax layer
Environment
- TensorFlow Lite Micro version: (your version, e.g., latest main branch)
- Target platform: (e.g., x86_64, ARM Cortex-M, etc.)
- Build system: Make/CMake
Additional Context
This issue was discovered when running a production model where the TFLite Converter generated zero_point = -2 for the softmax input tensor. The model works correctly with standard TFLite but crashes on TFLite Micro.