The converter works in three steps: (1) strip the QKeras model of its quantization attributes and store them in a dictionary; (2) convert the resulting plain Keras model using tf2onnx; (3) insert “Quant” nodes at the appropriate locations based on the stored dictionary of quantization attributes.
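Step (3) can be illustrated with a minimal, library-free sketch. The node and attribute layouts below are hypothetical stand-ins for the converter's actual internals, used only to show where Quant nodes land:

```python
# Sketch of step (3): re-inserting Quant nodes into a converted graph.
# Node/attribute layouts here are illustrative, not the real internals.

def insert_quant_nodes(nodes, quant_attrs):
    """nodes: ordered list of {"name", "op"} dicts (plain-Keras ONNX graph).
    quant_attrs: layer name -> quantizer attributes stripped in step (1)."""
    out = []
    for node in nodes:
        out.append(node)
        attrs = quant_attrs.get(node["name"])
        if attrs is not None:
            # Place a Quant node directly after the layer it annotates.
            out.append({"name": node["name"] + "_quant",
                        "op": "Quant",
                        "attrs": attrs})
    return out

graph = [{"name": "dense0", "op": "MatMul"},
         {"name": "relu0", "op": "Relu"}]
qdict = {"dense0": {"bits": 8, "signed": 1}}
print([n["op"] for n in insert_quant_nodes(graph, qdict)])
# → ['MatMul', 'Quant', 'Relu']
```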
The current version has a few issues related to how the Quant nodes are inserted into the tf2onnx-converted graph. Each has a suitable workaround, detailed below.
The quantized_relu quantizer inserts a redundant Quant node when used as the output activation of a Dense/Conv2D layer.
Workaround: Only use quantized_relu in a separate QActivation layer.
The quantized_bits Quant node is not added to the model when the quantizer is used in a QActivation layer.
Workaround: Use quantized_bits only at the output of a Dense/Conv2D layer.
A threshold of 0.5 must be used with ternary quantization. (Conversion is sometimes unstable even with threshold=0.5.)

