Description
I am trying to save a quantized ternary model to a .tflite file, but Larq doesn't seem to store the weights in a reduced-precision datatype, so the file size is not compressed. After converting and writing to disk, the file is about the same size as the float32 estimate reported by larq.models.summary. Even when I try the same thing with a single QuantDense layer, the weights are saved in float32.
I am using this kind of code:

import tensorflow as tf
from tensorflow import keras
import larq
import larq_compute_engine as lce

quantDense = larq.layers.QuantDense(1000, kernel_quantizer="ste_sign", use_bias=False)
quantDense(tf.ones((1, 500)))
with larq.context.quantized_scope(True):
    inp_quant = keras.Input((1, 500))
    out_quant = quantDense(inp_quant)
    quantModelTest = keras.Model(inputs=inp_quant, outputs=out_quant)
    print("Keras test model")
    larq.models.summary(quantModelTest)
    print("converting keras test model to tflite")
    converted = lce.convert_keras_model(quantModelTest)
    with open("test.tflite", "wb") as f:
        print("writing tflite model to disk")
        f.write(converted)
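For reference, here is the back-of-the-envelope arithmetic behind my expectation, assuming a 500x1000 kernel with no bias (this is just a size calculation, not Larq/LCE API usage):

```python
# Expected on-disk weight size for a 500x1000 QuantDense kernel (no bias).
# Sketch of the arithmetic only; a real .tflite file adds flatbuffer overhead.
n_weights = 500 * 1000              # 500,000 parameters

float32_bytes = n_weights * 4       # 4 bytes per float32 weight
packed_1bit_bytes = n_weights // 8  # 1 bit per binarized weight, bit-packed

print(f"float32: {float32_bytes / 1024:.0f} KiB")           # ~1953 KiB
print(f"1-bit packed: {packed_1bit_bytes / 1024:.1f} KiB")  # ~61.0 KiB
```

The file I get on disk is close to the ~1953 KiB float32 figure, not the bit-packed one.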
Am I doing something wrong?