Hi team,
Thank you for your contribution.
I have gone through the TernGEMM paper. As I understand it, the key ideas compared with gemmlowp are:
1.) TernaryMAC (replacing the multiply-add operation with bitwise operations).
2.) Bit-incremental accumulation (avoiding INT32 accumulation of the partial sums).
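To check my reading of point 1, here is a minimal scalar sketch of how I picture a multiply-free MAC for a weight in {-1, 0, +1}. The function name `ternary_mac` and the mask encoding are my own assumptions for illustration, not taken from the paper or the library, and the real kernel presumably does this across SIMD lanes.

```python
def ternary_mac(acc, a, w):
    """Accumulate a * w for w in {-1, 0, +1} using only bitwise ops and add/sub.

    Hypothetical helper: the mask encoding below is an assumption,
    not the library's actual weight format.
    """
    nonzero = -(w != 0)  # all-ones mask (-1) when w != 0, otherwise 0
    sign = -(w < 0)      # all-ones mask (-1) when w is negative, otherwise 0
    # (x ^ sign) - sign negates x when sign == -1 and leaves it unchanged when sign == 0.
    term = ((a & nonzero) ^ sign) - sign
    return acc + term
```

Please correct me if the actual encoding of the weights differs from this.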
I did not understand how the low-bit activations (3, 4, 5, or 6 bits) are packed. Are you converting them to 8-bit and then computing on them, or is there additional bit packing involved? And how are the ternary weights packed?
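To make the question concrete, this is the kind of bit packing I have in mind for the 4-bit case: two activations stored per byte. The helper names `pack_4bit` and `unpack_4bit` and the low-nibble-first layout are hypothetical, only meant to show what I am asking about.

```python
def pack_4bit(values):
    """Pack unsigned 4-bit values (0..15) two per byte, low nibble first.

    Hypothetical layout for illustration; an odd-length input is padded with 0.
    """
    values = list(values)
    if len(values) % 2:
        values.append(0)
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append(((hi & 0xF) << 4) | (lo & 0xF))
    return bytes(out)


def unpack_4bit(packed, count):
    """Recover the first `count` 4-bit values from bytes produced by pack_4bit."""
    values = []
    for byte in packed:
        values.append(byte & 0xF)  # low nibble
        values.append(byte >> 4)   # high nibble
    return values[:count]
```

Is something like this done for the activations, or are they simply widened to 8 bits before the kernel runs?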
Thanks