This checks out against axis1.h and axis1.c, but I don't understand why the types must be mixed. Is memory a big concern here, and is making one of the three arrays in this operation half the size actually worth it? (E.g., why not have everything go in as float32? That would save a conversion.)
Originally posted by @maffettone in #98 (comment)