Should feature vectors be columns or rows? #11
Hi @JuliaDSP/developers,
I hope this reaches the JuliaDSP developers by some GitHub magic. I have a long-standing dilemma that I can't seem to resolve satisfactorily: should feature vectors for a time sequence be stacked as row vectors or as column vectors? Arguments I can think of for row vectors:
- similar layout to how data is shown in a `DataFrame`
- default layout interpretation for `cov()`
- mental association with data scrolling in a terminal (top to bottom)
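
To make the `cov()` point concrete, here is a minimal sketch using the `Statistics` standard library. The matrix `X` and its values are made up for illustration. With the default `dims=1`, each row is treated as one observation (one feature vector); with column-stacked data you would have to pass `dims=2` explicitly.

```julia
using Statistics

# Hypothetical data: 5 frames (observations), 3 features per frame.
# Row layout: each ROW is one feature vector.
X = [1.0 2.0 0.5;
     2.0 1.0 1.5;
     3.0 4.0 2.5;
     4.0 3.0 3.0;
     5.0 6.0 4.5]

# Default (dims=1): rows are observations, columns are variables.
C_rows = cov(X)

# Column layout: each COLUMN is one feature vector, so dims=2 is needed.
Xc = permutedims(X)          # 3 features x 5 frames
C_cols = cov(Xc; dims=2)

println(size(C_rows))        # (3, 3)
println(C_rows ≈ C_cols)     # true
```

Both calls give the same 3x3 feature covariance; the only difference is whether the layout matches the default.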
But then again, there are arguments for storing them as column vectors:
- it seems to be more in line with the machine learning community; e.g., this is how Andrew Ng does things in https://www.coursera.org/specializations/deep-learning, even though that course is taught in Python, which has a different (IMHO better) memory layout than Julia
- in the Julia storage layout (column-major), the elements of a feature vector are then contiguous in memory, so streaming feature vectors in (real) time makes more sense memory-wise
- it is consistent with, e.g., `DSP.spectrogram`
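
The memory argument can be checked with a small sketch in plain Julia (no packages). The sizes and the names `X`, `Y`, `col_frame` and `row_frame` are only illustrative. It compares the stride of a single feature vector in the two layouts.

```julia
nfeatures, nframes = 3, 4

# Column layout: each column is one feature vector (frame).
X = reshape(collect(1.0:12.0), nfeatures, nframes)
# Row layout: the same data, with each row as one feature vector.
Y = permutedims(X)

col_frame = view(X, :, 2)    # frame 2 in the column layout
row_frame = view(Y, 2, :)    # frame 2 in the row layout

println(col_frame == row_frame)   # true: same values either way
println(strides(col_frame))       # (1,): elements are adjacent in memory
println(strides(row_frame))       # (4,): elements are nframes apart in memory
```

In the column layout one frame is a single contiguous block, so it can be read or written without touching the other frames; in the row layout each element of a frame sits `nframes` elements away from the next.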
A couple of years back I decided on row vectors in MFCC and GaussianMixtures, but now I am beginning to have doubts, and I am considering switching to column vectors, which would require a clever coding design, a fair amount of re-coding, and possibly breaking dependent code in the future.
I realize the choice between column and row vectors is probably as arbitrary as column-major vs. row-major storage, or 0-based vs. 1-based indexing, so I'd like to stick with best practices in the community.
What is your opinion on the matter?