Instead of passing the matrix dimensions for each benchmark, the user should specify only how many floating-point operations each benchmark should perform. The matrix sizes should then be calculated from that count, keeping the matrices as close to square as possible.
New usage:
./slownode <iters> <flops>
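
One possible way to derive the sizes from <flops> is sketched below. It assumes the benchmark is a dense matrix multiply of an m-by-k matrix with a k-by-n matrix costing about 2*m*n*k floating-point operations; the actual benchmarks may count operations differently. The function name dims_for_flops is hypothetical and not part of slownode.

```python
def dims_for_flops(flops: int) -> tuple[int, int, int]:
    """Return (m, n, k) for a near-square multiply costing roughly `flops`.

    Assumption: an (m x k) times (k x n) multiply costs 2*m*n*k operations.
    """
    if flops < 1:
        raise ValueError("flops must be a positive integer")
    # For a perfectly square problem, 2 * side**3 == flops.
    side = max(1, round((flops / 2) ** (1 / 3)))
    # Keep m and n equal and let k absorb the rounding error,
    # so the achieved operation count stays close to the request.
    k = max(1, round(flops / (2 * side * side)))
    return (side, side, k)


def achieved_flops(m: int, n: int, k: int) -> int:
    """Operation count under the same 2*m*n*k assumption."""
    return 2 * m * n * k
```

With this approach the request is met only approximately, since the dimensions must be integers; the achieved count could be reported back to the user alongside the timing.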