This project aims to implement neural network algorithms from scratch, as cleanly and modularly as possible.
Library overhead
Here's the call tree of the project when training a basic MNIST MLP classifier.
As expected, matrix multiplication accounts for ~95% of the CPU time.
Note that the application only ran for a few minutes; JIT compilation by the JVM might change these results when training larger models.
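To illustrate why matrix multiplication dominates the profile, here is a minimal sketch of a naive dense multiply of the kind an MLP's forward and backward passes run repeatedly. This is illustrative only: the class and method names are hypothetical, not taken from the project's actual code.

```java
// Hypothetical sketch of a naive dense matrix multiplication, the usual
// hot spot in an MLP training loop. Not the project's actual implementation.
public class MatMulSketch {
    // Computes C = A * B, with A of shape (n, m) and B of shape (m, p).
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b.length, p = b[0].length;
        double[][] c = new double[n][p];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < m; k++) {
                // i-k-j loop order reads B row-by-row, which is
                // friendlier to the cache than the textbook i-j-k order.
                double aik = a[i][k];
                for (int j = 0; j < p; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = multiply(a, b);
        System.out.println(c[0][0] + " " + c[0][1] + " "
                         + c[1][0] + " " + c[1][1]);
        // prints "19.0 22.0 43.0 50.0"
    }
}
```

Each layer's forward pass performs one such multiply per batch, and backpropagation performs two more, so the O(n·m·p) inner loop above is where nearly all arithmetic happens.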