Aim: run algorithms in parallel as much as possible, with maximal efficiency.
This will be an umbrella issue based on the following idea:
I see 3 different levels of parallelisation that could be implemented:
- low-level: for-loops, implicit, via the C++17 parallel algorithms -> Enable use of Parallel TS of c++17 #214
  - this reduces (if at all?) the execution time of each smallest code unit
- hand-coded/mid-level: efficient, but makes the code much more difficult -> Parallel SpatialPooler, Connections #254
- high-level: NetworkAPI Regions; the classes SP, TM, Encoder, Classifier in a pipeline could run in parallel
  - reduces the time per pipeline step from sum(time of each unit) to roughly max(time of each unit), since the slowest unit bounds the step
  - tracked in NetworkAPI Regions run in parallel #253
from #214 (comment)
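To make the high-level idea concrete, here is a minimal sketch of a two-stage pipeline whose stages overlap. It is only an illustration: `encode`, `pool` and `runPipelined` are hypothetical stand-ins, not NetworkAPI code, and a real Region would carry state that needs more careful handling.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-ins for two pipeline units (e.g. an encoder and SP).
int encode(int x) { return x + 1; }
int pool(int x) { return x * 2; }

// Runs stage 1 on item t in a worker thread while stage 2 handles item t-1
// on the calling thread, so the two stages overlap in time.
std::vector<int> runPipelined(const std::vector<int> &inputs) {
  std::vector<int> outputs;
  outputs.reserve(inputs.size());
  if (inputs.empty()) return outputs;

  int staged = encode(inputs[0]); // fill the pipeline with the first item
  for (std::size_t t = 1; t < inputs.size(); ++t) {
    // Stage 1 starts on item t in another thread...
    auto next = std::async(std::launch::async, encode, inputs[t]);
    // ...while stage 2 processes the previous item here.
    outputs.push_back(pool(staged));
    staged = next.get(); // wait for stage 1 before the next step
  }
  outputs.push_back(pool(staged)); // drain the last staged item

  assert(outputs.size() == inputs.size());
  return outputs;
}
```

The results are the same as running `pool(encode(x))` sequentially for each input; only the scheduling differs. Spawning a task per step via `std::async` keeps the sketch short, whereas a long-lived worker per stage would likely be cheaper in practice.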
EDIT:
TODO:
- can we estimate the number of threads on the system?
  - at runtime
  - at compile time with CMake?
  - worst case we pass a param: -DNUMTHREADS=8
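One possible way to combine the runtime and the "pass a param" options is sketched below. `std::thread::hardware_concurrency()` is standard C++11 and may return 0 when the value cannot be determined, so the sketch falls back to the `NUMTHREADS` define in that case. The helper names `chooseThreadCount` and `defaultThreadCount` are hypothetical, and the default of 1 for `NUMTHREADS` is an assumption.

```cpp
#include <cassert>
#include <thread>

// Compile-time fallback, e.g. set by the build with -DNUMTHREADS=8.
// Assumption: default to a single thread when nothing is passed.
#ifndef NUMTHREADS
#define NUMTHREADS 1
#endif
static_assert(NUMTHREADS >= 1, "NUMTHREADS must be at least 1");

// Hypothetical helper: prefer the detected value, use the fallback when
// detection failed (reported as 0).
unsigned chooseThreadCount(unsigned detected, unsigned fallback) {
  assert(fallback >= 1 && "fallback must be a usable thread count");
  return detected != 0 ? detected : fallback;
}

// Runtime estimate with the compile-time define as the worst-case fallback.
unsigned defaultThreadCount() {
  return chooseThreadCount(std::thread::hardware_concurrency(), NUMTHREADS);
}
```

For the compile-time question, CMake ships a `ProcessorCount` module, but it reports the cores of the build machine, which may differ from the machine the binary later runs on, so the runtime query is probably the safer default.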