Description
When creating synapses, we currently check whether there are multiple synapses for the same (pre, post) neuron pair (here in synapses_create_array.cu and also in synapses_create_generator.cu). We need this to choose the correct parallelisation mode, but the check is very time-consuming.

Currently, for 10^7 synapses created with syn.connect(p=sparseness) (Brunel scalar delay example with 10^4 neurons), our synapses_create_generator.cu template takes ~30 s. With the cpp_standalone or genn device (which both use almost the same template), it takes only on the order of 1 s. Our check for multiple pre/post synapses takes ~20 s, and we seem to lose another ~10 s in the operator[] overload of my random number buffer. Am I doing something very inefficient there?
I think it would make sense to add a user preference that skips the check for multiple pre/post synapses (for when the user is sure there are none).
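A minimal sketch of how such a preference could gate the check, assuming the check is wrapped in a callable so it only runs when needed. The flag, the enum and both mode names are hypothetical; brian2cuda does not define them under these names.

```cpp
#include <cassert>
#include <functional>

// Hypothetical modes: one that assumes every (pre, post) pair is unique,
// one that must cope with multiple synapses per pair.
enum class ParallelisationMode { AssumeUniquePairs, HandleMultiplePrePost };

// Hypothetical helper: user_guarantees_unique_pairs stands in for the
// proposed user preference.
ParallelisationMode choose_mode(bool user_guarantees_unique_pairs,
                                const std::function<bool()>& has_multiple_pre_post)
{
    // The user promised there are no duplicate pairs: skip the expensive check.
    if (user_guarantees_unique_pairs)
        return ParallelisationMode::AssumeUniquePairs;
    // Otherwise run the check and pick the mode from its result.
    return has_multiple_pre_post() ? ParallelisationMode::HandleMultiplePrePost
                                   : ParallelisationMode::AssumeUniquePairs;
}
```

The trade-off is that a wrong promise from the user would presumably select a mode that is unsafe for duplicate pairs, so the default should probably stay "check".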
Alternatively, we could find a more efficient way of doing the check, maybe on the GPU, instead of building a map from (pre, post) ID pairs to integer counters by looping over all existing synapses.
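One cheaper host-side alternative to the map, sketched under the assumption that pre and post indices are stored as two parallel arrays of 32-bit integers: pack each pair into a single 64-bit key, sort the keys, and look for equal neighbours. The function name has_multiple_pre_post is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if at least one (pre, post) pair occurs more than once.
// Sorting a contiguous array of 64-bit keys is O(n log n) and avoids
// allocating one map entry per pair.
bool has_multiple_pre_post(const std::vector<int32_t>& pre,
                           const std::vector<int32_t>& post)
{
    assert(pre.size() == post.size());
    std::vector<uint64_t> keys(pre.size());
    for (std::size_t s = 0; s < pre.size(); ++s)
        // High 32 bits: pre index, low 32 bits: post index.
        keys[s] = (static_cast<uint64_t>(static_cast<uint32_t>(pre[s])) << 32)
                  | static_cast<uint32_t>(post[s]);
    std::sort(keys.begin(), keys.end());
    // After sorting, duplicate pairs are adjacent.
    return std::adjacent_find(keys.begin(), keys.end()) != keys.end();
}
```

The same idea should carry over to the GPU with thrust::sort followed by thrust::unique on the key array: if unique shortens the range, some pair occurs more than once. That device variant is not shown here.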
For the operator[] performance, it would probably make sense to simply use the cpp_standalone random number generation for host code. Alternatively, we could precompute the number of random numbers needed and drop the buffer class entirely in favour of plain pointer arithmetic, or just find out why my implementation is inefficient, since generating the random numbers on the device and copying them to the host seems to be quite fast.
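A sketch of the "precompute and use pointer arithmetic" option, assuming (hypothetically) that the host loop for syn.connect(p=...) draws exactly one uniform number per candidate (i, j) pair. Here std::mt19937_64 stands in for generating on the device with cuRAND and copying to the host once; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Stand-in for "generate on the device, copy to the host in one go".
std::vector<double> generate_uniforms(std::size_t count, uint64_t seed)
{
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);  // values in [0, 1)
    std::vector<double> out(count);
    for (double& x : out) x = dist(gen);
    return out;
}

// One number per candidate pair, so the total count n_pre * n_post is known
// up front and the loop just advances a raw pointer (no buffer class).
std::vector<std::pair<int32_t, int32_t>>
connect_with_probability(int32_t n_pre, int32_t n_post, double p, uint64_t seed)
{
    const std::size_t n_needed =
        static_cast<std::size_t>(n_pre) * static_cast<std::size_t>(n_post);
    const std::vector<double> uniforms = generate_uniforms(n_needed, seed);
    const double* r = uniforms.data();
    std::vector<std::pair<int32_t, int32_t>> synapses;
    for (int32_t i = 0; i < n_pre; ++i)
        for (int32_t j = 0; j < n_post; ++j)
            if (*r++ < p)
                synapses.emplace_back(i, j);
    assert(r == uniforms.data() + n_needed);  // every number used exactly once
    return synapses;
}
```

For the Brunel example, 10^4 × 10^4 candidate pairs would mean 10^8 doubles (800 MB) at once, so in practice one would probably fill and consume the array in chunks of a few presynaptic rows at a time.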