I have some code that queries cudaGetDeviceProperties for multiProcessorCount and maxThreadsPerMultiProcessor to determine how many blocks, threads, and streams to launch.
It would be cool if cupla could emulate the device properties to some extent for this kind of use case by mapping them onto corresponding features, e.g. maxThreadsPerMultiProcessor = OMP_NUM_CORES and multiProcessorCount = 1.
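
To make the request concrete, here is a rough sketch of the mapping I have in mind. Everything in it is hypothetical: `EmulatedDeviceProp`, `emulateDeviceProperties`, `LaunchConfig` and `chooseLaunchConfig` are made-up names and not part of cupla or CUDA. I am also assuming that the core count can be taken from `std::thread::hardware_concurrency()` to keep the sketch dependency-free, where an OpenMP back end would more likely ask the OpenMP runtime.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for the two cudaDeviceProp fields my code reads.
struct EmulatedDeviceProp {
    int multiProcessorCount;
    int maxThreadsPerMultiProcessor;
};

// Hypothetical helper (not a cupla function): fills the properties for a
// CPU back end by treating the whole CPU as one "multiprocessor".
inline void emulateDeviceProperties(EmulatedDeviceProp* prop) {
    assert(prop != nullptr);
    // hardware_concurrency() may return 0 when the value cannot be
    // determined, so fall back to a single thread in that case.
    const unsigned int cores = std::thread::hardware_concurrency();
    prop->multiProcessorCount = 1;
    prop->maxThreadsPerMultiProcessor = cores > 0 ? static_cast<int>(cores) : 1;
}

// Hypothetical launch configuration derived from the properties,
// mirroring how my code sizes its launches.
struct LaunchConfig {
    int blocks;
    int threadsPerBlock;
};

inline LaunchConfig chooseLaunchConfig(const EmulatedDeviceProp& prop) {
    // One block per multiprocessor, filled with as many threads as it supports.
    return LaunchConfig{prop.multiProcessorCount, prop.maxThreadsPerMultiProcessor};
}
```

With something like this, the existing sizing logic could stay untouched: it would see one multiprocessor and derive the thread count from the number of cores.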