
CUDA interface mallocs and pybind11 #118

@SBresler

Description


I have gotten the CUDA interface to work in Python now, and I am seeing a lot of memory allocations when I call the DLL. It's way faster than what I was doing before.

I was messing around with the NVIDIA Performance Primitives (NPP), and many of those functions require a pre-allocated scratch buffer, so I can just pass an already-allocated array to the NPP DLL and all of the memory is set up in advance.

So what I am asking is the following:

Am I correct in assuming that the CUDA interface does not pre-allocate buffers for all of the things that it needs to do the calculation?

Is there theoretically a way to change the interface so that the amount of "scratch buffer" Gpufit needs can be precomputed, eliminating some of the per-call memory allocations?

Looking at the traces, about 50% of a call to Gpufit's CUDA interface is spent in memory allocations, while with the NPP functions I never see any allocations, because I give them a preallocated scratch buffer.
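For illustration, the query-then-reuse pattern NPP follows (e.g. the real pair `nppsSumGetBufferSize_32f` / `nppsSum_32f`) could look roughly like this if Gpufit exposed it. This is a pure-Python sketch of the call shape only: every function name is hypothetical, and the size formula is a placeholder, not the real requirement.

```python
# Pure-Python sketch of the NPP-style two-phase workspace pattern proposed
# above for Gpufit. All names here are HYPOTHETICAL; the real NPP equivalents
# are pairs like nppsSumGetBufferSize_32f / nppsSum_32f.

def gpufit_get_workspace_size(n_fits, n_points):
    # Phase 1 (hypothetical): the library reports how many scratch bytes a
    # batch of fits needs, so the caller can allocate once instead of per call.
    # Illustrative formula only -- not the real Gpufit requirement.
    return 4 * n_fits * n_points

def gpufit_with_workspace(data, workspace):
    # Phase 2 (hypothetical): run the fit using the caller-supplied buffer,
    # performing no allocations of its own.
    assert len(workspace) >= gpufit_get_workspace_size(1, len(data))
    return sum(data) / len(data)  # stand-in for the actual fit result

# Allocate the scratch buffer once...
ws = bytearray(gpufit_get_workspace_size(1, 1024))
# ...then reuse it across many calls; only phase 2 runs in the hot loop.
results = [gpufit_with_workspace([float(i)] * 1024, ws) for i in range(100)]
```

The point of the sketch is just the API shape: a size query up front, then a compute entry point that takes the caller's buffer and never calls `cudaMalloc` itself.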

I'm willing to look into this myself and contribute; it's just helpful to hear thoughts on this.

EDIT: oh, and pybind11. Are there any performance advantages to using pybind11 instead of ctypes? I just used ctypes, but I'm interested in the fact that pybind11 requires you to modify the C++ source, so I would assume the integration is a little bit tighter.
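For context on the ctypes side, a minimal ctypes call against the C math library looks like this. The argument/return conversion declared via `argtypes`/`restype` runs through ctypes' generic FFI machinery on every call, which is one place pybind11's compiled per-function bindings can save per-call overhead. (The `libm.so.6` fallback assumes a glibc Linux system.)

```python
# Minimal ctypes example against the C math library. Each call converts
# Python objects to C types at call time through generic FFI machinery;
# pybind11 instead generates compiled per-function binding code, which is
# where the "tighter integration" comes from.
import ctypes
import ctypes.util

# find_library may need ldconfig; fall back to the common glibc soname.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.restype = ctypes.c_double     # declare the C return type...
libm.sqrt.argtypes = [ctypes.c_double]  # ...and the argument types by hand

print(libm.sqrt(9.0))  # 3.0
```

Whether the difference matters depends on call granularity: for a few coarse calls into Gpufit per batch, the binding overhead is likely negligible next to the GPU work either way.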
