
Dynamic loading #11

Open · romankoblov opened this issue Jan 8, 2019 · 5 comments

romankoblov commented Jan 8, 2019

Hello! Do you have plans to add support for dynamically loading the CUDA library at runtime? Something like rust-dlopen, for example.

With the current approach, this crate will crash if the CUDA libraries are not found, instead of raising an error to the user (which would make it possible to fall back to CPU/OpenCL code).

Also, with dynamic loading it would be possible to find the CUDA libraries at runtime, so it would no longer be necessary to do
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib
on macOS (and probably LD_LIBRARY_PATH on Linux).
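A minimal sketch of the idea, assuming the libloading crate (rust-dlopen would look similar); the library path, symbol choice, and error handling here are illustrative, not cuda-sys API:

```rust
use libloading::{Library, Symbol};

// Try to load the CUDA driver at runtime. A missing library becomes an
// ordinary error instead of crashing the process at startup.
fn try_cuda_init() -> Result<(), Box<dyn std::error::Error>> {
    unsafe {
        // "libcuda.so" is illustrative; on macOS it would be libcuda.dylib.
        let lib = Library::new("libcuda.so")?;
        let cu_init: Symbol<unsafe extern "C" fn(u32) -> i32> =
            lib.get(b"cuInit\0")?;
        cu_init(0); // cuInit(Flags) is the real driver entry point
    }
    Ok(())
}

fn main() {
    match try_cuda_init() {
        Ok(()) => println!("CUDA backend available"),
        // No crash: the application can fall back to a CPU/OpenCL path.
        Err(e) => println!("CUDA not found, falling back: {}", e),
    }
}
```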

LutzCle (Contributor) commented Jan 9, 2019

Dynamic loading by itself won't help, because some CUDA versions break binary compatibility (see termoshtt/accel#58 and the follow-ups in #4 and #12); dynamic loading, after all, requires binary compatibility.

Do you have any pointers to other projects that implement dynamic loading for CUDA?

romankoblov (Author) commented

There is ongoing work in TensorFlow, for example.
ArrayFire takes a somewhat different approach: it dynamically loads backend libraries (which are linked against CUDA), but for Rust applications this would reduce safety. It is also better to do this once than in every app that needs optional CUDA support.

In terms of version compatibility, I don't see much difference here: you can still check the version of the library after dynamic loading and provide the correct API for it.
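As a hedged sketch of that check, again assuming libloading: cuDriverGetVersion is a real driver entry point that reports the version as 1000 * major + 10 * minor, so the caller can pick bindings that match.

```rust
use libloading::{Library, Symbol};

// Query the driver version after dlopen-ing it, so the appropriate API
// surface can be selected at runtime. Error handling is simplified.
fn driver_version(lib: &Library) -> Result<i32, Box<dyn std::error::Error>> {
    let mut version: i32 = 0;
    unsafe {
        let get_version: Symbol<unsafe extern "C" fn(*mut i32) -> i32> =
            lib.get(b"cuDriverGetVersion\0")?;
        get_version(&mut version);
    }
    Ok(version) // e.g. 10010 for CUDA 10.1
}
```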

LutzCle (Contributor) commented Jan 9, 2019

Sorry, I misunderstood your first post. This is orthogonal to the version incompatibility issue.

Looking through tensorflow/tensorflow@f092c9d, what they're doing is creating a shim for each and every CUDA function. This makes it possible to switch between CUDA and the shim at runtime.

I'm not so sure that cuda-sys is the right place for this kind of trick, as it introduces new trade-offs, and the purpose of this crate is to provide bare-bones CUDA bindings. Off the top of my head:

  • Additional runtime overhead, because we would need a branch to test the is-CUDA-present? condition on each function call (see the sketch after this list).
  • Code complexity, because the shims need to be written and maintained.
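To make the per-call branch concrete, here is a minimal sketch of such a shim; the names, error code handling, and installation function are illustrative, not cuda-sys or TensorFlow code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

type CuInitFn = unsafe extern "C" fn(u32) -> i32;

// 0 means "driver not loaded"; otherwise holds the real cuInit pointer.
static REAL_CU_INIT: AtomicUsize = AtomicUsize::new(0);

const CUDA_ERROR_NO_DEVICE: i32 = 100; // real CUresult value

// Called once at startup if the driver was found (illustrative).
pub fn install_cu_init(f: CuInitFn) {
    REAL_CU_INIT.store(f as usize, Ordering::Release);
}

// The shim every caller goes through: note the branch on each call.
pub unsafe extern "C" fn cu_init_shim(flags: u32) -> i32 {
    match REAL_CU_INIT.load(Ordering::Acquire) {
        0 => CUDA_ERROR_NO_DEVICE, // CUDA absent: fail gracefully
        ptr => {
            let real: CuInitFn = std::mem::transmute(ptr);
            real(flags)
        }
    }
}
```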

Perhaps someone with more experience than me could pitch in?

AndrewGaspar (Contributor) commented
I agree that this seems out of scope for cuda-sys.

What you could do, if you'd like to take the shim approach, is create a separate shim library that, for all intents and purposes, looks like the CUDA library, but performs the necessary indirection described here. Once #4 is merged, cuda-sys should be able to pick it up automatically and use it, as long as you set CUDA_LIBRARY_PATH correctly.
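A hedged sketch of that idea: the separate crate would be built as a cdylib whose output name matches the real driver library and which exports the same C symbols. Everything below is an assumption about how such a shim could look, not existing cuda-sys machinery:

```rust
// In the shim crate's Cargo.toml (illustrative):
//   [lib]
//   name = "cuda"              // produces libcuda.so, mimicking the driver
//   crate-type = ["cdylib"]
//
// The resulting library can then be located via CUDA_LIBRARY_PATH.

const CUDA_ERROR_NO_DEVICE: i32 = 100; // real CUresult value

#[allow(non_snake_case)]
#[no_mangle]
pub unsafe extern "C" fn cuInit(_flags: u32) -> i32 {
    // A real shim would dlopen the actual driver here and forward the
    // call, returning an error code only when the driver is missing.
    CUDA_ERROR_NO_DEVICE
}
```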

romankoblov (Author) commented

It can also make incompatibility issues easier to handle, since you can choose the function signature at runtime.
There are examples of sys crates with dynamic loading, clang-sys among them.

As you can see in those crates, static pointers to the library functions are set once, so there is no need for a branch on each function call.
The code will be more complex than "just use bindgen", but judging from the clang-sys example, there is not much difference between raw bindings and runtime loading.
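A minimal sketch of that pattern, assuming libloading and std's OnceLock; the library is loaded at most once, and later calls go through a plain function pointer with no per-call presence check:

```rust
use libloading::Library;
use std::sync::OnceLock;

struct Driver {
    _lib: Library, // kept alive so the loaded symbols stay valid
    cu_init: unsafe extern "C" fn(u32) -> i32,
}

static DRIVER: OnceLock<Option<Driver>> = OnceLock::new();

// Load libcuda once; afterwards cu_init is an ordinary function pointer.
fn driver() -> Option<&'static Driver> {
    DRIVER
        .get_or_init(|| unsafe {
            let lib = Library::new("libcuda.so").ok()?;
            let cu_init = *lib
                .get::<unsafe extern "C" fn(u32) -> i32>(b"cuInit\0")
                .ok()?;
            Some(Driver { _lib: lib, cu_init })
        })
        .as_ref()
}
```

Note that keeping the Library value alive alongside the pointers matters: dropping it would unload the shared object and leave the function pointers dangling.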
