Hey y'all, I just found out about this project a few days ago, very cool stuff. I'm currently evaluating it for use by my team at Anduril. One of the main blockers, though, is the lack of an easy way to directly pass device buffers to kernels (I'm sure there are some not-so-straightforward ways). There are several reasons to want to do this:
- You are using cubecl in conjunction with some other library, tool, language, etc. that produces arrays/tensors that are already on the GPU. You have no choice but to copy them back to the host and then have the runtime copy them back to the device just to call a kernel on them (see the sketch after this list).
- You want to use unified/pinned memory, GPU RDMA, etc. for performance reasons.
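To illustrate the first point, the status quo round trip looks roughly like this. `other_lib` and its methods are hypothetical stand-ins for whatever external tool produced the GPU buffer; `client.create` is cubecl's usual host-side upload path:

```rust
// Status quo: the data is already on the GPU, but to hand it to a cubecl
// kernel we have to bounce it through host memory.
let device_tensor = other_lib::make_tensor_on_gpu(); // hypothetical producer
let host_bytes: Vec<u8> = device_tensor.copy_to_host(); // device -> host copy
let handle = client.create(&host_bytes); // host -> device copy, again
```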
I've sketched out an implementation here, and I'm looking for feedback before I get too deep. I've basically just added an ArrayArg variant that takes the Resource of the associated ComputeStorage. In the end you'll be able to mix it with normal cubecl arrays, and it'll look something like this:
```rust
let device_buffer = CudaResource::from_device_pointer(...);

some_kernel::launch_unchecked::<...>(
    &client,
    CubeCount::Static(...),
    CubeDim::new(...),
    ArrayArg::from_raw_resource(device_buffer),
    ArrayArg::from_raw_parts(&normal_handle, ...),
);
```
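To make the shape of the proposal concrete, here's a minimal, self-contained sketch of the binding side. The stand-in traits only mirror the rough shape of cubecl-runtime's `Runtime`/`ComputeServer`/`ComputeStorage` structure, and `RawResource`, `ResourceOf`, and the `length` field are placeholder names, not settled API:

```rust
// Stand-in traits mirroring cubecl-runtime's structure (not the real defs).
pub trait ComputeStorage {
    /// The raw device resource this storage hands out (e.g. a CUDA pointer).
    type Resource;
}

pub trait ComputeServer {
    type Storage: ComputeStorage;
}

pub trait Runtime {
    type Server: ComputeServer;
}

/// Shorthand for the Resource type of a given runtime's storage.
type ResourceOf<R: Runtime> =
    <<<R as Runtime>::Server as ComputeServer>::Storage as ComputeStorage>::Resource;

pub enum ArrayArg<R: Runtime> {
    // ...existing variants (Handle, etc.) elided...
    /// Proposed: bind an externally-owned device resource directly,
    /// skipping the host round trip entirely.
    RawResource {
        resource: ResourceOf<R>,
        length: usize,
    },
}

impl<R: Runtime> ArrayArg<R> {
    /// Wrap an externally-owned device buffer. Unsafe in spirit: cubecl
    /// cannot validate the pointer's lifetime or that `length` matches
    /// the actual allocation.
    pub fn from_raw_resource(resource: ResourceOf<R>, length: usize) -> Self {
        ArrayArg::RawResource { resource, length }
    }
}
```

Hooking in at the ArrayArg level keeps the change additive: existing launches are untouched, and raw resources can be mixed freely with normal handles in the same launch, as in the snippet above.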