[REVIEW][Java] One PinnedMemoryBuffer per CuVSResourcesImpl #1441
Conversation
```diff
 try (var localArena = Arena.ofConfined()) {
     MemorySegment pointer = localArena.allocate(C_POINTER);
-    checkCudaError(cudaMallocHost(pointer, bufferBytes), "cudaMallocHost");
+    checkCudaError(cudaMallocHost(pointer, PinnedMemoryBuffer.CHUNK_BYTES), "cudaMallocHost");
```
Please correct me if I'm wrong.
There were a couple of advantages to the previous approach that now seem to be missing:
- If the total number of bytes did not exceed CHUNK_BYTES, we would make a smaller allocation. By removing this check, we're guaranteeing an 8MB allocation per thread just for pinned memory. Is that wise?
- There was protection against a very large row size (8MB). Now we ignore that possibility. Wouldn't that mean a buffer overrun when a single row is copied? Or is that not deemed possible now?
 
> If the total number of bytes did not exceed CHUNK_BYTES, we would make a smaller allocation. By removing this check, we're guaranteeing 8MB allocation per thread just for pinned memory. Is that wise?
We discussed this with @achirkin and think this is worth it. It's true we are allocating 8MB per resource, but it's also true we do this on demand, and the benefit of not having to re-allocate every time (which is costly, due to the lock on the CUDA context) is worth it.
> There was protection against a very large row-size (8MB). Now we ignore the possibility. Wouldn't that mean a buffer-overrun when a single row is copied? Or is that not deemed possible now?
It's true. It's very unlikely, I would say impossible in our scenarios, but it is something that's missing. Let me add this protection back (although it will need to be in a different place).
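For illustration, the protection being discussed — chunking a copy so a single row larger than the staging buffer cannot overrun it — could look roughly like this. Everything here (the class name `ChunkedCopy`, the plain `byte[]` standing in for the pinned allocation) is a hypothetical sketch, not the actual cuvs-java code, which works with native `MemorySegment`s:

```java
// Hypothetical sketch: copy bytes through a bounded staging buffer so that
// a source larger than the buffer is split into chunks instead of overrunning.
public final class ChunkedCopy {

    // Copies src into dst through a staging array of at most stagingBytes.
    static void copyThroughStaging(byte[] src, byte[] dst, int stagingBytes) {
        // Never allocate more staging than needed, and never zero bytes.
        byte[] staging = new byte[Math.min(stagingBytes, Math.max(src.length, 1))];
        int offset = 0;
        while (offset < src.length) {
            int n = Math.min(staging.length, src.length - offset);
            System.arraycopy(src, offset, staging, 0, n); // host -> staging ("pinned")
            System.arraycopy(staging, 0, dst, offset, n); // staging -> destination ("device")
            offset += n;
        }
    }
}
```

The point of the sketch is only the bound: each iteration copies at most `staging.length` bytes, so an oversized row costs extra iterations rather than an out-of-bounds write.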
I'm 👍 on the change so far.
> It's very unlikely, I would say impossible in our scenarios, but it's something missing. Let me add this protection back...
I'll take another look when this changes.
Yes please :) I'll ping you when I push the change!
        
          
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CuVSResourcesImpl.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/ScopedAccessWithHostBuffer.java
        
/ok to test aa5e469
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I've rebased this PR and resolved a conflict.
/ok to test d7fcdb3
/ok to test c4f0570
/ok to test d792fce
@mythrocks the copyright issue should be fixed with c4f0570 (but CI is still failing here on
While profiling cuvs-java, we found that allocating a `PinnedMemoryBuffer` for each host-to-device or device-to-host memory copy was unnecessary and wasteful.

This PR moves the allocation of a `PinnedMemoryBuffer` to `CuVSResourcesImpl`, so that the buffer can be cached and reused. Since `CuVSResources` are already meant to be per-thread, this is safe, as the `PinnedMemoryBuffer` will never be used concurrently.

In order to do it cleanly, we introduced two named `ScopedAccess` classes and a helper method that will always find its way to the internal `MemorySegment` used by native functions to access the buffer, without the need to expose it via the public interface.
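The caching scheme described above — allocate the staging buffer once, on demand, and reuse it for every later copy on that (single-threaded) resource — can be sketched in plain Java. The class name `Resources` and the `byte[]` standing in for the `cudaMallocHost` allocation are hypothetical; the real code manages a native pinned allocation through a `MemorySegment`:

```java
// Hypothetical sketch of one cached staging buffer per resource instance.
// The buffer is allocated lazily and reused; the class is meant to be
// confined to a single thread, so no synchronization is needed.
final class Resources implements AutoCloseable {
    static final int CHUNK_BYTES = 8 * 1024 * 1024; // 8 MB, as in the PR

    private byte[] pinnedBuffer; // stands in for the cudaMallocHost allocation

    // Returns the cached buffer, allocating it on first use only.
    byte[] pinnedBuffer() {
        if (pinnedBuffer == null) {
            pinnedBuffer = new byte[CHUNK_BYTES]; // the one-time, costly step
        }
        return pinnedBuffer;
    }

    @Override
    public void close() {
        pinnedBuffer = null; // real code would call cudaFreeHost here
    }
}
```

The trade-off discussed in the review is visible here: a thread that never copies pays nothing (the field stays `null`), while a thread that copies repeatedly pays the allocation cost exactly once instead of per copy.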