This is a follow-up to #4360.
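Per the code review, it was suggested that the three call sites that use cudaMemcpyAsync for host->device and device->host copies should use pinned (page-locked) host memory. With pageable host memory, cudaMemcpyAsync cannot run fully asynchronously, because the driver has to stage the data through an internal pinned buffer.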
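For illustration, here is a minimal sketch of the pattern being suggested. It is not taken from the project's code and does not reproduce the three call sites. The helper `roundtrip_pinned` and the `CUDA_CHECK` macro are hypothetical names used only for this example. The sketch allocates host buffers with `cudaMallocHost`, issues a host->device and a device->host `cudaMemcpyAsync` on one stream, and synchronizes the stream before the host reads the result.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                   cudaGetErrorString(err_));                         \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

// Hypothetical helper: copies n floats host->device and back on one stream
// using pinned (page-locked) host buffers, then checks the round trip.
bool roundtrip_pinned(size_t n) {
  float* h_src = nullptr;
  float* h_dst = nullptr;
  float* d_buf = nullptr;
  const size_t bytes = n * sizeof(float);

  // Pinned host allocations allow cudaMemcpyAsync to be a true async transfer.
  CUDA_CHECK(cudaMallocHost(&h_src, bytes));
  CUDA_CHECK(cudaMallocHost(&h_dst, bytes));
  CUDA_CHECK(cudaMalloc(&d_buf, bytes));

  for (size_t i = 0; i < n; ++i) h_src[i] = static_cast<float>(i);

  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  // Both copies are queued on the same stream, so they execute in order.
  CUDA_CHECK(cudaMemcpyAsync(d_buf, h_src, bytes, cudaMemcpyHostToDevice, stream));
  CUDA_CHECK(cudaMemcpyAsync(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost, stream));

  // The host must not read h_dst until the stream has finished.
  CUDA_CHECK(cudaStreamSynchronize(stream));

  bool ok = true;
  for (size_t i = 0; i < n; ++i) {
    if (h_dst[i] != h_src[i]) { ok = false; break; }
  }

  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(d_buf));
  CUDA_CHECK(cudaFreeHost(h_dst));
  CUDA_CHECK(cudaFreeHost(h_src));
  return ok;
}

int main() {
  bool ok = roundtrip_pinned(1 << 20);
  std::printf("round trip %s\n", ok ? "ok" : "FAILED");
  return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

One design consideration: pinned memory is a limited system resource, and allocating too much of it can degrade overall system performance. It may therefore be worth reusing pinned staging buffers rather than allocating them per call.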
To reduce churn for the 26.04 release, I suggest we take this up in 26.06.