Description
In the unified-memory example (the dataElem.cu file), there is this code:
// Copy up each piece separately, including new “text” pointer value
①cudaMemcpy(d_elem, elem, sizeof(DataElement), cudaMemcpyHostToDevice);
②cudaMemcpy(d_name, elem->name, namelen, cudaMemcpyHostToDevice);
③cudaMemcpy(&(d_elem->name), &d_name, sizeof(char*), cudaMemcpyHostToDevice);
// Finally we can launch our kernel, but CPU & GPU use different copies of “elem”
④Kernel<<< 1, 1 >>>(d_elem);
⑤cudaMemcpy(&(elem->value), &(d_elem->value), sizeof(int), cudaMemcpyDeviceToHost);
⑥cudaMemcpy(elem->name, d_name, namelen, cudaMemcpyDeviceToHost);
About steps ② and ③: why isn't the data copied directly from elem->name to d_elem->name? And given that step ② first copies the data from elem->name into d_name, why does step ③ use cudaMemcpyHostToDevice rather than cudaMemcpyDeviceToDevice?
Also, after the kernel has executed, why does step ⑥ copy the data from d_name back to elem->name, rather than from d_elem->name to elem->name?