Multi-GPU setup with PyTorch #1691
Dear all, I am currently looking for a way to utilize multiple GPUs by distributing workloads between them. I found this discussion: mitsuba-renderer/drjit#359. Let's say Mitsuba takes one GPU, and the model (which we don't train) takes the two other GPUs. I would then need to transfer the gradients from those two GPUs to the one running Mitsuba. What would be the best way, on a cluster with four GPUs, to use Mitsuba in an optimization pipeline together with a larger model that does not fit on the same GPU? Thanks a lot!

I have come up with the following code using multiprocessing and IPC. Unfortunately, it fails with:

```
RuntimeError: drjit.backward_from(): the argument does not depend on the input variable(s) being differentiated.
Raising an exception since this is usually indicative of a bug (for example, you may have forgotten to call dr.enable_grad(..)).
If this is expected behavior, provide the drjit.ADFlag.AllowNoGrad flag to the function (e.g., by specifying flags=dr.ADFlag.Default | dr.ADFlag.AllowNoGrad).
```

What am I doing wrong? This is the example:

```python
import multiprocessing as mp

def mitsuba_worker(render_request, render_response, grad_queue, gt_queue):
    # Pin this process to GPU 0 before importing Mitsuba/Dr.Jit
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    import mitsuba as mi
    import drjit as dr

    mi.set_variant("cuda_ad_rgb")

    # Build scene
    scene = mi.load_dict(mi.cornell_box())
    params = mi.traverse(scene)

    # Render the ground truth once and hand it to the PyTorch process
    gt = mi.render(scene)
    gt_queue.put(gt.numpy())

    optim = mi.ad.Adam(lr=0.1)
    col = mi.Color3f(0.205421, 0.47798, 0.176425)
    dr.enable_grad(col)
    optim["green.reflectance.value"] = col
    params.update(optim)

    for step in range(10):
        msg = render_request.get()
        if msg == "render":
            image = mi.render(scene, spp=8)
            render_response.put(image.numpy())  # Send rendered image

            # Receive ∂L/∂image from PyTorch
            grad_image = grad_queue.get()
            dr.set_grad(image, grad_image)
            dr.backward(image)  # dr.backward_from() doesn't change anything either

            optim.step()
            optim["green.reflectance.value"] = dr.clip(
                optim["green.reflectance.value"], 0.0, 1.0
            )
            params.update(optim)
            print(f"[Mitsuba] Step {step}")


def pytorch_worker(render_request, render_response, grad_queue, gt_queue):
    # Pin this process to GPU 1 before importing PyTorch
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"

    import torch
    import torch.nn.functional as F

    target = torch.tensor(gt_queue.get())

    for step in range(10):
        render_request.put("render")
        image_np = render_response.get()
        image = torch.tensor(image_np, requires_grad=True)

        loss = F.mse_loss(image, target)
        loss.backward()

        # Send ∂L/∂image to Mitsuba
        grad = image.grad
        grad_queue.put(grad)
        print(f"[PyTorch] Step {step}, Loss = {loss.item():.6f}")


if __name__ == "__main__":
    mp.set_start_method("spawn")

    gt_queue = mp.Queue()
    render_request = mp.Queue()
    render_response = mp.Queue()
    grad_queue = mp.Queue()

    p_mi = mp.Process(
        target=mitsuba_worker,
        args=(render_request, render_response, grad_queue, gt_queue),
    )
    p_torch = mp.Process(
        target=pytorch_worker,
        args=(render_request, render_response, grad_queue, gt_queue),
    )

    p_mi.start()
    p_torch.start()
    p_mi.join()
    p_torch.join()
```
Replies: 1 comment 3 replies
I think your issue is in the …

Back to your initial question: we don't plan to support this type of multi-GPU workload. We don't really have a need for it ourselves, and we don't have the infrastructure to actually test it, so I don't really have any good guidelines or tips to share on this topic.
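For what it's worth, the single-GPU pattern from the differentiable-rendering tutorials attaches the rendered image to the scene parameters by passing `params` to `mi.render()`; an image rendered without it does not depend on the variables being differentiated, which matches the error above. Below is a minimal sketch of that pattern. The `grad_np` placeholder (standing in for the ∂L/∂image array shipped over from PyTorch) and the dot-product reduction used to seed the backward pass are illustrative assumptions, not code from this thread:

```python
import numpy as np
import drjit as dr
import mitsuba as mi

mi.set_variant("cuda_ad_rgb")

scene = mi.load_dict(mi.cornell_box())
params = mi.traverse(scene)

opt = mi.ad.Adam(lr=0.1)
opt["green.reflectance.value"] = mi.Color3f(0.205421, 0.47798, 0.176425)
params.update(opt)

# Passing `params` is what connects the rendered image to the
# differentiable parameters in the AD graph.
image = mi.render(scene, params, spp=8)

# `grad_np` stands in for the dL/dimage array received from PyTorch.
grad_np = np.ones(image.shape, dtype=np.float32)
grad = mi.TensorXf(grad_np)

# Seeding the backward pass through a dot product: the derivative of
# sum(image * grad) w.r.t. the parameters is the vector-Jacobian
# product J^T grad, i.e. exactly the gradient the PyTorch loss implies.
dr.backward(dr.sum(image.array * grad.array))

# The parameter gradient is now available for an optimizer step.
print(dr.grad(opt["green.reflectance.value"]))
```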
Hi @njroussel,
I did that as well and have been playing with it for some time, and eventually got something working this morning. Do you think the "right way" to do this is using Dr.Jit?
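For future readers: the single-process route, following the pattern from the official Mitsuba/PyTorch interoperability tutorial, wraps the rendering step with `@dr.wrap_ad(source='torch', target='drjit')` (renamed to `dr.wrap` in Dr.Jit 1.x), so PyTorch's autograd can drive the renderer directly and no queues or IPC are needed when everything fits on one GPU. A rough sketch; the color values, learning rate, and loop below are illustrative, not from this thread:

```python
import drjit as dr
import mitsuba as mi
import torch
import torch.nn.functional as F

mi.set_variant("cuda_ad_rgb")

scene = mi.load_dict(mi.cornell_box())
params = mi.traverse(scene)
key = "green.reflectance.value"

# PyTorch sees this as an ordinary differentiable function; Dr.Jit
# differentiates the rendering step internally and hands the gradient
# back to torch.autograd.
@dr.wrap_ad(source="torch", target="drjit")
def render(value, spp=8):
    params[key] = dr.unravel(mi.Color3f, dr.ravel(value))
    params.update()
    return mi.render(scene, params, spp=spp)

# Illustrative target and starting point
target = render(torch.tensor([0.1, 0.7, 0.1], device="cuda")).detach()
value = torch.tensor([0.2, 0.5, 0.2], device="cuda", requires_grad=True)

opt = torch.optim.Adam([value], lr=0.05)
for step in range(10):
    opt.zero_grad()
    loss = F.mse_loss(render(value), target)
    loss.backward()  # gradient flows through the wrapped render call
    opt.step()
    print(f"Step {step}, Loss = {loss.item():.6f}")
```

Since both frameworks share one process here, the gradient handoff happens in memory rather than through pickled queues, which avoids the cross-process plumbing from the example above.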