
Conversation

@orlando-labs
Contributor

The code and its behavior mostly mimic PyTorch's. It differs only in the multiprocessing section, where the Ruby flow diverges from the Python one.
All code has been tested in a multi-GPU environment, which is not currently reproducible on GitHub CI:

bundle exec rake compile test -- --with-torch-dir=/opt/libtorch --with-cuda-include=/usr/local/cuda-12.9/include --with-gloo-include=$(pwd)/vendor/gloo
...
415 runs, 955 assertions, 0 failures, 0 errors, 24 skips

We tried to maximize test coverage for every aspect of DDP communication. A benchmark and an example are also included.
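Since the PR itself isn't shown here, the following is only a minimal sketch of the per-worker process-spawning pattern that a Ruby DDP setup typically relies on (one child process per GPU rank). It uses only the Ruby standard library; the `world_size` variable and the pipe-based result collection are illustrative assumptions, not code from the PR. In a real setup each child would pin itself to its GPU, join the process group, and run the training loop.

```ruby
# Hypothetical sketch: spawn one worker process per "GPU" rank and
# collect each worker's rank back through a pipe. In actual DDP code,
# the child body would initialize the process group and train instead.
world_size = 4

children = world_size.times.map do |rank|
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    # A real worker would select device `rank` and run training here.
    writer.puts(rank)
    writer.close
  end
  writer.close
  [pid, reader]
end

ranks = children.map do |pid, reader|
  value = reader.read.to_i
  reader.close
  Process.wait(pid)
  value
end

puts ranks.sort.inspect
```

Unlike Python's `torch.multiprocessing.spawn`, which pickles the target function, `Process.fork` inherits the parent's state directly, which is one reason the Ruby flow cannot simply copy the Python one.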

ankane and others added 30 commits October 27, 2020 15:09
It is often useful to access convolutional layer attributes, e.g. to precalculate output shapes.
…t_mask

Fixed generation of square subsequent mask
@ankane
Owner

ankane commented Nov 24, 2025

Hi @orlando-labs, thanks for the PR.

I think most of this would be better as a separate gem for now, as I'm not in a position to support this functionality.

(also, there should already be a ModuleList class)

@orlando-labs
Contributor Author

Hi, @ankane. Moving this to a separate gem is a really good idea. However, some core functionality changes are needed to run DDP, such as improved device handling and support for mapping the load location in Torch#load. These need to be implemented in the core gem.

@ankane
Owner

ankane commented Nov 26, 2025

Feel free to create individual PRs for those specific changes (for device handling, there's already a Device class).

@ankane ankane closed this Nov 26, 2025
@orlando-labs orlando-labs mentioned this pull request Dec 15, 2025