
Conversation

@orlando-labs
Contributor

The code and its behavior mostly mimic PyTorch's. It differs only in the multiprocessing section, where the Ruby flow diverges from the Python one.
All code has been tested in a multi-GPU environment, which is not currently reproducible on GitHub CI:

bundle exec rake compile test -- --with-torch-dir=/opt/libtorch --with-cuda-include=/usr/local/cuda-12.9/include --with-gloo-include=$(pwd)/vendor/gloo
...
415 runs, 955 assertions, 0 failures, 0 errors, 24 skips

We tried to maximize test coverage for every aspect of DDP communication. A benchmark and an example are also included.
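Since the PR itself isn't shown here, the following is only a minimal sketch of the per-worker process-spawning pattern that a Ruby DDP setup typically relies on (one child process per GPU rank). It uses only the Ruby standard library; the `world_size` variable and the pipe-based result collection are illustrative assumptions, not code from the PR. In a real setup each child would pin itself to its GPU, join the process group, and run the training loop.

```ruby
# Hypothetical sketch: spawn one worker process per "GPU" rank and
# collect each worker's rank back through a pipe. In actual DDP code,
# the child body would initialize the process group and train instead.
world_size = 4

children = world_size.times.map do |rank|
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    # A real worker would select device `rank` and run training here.
    writer.puts(rank)
    writer.close
  end
  writer.close
  [pid, reader]
end

ranks = children.map do |pid, reader|
  value = reader.read.to_i
  reader.close
  Process.wait(pid)
  value
end

puts ranks.sort.inspect
```

Unlike Python's `torch.multiprocessing.spawn`, which pickles the target function, `Process.fork` inherits the parent's state directly, which is one reason the Ruby flow cannot simply copy the Python one.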

ankane and others added 30 commits October 27, 2020 15:09
It is often useful to access convolutional layer attributes, e.g. to precalculate output shapes.
…t_mask

Fixed generation of square subsequent mask
@ankane
Owner

ankane commented Nov 24, 2025

Hi @orlando-labs, thanks for the PR.

I think most of this would be better as a separate gem for now, as I'm not in a position to support this functionality.

(also, there should already be a ModuleList class)

@orlando-labs
Contributor Author

Hi, @ankane. Moving this to a separate gem is a really good idea. However, some core functionality changes are needed to run DDP, such as improved device handling and support for mapping the load location in Torch#load. These need to be implemented in the core gem.

@ankane
Owner

ankane commented Nov 26, 2025

Feel free to create individual PRs for those specific changes (for device handling, there's already a Device class).

@ankane ankane closed this Nov 26, 2025
@orlando-labs orlando-labs mentioned this pull request Dec 15, 2025