Implement own DDP used without CPU offloading (#100) by mseeger · Pull Request #100 · awslabs/keys_values

mseeger · 2026-05-01T21:12:43Z

Don't use the DDP wrapper provided by Lightning Fabric. Instead, just do an all_reduce on the gradients at the end of each iteration.
This is a major simplification, and results seem to improve.
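
For readers unfamiliar with the pattern, here is a minimal sketch of what "all_reduce on the gradients at the end of each iteration" can look like in plain PyTorch. It is not the code from this PR: the helper names `all_reduce_gradients` and `train_step`, the MSE loss, and the choice to average (sum, then divide by world size) are assumptions made for illustration. It also assumes each rank has already run its full backward pass, and that the process group was initialized elsewhere.

```python
import torch
import torch.distributed as dist


def all_reduce_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all ranks, in place (hypothetical helper)."""
    if not (dist.is_available() and dist.is_initialized()):
        # No process group: single-process run, nothing to synchronize.
        return
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum this gradient over all ranks, then divide to get the mean.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad.div_(world_size)


def train_step(model, optimizer, inputs, targets) -> float:
    """One iteration: local backward pass, gradient all_reduce, optimizer step."""
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()              # gradients are purely local at this point
    all_reduce_gradients(model)  # one synchronization at the end of the iteration
    optimizer.step()             # every rank applies the same averaged gradients
    return loss.item()
```

Compared with a DDP wrapper, this sketch does not overlap communication with the backward pass and does not bucket gradients, so it trades some throughput for a much simpler control flow. It also assumes all ranks start from identical parameters.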

Closes #88, #92, #98.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Implement own DDP used without CPU offloading (#100)

a621590

mseeger self-assigned this May 1, 2026

mseeger merged commit 1e2ec65 into main May 1, 2026
7 checks passed

mseeger deleted the own_ddp2 branch May 1, 2026 21:33

This was referenced May 1, 2026

Replace Lightning Fabric DDP by own solution (all_reduce on gradients) #98

Closed

GPU memory profiling: Still missing something? #92

Closed

Basic comparisons with baseline (and train versus eval) on 32k Helmet datasets #97

Closed

mseeger merged 1 commit into main from own_ddp2