
add deepspeed support #2

Open
WayneDW wants to merge 1 commit into chinsengi:main from WayneDW:deepspeed_support

Conversation


@WayneDW WayneDW commented Feb 12, 2026

The code runs locally. Distributed setups require significant hyperparameter changes. Tuning is ongoing.

unmasking_prob[batch], dtype=torch.float32
).unsqueeze(0)
EPS = 1e-6
clamped_prob = torch.clamp(unmasking_prob[batch], min=EPS, max=1.0 - EPS)
Owner


I would advise against this clamping operation, since it will make likelihood estimation inaccurate.
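To illustrate the reviewer's concern, here is a minimal sketch (not code from the PR) of how clamping a probability before taking its log inflates the estimated log-likelihood of very unlikely tokens; `EPS` matches the constant in the diff, everything else is hypothetical:

```python
import math

EPS = 1e-6

def clamped_log_prob(p, eps=EPS):
    # Clamp into [eps, 1 - eps] before the log, as in the diff above
    return math.log(min(max(p, eps), 1.0 - eps))

p = 1e-9                       # a token the model considers extremely unlikely
exact = math.log(p)            # about -20.7 nats
clamped = clamped_log_prob(p)  # about -13.8 nats, since p was raised to EPS
bias = clamped - exact         # positive: the estimate is spuriously optimistic
```

Every token whose true probability falls below `EPS` contributes this upward bias, so summed log-likelihoods (and perplexity computed from them) no longer reflect the model's actual distribution.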

# If the model does not accept loss kwargs, we need to normalize the loss by the number of gradient accumulation steps
loss = loss / self.current_gradient_accumulation_steps

# Turning off loss scaling w.r.t. gradient accumulation when DeepSpeed is enabled
Owner


This was deleted for no obvious reason.
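For context on why such a deletion might be intended: DeepSpeed's engine averages the loss over gradient-accumulation steps internally, so a trainer that also divides manually would scale gradients twice. A hypothetical sketch of guarding the division (names and structure are illustrative, not the PR's actual code):

```python
def scale_loss(loss, accumulation_steps, deepspeed_enabled):
    """Normalize the loss for gradient accumulation, unless an engine
    such as DeepSpeed already performs this averaging itself."""
    if deepspeed_enabled:
        # DeepSpeed divides by gradient_accumulation_steps inside its
        # engine; dividing here as well would double-scale gradients.
        return loss
    return loss / accumulation_steps

# With 4 accumulation steps and a raw loss of 8.0:
scale_loss(8.0, 4, deepspeed_enabled=False)  # -> 2.0, manual normalization
scale_loss(8.0, 4, deepspeed_enabled=True)   # -> 8.0, left for the engine
```

Whether that was the intent here is exactly what the review comment asks the author to explain, since the deletion as submitted removes the scaling unconditionally.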

@WayneDW WayneDW force-pushed the deepspeed_support branch from 2961983 to 97b85ef on April 4, 2026 15:45



2 participants