
add deepspeed support #2

Open
WayneDW wants to merge 1 commit into chinsengi:main from WayneDW:deepspeed_support

Conversation


@WayneDW WayneDW commented Feb 12, 2026

The code runs locally. Distributed setups require significant hyperparameter changes. Tuning is ongoing.

unmasking_prob[batch], dtype=torch.float32
).unsqueeze(0)
EPS = 1e-6
clamped_prob = torch.clamp(unmasking_prob[batch], min=EPS, max=1.0 - EPS)
Owner


I would advise against this clamping operation, since it will make likelihood estimation inaccurate.
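To illustrate the reviewer's concern, here is a minimal sketch (not code from the PR) of how clamping a probability before taking its log inflates the estimated log-likelihood of very unlikely tokens; `EPS` matches the constant in the diff, everything else is hypothetical:

```python
import math

EPS = 1e-6

def clamped_log_prob(p, eps=EPS):
    # Clamp into [eps, 1 - eps] before the log, as in the diff above
    return math.log(min(max(p, eps), 1.0 - eps))

p = 1e-9                       # a token the model considers extremely unlikely
exact = math.log(p)            # about -20.7 nats
clamped = clamped_log_prob(p)  # about -13.8 nats, since p was raised to EPS
bias = clamped - exact         # positive: the estimate is spuriously optimistic
```

Every token whose true probability falls below `EPS` contributes this upward bias, so summed log-likelihoods (and perplexity computed from them) no longer reflect the model's actual distribution.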

# If the model does not accept loss kwargs, we need to normalize the loss by the number of gradient accumulation steps
loss = loss / self.current_gradient_accumulation_steps

# Turning off loss scaling w.r.t. gradient accumulation when DeepSpeed is enabled
Owner


This was deleted for no obvious reason.
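For context on why such a deletion might be intended: DeepSpeed's engine averages the loss over gradient-accumulation steps internally, so a trainer that also divides manually would scale gradients twice. A hypothetical sketch of guarding the division (names and structure are illustrative, not the PR's actual code):

```python
def scale_loss(loss, accumulation_steps, deepspeed_enabled):
    """Normalize the loss for gradient accumulation, unless an engine
    such as DeepSpeed already performs this averaging itself."""
    if deepspeed_enabled:
        # DeepSpeed divides by gradient_accumulation_steps inside its
        # engine; dividing here as well would double-scale gradients.
        return loss
    return loss / accumulation_steps

# With 4 accumulation steps and a raw loss of 8.0:
scale_loss(8.0, 4, deepspeed_enabled=False)  # -> 2.0, manual normalization
scale_loss(8.0, 4, deepspeed_enabled=True)   # -> 8.0, left for the engine
```

Whether that was the intent here is exactly what the review comment asks the author to explain, since the deletion as submitted removes the scaling unconditionally.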

@WayneDW WayneDW force-pushed the deepspeed_support branch from 2961983 to 97b85ef on April 4, 2026 15:45



2 participants