Configurable gradient clipping to avoid exploding gradients #38

forklady42 · 2025-12-10T19:21:01Z

In an early training run on the original grid data, the loss became NaN.

After adding gradient logging locally, I discovered some major spikes in the gradients. This indicates that exploding gradients are likely the cause of the NaNs.
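
For reference, the kind of check described above can be sketched in plain Python. This is not the logging code used in the run. The function names, the dictionary layout of the gradients, and the spike threshold of 10x the running median are all assumptions made for illustration.

```python
import math


def global_grad_norm(grads):
    """L2 norm over every gradient entry.

    `grads` maps a parameter name to a flat list of floats
    (a hypothetical stand-in for real parameter gradients).
    """
    return math.sqrt(sum(g * g for values in grads.values() for g in values))


def find_spikes(norm_history, factor=10.0):
    """Return the step indices whose norm is non-finite or exceeds
    `factor` times the (upper) median of the earlier finite norms."""
    spikes = []
    for step, norm in enumerate(norm_history):
        if not math.isfinite(norm):
            spikes.append(step)
            continue
        previous = sorted(n for n in norm_history[:step] if math.isfinite(n))
        if previous and norm > factor * previous[len(previous) // 2]:
            spikes.append(step)
    return spikes


# Made-up norms for illustration only, not values from the actual run.
history = [0.8, 1.1, 0.9, 250.0, 1.0, float("nan")]
print(find_spikes(history))
```

Logging one norm per optimizer step and scanning for outliers like this is usually enough to tell a single bad batch apart from a steady drift.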

Indeed, after adding gradient clipping to avoid the large spikes, I ran 10 epochs on the original gridded data without any NaNs for losses. This will be another hyperparameter for us to tune.
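
As a rough illustration of what norm-based clipping does, here is a minimal pure-Python sketch. It is not the code in this PR, which relies on the trainer to do the clipping. The function name and the list-of-floats gradient layout are assumptions, and the small epsilon only guards against division by zero.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale all gradients so their combined L2 norm is at most `max_norm`.

    `grads` maps a parameter name to a flat list of floats (hypothetical layout).
    Returns the rescaled gradients and the norm measured before clipping.
    """
    total = math.sqrt(sum(g * g for values in grads.values() for g in values))
    # Scale is 1.0 when the norm is already within the limit, so small
    # gradients pass through unchanged.
    scale = min(1.0, max_norm / (total + eps))
    clipped = {name: [g * scale for g in values] for name, values in grads.items()}
    return clipped, total


grads = {"w": [30.0, 40.0]}  # combined norm is 50
clipped, before = clip_by_global_norm(grads, max_norm=20.0)
print(before, clipped)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length is capped.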

I suspect smarter weight initialization, learning rate adjustments, and revisiting normalization will help stabilize the gradients, but in the meantime, gradient clipping will prevent them from spiraling out of control.

Note: I'm basing this PR on hanaol/setup-ptlightning since this configuration change is being passed into the Lightning trainer. However, there's no need to block merging #26 on this PR. I can rebase onto main once the PyTorch Lightning refactor is merged.

hanaol

@forklady42, what gradient_clip_value did you use in your runs? I’ll add it to the configuration file and include it in this PR.

forklady42 · 2025-12-12T00:00:13Z

I used 1.0, same as the default here, but honestly, I think that's more aggressive than necessary. Let's start with 20.0. That will head off any particularly large gradients without being excessively restrictive.

I'll go ahead and add it to the MP config.
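
A sketch of how the config entry might be forwarded, assuming the key is named `gradient_clip_value` as in the question above and that the config is available as a plain dictionary. The helper name and config layout are hypothetical. `gradient_clip_val` and `gradient_clip_algorithm` are the Lightning `Trainer` argument names.

```python
def trainer_kwargs_from_config(config):
    """Translate a (hypothetical) config dict into Trainer keyword arguments.

    Returns an empty dict when no clip value is set, which leaves
    gradient clipping disabled.
    """
    clip = config.get("gradient_clip_value")
    if clip is None:
        return {}
    clip = float(clip)
    if clip <= 0:
        raise ValueError("gradient_clip_value must be positive")
    # "norm" rescales by the global gradient norm; "value" would clamp
    # each gradient element instead.
    return {"gradient_clip_val": clip, "gradient_clip_algorithm": "norm"}


kwargs = trainer_kwargs_from_config({"gradient_clip_value": 20.0})
print(kwargs)
# The result would then be passed along, e.g. Trainer(**kwargs).
```

Keeping the key optional means existing configs without a clip value keep working unchanged.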

forklady42 requested a review from hanaol December 10, 2025 19:21

hanaol force-pushed the hanaol/setup-ptlightning branch from d02539c to d11a763 Compare December 11, 2025 17:38

forklady42 changed the base branch from hanaol/setup-ptlightning to main December 11, 2025 21:02

Configurable gradient clipping to avoid exploding gradients

0a77a1e

forklady42 force-pushed the betsy/grad-clip branch from 6dec1b7 to 0a77a1e Compare December 11, 2025 21:07

hanaol reviewed Dec 11, 2025

Add example gradient clip value to config

bba18f3
