Gradient checkpointing fix #1043
base: main
Conversation
```python
from functools import partial

from torch_xla.utils.checkpoint import checkpoint


def checkpoint_with_kwargs(fn, *args, **kwargs):
    fn_with_kwargs = partial(fn, **kwargs)
    return checkpoint(fn_with_kwargs, *args)
```
I think this should be the other way around:
```diff
-    return checkpoint(fn_with_kwargs, *args)
+    return checkpoint(*args, fn_with_kwargs)
```
No, the first argument must be the function we checkpoint; see the upstream `checkpoint` signature.
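For reference, the checkpoint API is callable-first. A minimal sketch using the stock `torch.utils.checkpoint` (whose calling convention the torch_xla version mirrors, as noted above) shows why the suggested swap would fail:

```python
import torch
from torch.utils.checkpoint import checkpoint  # same callable-first convention as torch_xla's

def fn(x):
    return torch.tanh(x)

x = torch.randn(3, requires_grad=True)

out = checkpoint(fn, x, use_reentrant=False)  # OK: the callable comes first
# checkpoint(x, fn, use_reentrant=False)      # TypeError: a Tensor is not callable
out.sum().backward()
```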
tengomucho left a comment:
Why do you need this function? Can you provide a test that shows its usage and verifies that it works?
It is just a wrapper around the original checkpoint function. The goal is to make it possible to pass keyword arguments to the function we are wrapping. I can add a test, but you have mentioned many times that the CI is too long; for such a small change I was not considering it. Please confirm you want a test.
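For what it's worth, such a test could stay tiny. A sketch of what it might look like, using a CPU stand-in for the XLA-backed helper so it runs anywhere (names, shapes, and the stand-in are illustrative, not the PR's actual test):

```python
from functools import partial

import torch
from torch.utils.checkpoint import checkpoint

# CPU stand-in with the same shape as the PR's helper; the real test would
# import the XLA-backed checkpoint_with_kwargs and run on an XLA device.
def checkpoint_with_kwargs(fn, *args, **kwargs):
    return checkpoint(partial(fn, **kwargs), *args, use_reentrant=False)

def test_checkpoint_with_kwargs():
    def fn(x, scale=1.0):
        return torch.tanh(x) * scale

    x = torch.randn(4, 8, requires_grad=True)
    expected = fn(x, scale=2.0)

    # The wrapper must forward the keyword argument to fn.
    out = checkpoint_with_kwargs(fn, x, scale=2.0)

    torch.testing.assert_close(out, expected)
    out.sum().backward()  # gradients flow through the recomputed graph
```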
```diff
 if self.gradient_checkpointing and self.training:
-    hidden_states = checkpoint(
+    hidden_states = checkpoint_with_kwargs(
         decoder_layer.__call__,
```
You changed this here only for Granite, but not for the other models. I think you should change it everywhere, or nowhere, or explain why you only changed it here.
I don't understand; we change it "everywhere".
What does this PR do?
This PR adds an XLA-friendly gradient checkpointing function that accepts keyword arguments.
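In short, the helper binds keyword arguments with `functools.partial` before handing the callable to the checkpoint function. A toy end-to-end sketch of the resulting call-site pattern (the layer and its arguments are illustrative, and the stock `torch.utils.checkpoint` stands in for the XLA one so the sketch runs anywhere):

```python
from functools import partial

import torch
from torch.utils.checkpoint import checkpoint  # stand-in for torch_xla.utils.checkpoint

def checkpoint_with_kwargs(fn, *args, **kwargs):
    fn_with_kwargs = partial(fn, **kwargs)
    return checkpoint(fn_with_kwargs, *args, use_reentrant=False)

class ToyDecoderLayer(torch.nn.Module):
    def forward(self, hidden_states, attention_mask=None):
        out = torch.tanh(hidden_states)
        return out if attention_mask is None else out * attention_mask

layer = ToyDecoderLayer()
hidden_states = torch.randn(2, 4, requires_grad=True)
mask = torch.ones(2, 4)

# Keyword arguments such as attention_mask now reach the wrapped layer:
hidden_states = checkpoint_with_kwargs(layer.__call__, hidden_states, attention_mask=mask)
hidden_states.sum().backward()
```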