🚀 Feature
PyTorch/XLA xs.mark_sharding is an in-place operation that adds a sharding annotation to an XLA tensor. However, the gradients that flow back to that tensor are not annotated with the same sharding.
Motivation
In some cases, GSPMD fails to propagate the sharding annotation from a tensor to its gradient. It would be useful to shard both the tensor and its gradient with the same annotation.
Pitch
We could write a torch.autograd.Function implementation that applies the same sharding annotation in the forward pass and to the incoming gradient in the backward pass, as sketched below.
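A minimal sketch of that idea; it assumes xs.mark_sharding returns an XLAShardedTensor whose global_tensor attribute exposes the annotated tensor, and the class name here is hypothetical:

```python
import torch
import torch_xla.distributed.spmd as xs


class MarkShardingFunction(torch.autograd.Function):
    """Hypothetical autograd.Function: annotates a tensor with a sharding
    spec in forward, and annotates its gradient with the same spec in
    backward, so GSPMD sees consistent annotations on both."""

    @staticmethod
    def forward(ctx, tensor: torch.Tensor, mesh: xs.Mesh, partition_spec):
        # Save the mesh and partition spec for the backward pass.
        ctx.mesh = mesh
        ctx.partition_spec = partition_spec
        sharded = xs.mark_sharding(tensor, mesh, partition_spec)
        return sharded.global_tensor

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Apply the identical sharding annotation to the incoming gradient.
        sharded = xs.mark_sharding(grad_output, ctx.mesh, ctx.partition_spec)
        # mesh and partition_spec are non-tensor inputs: no gradients.
        return sharded.global_tensor, None, None
```

Usage would then replace a plain xs.mark_sharding call, e.g. t = MarkShardingFunction.apply(t, mesh, partition_spec).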
Additional context
JAX mark_sharding shards the gradients too.
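For comparison, a small sketch of that JAX behavior, assuming the primitive referenced above corresponds to jax.lax.with_sharding_constraint, whose gradient rule applies the same constraint to the cotangent; the mesh setup is illustrative:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec
from jax.lax import with_sharding_constraint

# Illustrative 1-D mesh over all available devices.
mesh = Mesh(np.array(jax.devices()), axis_names=("x",))
sharding = NamedSharding(mesh, PartitionSpec("x"))


def f(t):
    # The constraint is differentiable: during backprop, the cotangent
    # of t is constrained to the same sharding.
    t = with_sharding_constraint(t, sharding)
    return jnp.sum(t * t)


grad = jax.jit(jax.grad(f))(jnp.ones((8,)))
```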
cc @bhavya01