Issues: pytorch/xla
Missing test for SPMD with scalar parameter.
Labels: SPMD / Distributed, testing
#8705 opened Feb 13, 2025 by ysiraichi
Enable 2D sharding with minibatch=True for SPMD
Labels: enhancement, SPMD / Distributed
#8696 opened Feb 10, 2025 by miladm
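For context, a minimal sketch of how minibatch input sharding is typically wired up with a 2D mesh. This is hedged: the module paths and the `ShardingSpec(..., minibatch=True)` keyword follow the torch_xla 2.x SPMD docs and may differ between releases; the dataset, mesh shape, and batch size are illustrative only.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs
from torch_xla.distributed.parallel_loader import MpDeviceLoader

xr.use_spmd()  # enable SPMD mode

num_devices = xr.global_runtime_device_count()  # sketch assumes an even count >= 2
# 2D mesh: the batch is sharded over "data", model weights over "model".
mesh = xs.Mesh(np.arange(num_devices), (num_devices // 2, 2), ("data", "model"))

# Shard inputs along dim 0; minibatch=True (assumed keyword) tells the loader that
# each host only feeds its local slice of the global batch, which is the
# combination this issue asks to support with a 2D mesh.
input_sharding = xs.ShardingSpec(mesh, ("data", None), minibatch=True)

dataset = torch.utils.data.TensorDataset(torch.randn(64, 16))  # toy inputs only
train_loader = torch.utils.data.DataLoader(dataset, batch_size=16)
loader = MpDeviceLoader(train_loader, xm.xla_device(), input_sharding=input_sharding)
```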
[torch_xla] scan only captures aten operations
Labels: dynamo, enhancement, SPMD / Distributed
#8691 opened Feb 9, 2025 by tengyifei
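For context, a hedged sketch of the scan usage the issue refers to, assuming the experimental `torch_xla.experimental.scan.scan(fn, init, xs)` entry point available in recent releases; the step function and shapes are illustrative. The reported limitation is that only aten operations inside the step function are captured.

```python
import torch
import torch_xla.core.xla_model as xm
from torch_xla.experimental.scan import scan  # assumed experimental module path

device = xm.xla_device()

def step(carry, x):
    # Plain aten ops like add/mul are captured by the scan tracer.
    new_carry = carry + x
    return new_carry, new_carry * 2  # (next carry, per-step output)

init = torch.zeros(4, device=device)
xs_in = torch.ones(10, 4, device=device)  # 10 scan steps over the leading dim
final_carry, ys = scan(step, init, xs_in)
```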
Introduce a mark_sharding that also shards the backward
Labels: enhancement, SPMD / Distributed
#8678 opened Feb 4, 2025 by tengyifei
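For context, a hedged sketch of the existing `xs.mark_sharding` annotation this request builds on (module paths per the torch_xla 2.x SPMD docs; the layer and mesh are illustrative). The call below annotates the tensor as used in the forward computation; the proposal asks for a variant that propagates an equivalent sharding to the backward pass as well.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()
num_devices = xr.global_runtime_device_count()
# 1D mesh with a single "model" axis spanning all devices.
mesh = xs.Mesh(np.arange(num_devices), (num_devices,), ("model",))

linear = torch.nn.Linear(1024, 1024).to(xm.xla_device())
# Shard the weight's first dimension across the "model" axis.
xs.mark_sharding(linear.weight, mesh, ("model", None))
```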
Missing sharding specs when annotating sharding over views
Labels: bug, SPMD / Distributed
#8662 opened Feb 1, 2025 by rpsilva-aws
Torch XLA Model all_gather does not work with tensors of different sizes along dimension 0
Labels: enhancement, SPMD / Distributed, usability
#8660 opened Jan 31, 2025 by ajayvohra2005
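A hedged workaround sketch, not the library's fix for this request: since `xm.all_gather` expects equal shapes across replicas, uneven tensors can be padded to a common length before gathering and trimmed afterwards. The helper below is hypothetical and assumes 2D tensors inside a multi-process torch_xla run.

```python
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

def all_gather_uneven(local: torch.Tensor, max_len: int) -> torch.Tensor:
    """Gather 2D tensors whose dim-0 sizes differ across replicas (hypothetical helper)."""
    # Record the true length, then pad dim 0 up to max_len so shapes match.
    length = torch.tensor([local.shape[0]], device=local.device)
    padded = F.pad(local, (0, 0, 0, max_len - local.shape[0]))
    lengths = xm.all_gather(length, dim=0)    # shape: (world_size,)
    gathered = xm.all_gather(padded, dim=0)   # shape: (world_size * max_len, ...)
    # Trim the padding back off each replica's chunk.
    chunks = gathered.split(max_len, dim=0)
    return torch.cat([c[:n] for c, n in zip(chunks, lengths.tolist())], dim=0)
```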
xla_backend make_send_channel_id NotImplementedError
Labels: bug, SPMD / Distributed
#8594 opened Jan 21, 2025 by radna0
Issues with v6e TPU deployment
Labels: bug, SPMD / Distributed, xla:tpu
#8591 opened Jan 20, 2025 by ttdd11
Kaggle TPU Multi-core Training Crashes with debug_single_process=False
Labels: bug, SPMD / Distributed, xla:tpu
#8569 opened Jan 14, 2025 by mohamedamara7
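For context, a hedged sketch of the multi-core launch path involved, assuming the `torch_xla.launch(fn, args=(), debug_single_process=...)` entry point of recent releases; `_mp_fn` is illustrative, and Kaggle TPU kernels may need their own environment setup.

```python
import torch
import torch_xla
import torch_xla.core.xla_model as xm

def _mp_fn(index):
    # Each spawned process gets its own XLA device.
    device = xm.xla_device()
    t = torch.ones(2, 2, device=device)
    print(index, t.sum())

if __name__ == "__main__":
    # debug_single_process=True runs everything in one process for debugging;
    # the crash in this issue is reported with the default multi-core path.
    torch_xla.launch(_mp_fn, args=(), debug_single_process=False)
```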
Multi-GPU training hangs using single and multiple nodes
Labels: bug, needs reproduction, pytorch api, SPMD / Distributed
#8549 opened Jan 10, 2025 by Patataman
GPT-2 OOM when using more than 4 attention blocks
Labels: SPMD / Distributed
#7791 opened Jul 31, 2024 by miladm
Support dist.ReduceOp.AVG on XLA device
Labels: SPMD / Distributed, usability
#7782 opened Jul 31, 2024 by miladm
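A hedged workaround sketch until AVG is supported: emulate the average with a SUM all-reduce followed by division by the world size. It assumes a process group initialized with the xla backend inside each worker process; `all_reduce_mean` is a hypothetical helper, not part of torch_xla.

```python
import torch
import torch.distributed as dist
import torch_xla.distributed.xla_backend  # noqa: F401  (registers the "xla" backend)

# Inside each worker process (e.g. one launched per device):
# dist.init_process_group("xla", init_method="xla://")

def all_reduce_mean(t: torch.Tensor) -> torch.Tensor:
    """Average `t` across all processes using SUM + divide (hypothetical helper)."""
    # dist.ReduceOp.AVG is what this issue requests; SUM is supported today.
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    t /= dist.get_world_size()
    return t
```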
SPMD pre-training of llama2: why is multi-machine training so slow?
Labels: SPMD / Distributed, xla:gpu
#6778 opened Mar 20, 2024 by mars1248
How to minimize memory expansion due to padding during sharding
Labels: SPMD / Distributed
#6674 opened Mar 6, 2024 by mfatih7
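A small back-of-the-envelope example of where the expansion comes from (hedged: it assumes the usual GSPMD behavior of padding each shard of an indivisible dimension up to the ceiling, so dimensions divisible by the mesh axis size avoid padding entirely).

```python
# 1027 rows sharded over 8 devices: each device holds ceil(1027 / 8) = 129 rows,
# so 1032 rows are materialized in total and 5 of them are padding.
rows, devices = 1027, 8
rows_per_device = -(-rows // devices)        # ceiling division -> 129
padded_rows = rows_per_device * devices      # 1032
print(rows_per_device, padded_rows - rows)   # 129 rows/device, 5 padded rows
```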
SPMD Global Batch size vs. --per_device_train_batch_size
Labels: SPMD / Distributed
#6411 opened Jan 30, 2024 by isaacr
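A short sketch of the relationship being asked about (hedged: it assumes a plain data-parallel SPMD setup in which the host-side DataLoader yields the global batch that is then sharded along dim 0 across all devices).

```python
import torch_xla.runtime as xr

per_device_train_batch_size = 8                  # DDP-style per-device setting
num_devices = xr.global_runtime_device_count()
# Under SPMD, the DataLoader should be built with the global batch size:
global_batch_size = per_device_train_batch_size * num_devices
```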
Support list of ShardingSpec in MpDeviceLoader
Labels: backport_2.2, DO_NOT_MERGE_YET, SPMD / Distributed
#5789 opened Nov 10, 2023 by jonb377
[RFC] A high-level GSPMD API in PT/XLA (based on xs.mark_sharding)
Labels: enhancement, nostale, RFC, SPMD / Distributed
#3755 opened Jul 23, 2022 by ronghanghu