Issues: deepspeedai/DeepSpeed

[Roadmap] DeepSpeed Roadmap Q1 2025
#6946 opened Jan 13, 2025 by loadams
Open

Issues list

[BUG] mpi based training error (labels: bug, training)
#6997 opened Feb 4, 2025 by cyr0930
[BUG] loading model error (labels: bug, training)
#6994 opened Feb 3, 2025 by tengwang0318
[BUG] Invalidate trace cache warning (labels: bug, training)
#6985 opened Jan 30, 2025 by leachim
[BUG] pdsh runner doesn't work with tqdm bar (labels: bug, training)
#6978 opened Jan 29, 2025 by Superskyyy
[BUG] libaio on amd node (labels: bug, training)
#6972 opened Jan 25, 2025 by GuanhuaWang
[BUG] z3+compile+gradient checkpoint uses more memory (labels: bug, training)
#6966 opened Jan 22, 2025 by oraluben
[BUG] model(**input) cannot use under zero stage 3. (labels: bug, training)
#6949 opened Jan 14, 2025 by MarkDeng1
[BUG]Zero++ training failed (labels: bug, training)
#6926 opened Jan 6, 2025 by HelloWorld506
Using zero3 on multiple nodes is slow (labels: bug, training)
#6889 opened Dec 18, 2024 by HelloWorld506
DeepSpeed with trl (labels: bug, training)
#6852 opened Dec 11, 2024 by sagie-dekel
[BUG] max_grad_norm not effect (labels: bug, training)
#6743 opened Nov 12, 2024 by yiyepiaoling0715