Issues: deepspeedai/DeepSpeedExamples

Issues list

Recommended base docker image, torch/cuda version to use that is compatible with this code base?
Labels: deespeed chat, system
#546 opened May 23, 2023 by abdulvirta
Much more memory used in step 3 when using multi gpus compared to using single gpu
Labels: deespeed chat, llama, system
#529 opened May 16, 2023 by cokuehuang
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling 'cublasCreate(handle)'
Labels: deespeed chat, system
#499 opened May 8, 2023 by Arain-sh
A100 40 GB: OOM on step-3 for opt-6.7B
Labels: deespeed chat, new-config, system
#482 opened May 5, 2023 by akashsaravanan-georgian
unable to load 4 7b size model in step3
Labels: deespeed chat, new-config, system
#480 opened May 5, 2023 by Mr-lonely0
Can not use bloom-560m model in the step2_reward_model_finetuning
Labels: deespeed chat, new-config, system
#479 opened May 5, 2023 by korlin0110
[BUG]RuntimeError: CUDA error: unknown error
Labels: deespeed chat, system
#453 opened Apr 28, 2023 by SH0AN
How to train deepspeed-chat using nccl with multi-nodes?
Labels: deespeed chat, system
#443 opened Apr 27, 2023 by SefaZeng
the DeepSpeed-Chat demo train.py cannot even run
Labels: deespeed chat, system
#432 opened Apr 25, 2023 by Emerald01
[ERROR] [launch.py:434:sigkill_handler]
Labels: deespeed chat, system
#430 opened Apr 25, 2023 by TheGravityZero
Step 3 1.3b Running process stuck
Labels: deespeed chat, new-config, system
#428 opened Apr 25, 2023 by awelldone
Default configuration running with V100-32G causes OOM
Labels: deespeed chat, system
#387 opened Apr 21, 2023 by binderwang
Run multi-node training failed, how to train without hostfile
Labels: deespeed chat, system
#381 opened Apr 21, 2023 by xiaoyi0814
Step1 training failed
Labels: bug, deespeed chat, system
#328 opened Apr 17, 2023 by omoiji
Running multinode training and received unclear error for stage 2 training
Labels: bug, deespeed chat, system
#327 opened Apr 17, 2023 by alibabadoufu
[BUG]Step1 RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
Labels: bug, deespeed chat, system
#323 opened Apr 17, 2023 by qinqinqaq
Single node multi card training failed
Labels: bug, deespeed chat, system
#310 opened Apr 15, 2023 by menkeyi