Issues: deepspeedai/DeepSpeedExamples
Recommended base docker image, torch/cuda version to use that is compatible with this code base?
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#546
opened May 23, 2023 by
abdulvirta
Much more memory used in step 3 when using multi gpus compared to using single gpu
deespeed chat
DeepSpeed Chat
llama
Questions related to llama model
system
An issue with an environment/system setup.
#529
opened May 16, 2023 by
cokuehuang
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling 'cublasCreate(handle)'
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#499
opened May 8, 2023 by
Arain-sh
A100 40 GB: OOM on step-3 for opt-6.7B
deespeed chat
DeepSpeed Chat
new-config
A modified config from the given example
system
An issue with an environment/system setup.
#482
opened May 5, 2023 by
akashsaravanan-georgian
unable to load 4 7b size model in step3
deespeed chat
DeepSpeed Chat
new-config
A modified config from the given example
system
An issue with an environment/system setup.
#480
opened May 5, 2023 by
Mr-lonely0
Can not use bloom-560m model in the step2_reward_model_finetuning
deespeed chat
DeepSpeed Chat
new-config
A modified config from the given example
system
An issue with an environment/system setup.
#479
opened May 5, 2023 by
korlin0110
[BUG]RuntimeError: CUDA error: unknown error
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#453
opened Apr 28, 2023 by
SH0AN
How to train deepspeed-chat using nccl with multi-nodes?
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#443
opened Apr 27, 2023 by
SefaZeng
the DeepSpeed-Chat demo train.py cannot even run
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#432
opened Apr 25, 2023 by
Emerald01
[ERROR] [launch.py:434:sigkill_handler]
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#430
opened Apr 25, 2023 by
TheGravityZero
Step 3 1.3b Running process stuck
deespeed chat
DeepSpeed Chat
new-config
A modified config from the given example
system
An issue with an environment/system setup.
#428
opened Apr 25, 2023 by
awelldone
Default configuration running with V100-32G causes OOM
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#387
opened Apr 21, 2023 by
binderwang
Run multi-node training failed, how to train without hostfile
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#381
opened Apr 21, 2023 by
xiaoyi0814
Step1 training failed
bug
Something isn't working
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#328
opened Apr 17, 2023 by
omoiji
Running multinode training and received unclear error for stage 2 training
bug
Something isn't working
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#327
opened Apr 17, 2023 by
alibabadoufu
[BUG]Step1 RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
bug
Something isn't working
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#323
opened Apr 17, 2023 by
qinqinqaq
Single node multi card training failed
bug
Something isn't working
deespeed chat
DeepSpeed Chat
system
An issue with an environment/system setup.
#310
opened Apr 15, 2023 by
menkeyi