The difference between DeepSeek-V3 and DeepSeek-R1-Zero #39
Unanswered
helperfunc
asked this question in
Q&A
Replies: 2 comments
-
Yes, the primary difference between DeepSeek-R1-Zero and DeepSeek-R1 lies in the use of supervised fine-tuning during post-training. DeepSeek-R1-Zero is trained exclusively using reinforcement learning without any supervised fine-tuning. In contrast, DeepSeek-R1 incorporates a supervised fine-tuning phase to enhance readability and coherence in its outputs.
-
Here are some sources:
-
Is the only difference between DeepSeek-V3 and DeepSeek-R1-Zero whether supervised fine-tuning is used during post-training?