sahil2801/replit-code-instruct-glaive result #4
regularfry started this conversation in General
Replies: 2 comments
-
I've asked myself the same question and run some validation experiments.
-
Hm. That's a pain. These things happen.
-
Am I reading this right? Right now, https://huggingface.co/sahil2801/replit-code-instruct-glaive, a 3B model, is claiming a pass@1 score of 63.5%, significantly better than the next best model, which is a 15B model.
Is that... real? Or are we seeing an artefact of how the pass@1 test works?
Apologies if this is a stupid question, but that result just looks too good to be true.
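For anyone unsure what pass@1 actually measures, here is a minimal sketch of the pass@k estimator as it is commonly defined for code benchmarks such as HumanEval. Whether the linked model card uses exactly this procedure is an assumption on my part; the function names and the sample counts below are illustrative only and are not taken from any real evaluation run.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: completions sampled for the problem
    c: completions among them that pass the unit tests
    k: budget of attempts being scored
    """
    # Fewer than k failing samples means every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: one sample per problem, two problems, one solved.
score = mean_pass_at_k([(1, 1), (1, 0)], k=1)
```

With a single sample per problem, pass@1 is just the share of problems whose one completion passes the tests. With more samples, the k=1 case reduces to c/n per problem, averaged over the benchmark.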