Create a benchmark test for any prompt updates #1136
Replies: 1 comment
-
@joaomdmoura I'm quite interested in this. After Andrew Ng's recent talk, it seems the uncertainty about needing more training data for better performance is moot, and it's clearly possible to get significant improvements using agents. I'm feeling frustrated hacking away on crews with no clear way to benchmark which prompts work with which LLMs. I'm running a dual 3090 setup with a 14900KS and have 4TB of NVMe for local models. I'm pushing to get DBRX quantized to 2 bits (waiting on some things in llama.cpp). Once that's done we should have a local model better than GPT-3.5, so if we can figure out the prompt and agent architecture, we should be able to exceed GPT-4 locally, which seems like it could be enough for most work. With that said, I'm interested in finding and running the benchmarks locally, and I'm happy to contribute the compute to keep a benchmark updated. Any pointers would be appreciated, and I'll see what I can get set up.
-
I have some local scripts, but it would be nice to have this as a proper test — something to measure quality as prompts get updated and translated into different languages.
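A minimal sketch of what such a prompt-regression harness could look like, assuming a `run_prompt(prompt, input)` callable that wraps whatever local model you benchmark against (e.g. via llama.cpp) — the names, the keyword-matching quality proxy, and the stub model below are all hypothetical, not crewAI APIs:

```python
# Hypothetical sketch: score a prompt variant against a fixed set of
# cases, so regressions show up when the prompt is updated or translated.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    input: str
    expected_keywords: List[str]  # crude quality proxy: required substrings


def score_prompt(run_prompt: Callable[[str, str], str],
                 prompt: str, cases: List[Case]) -> float:
    """Return the fraction of cases whose output contains all keywords."""
    passed = 0
    for case in cases:
        output = run_prompt(prompt, case.input).lower()
        if all(kw.lower() in output for kw in case.expected_keywords):
            passed += 1
    return passed / len(cases)


# Stub model so the harness itself runs offline; swap in a real
# local-model call when wiring this up.
def echo_model(prompt: str, text: str) -> str:
    return f"{prompt}: {text}"


cases = [Case("summarize revenue", ["revenue"]),
         Case("list risks", ["risks"])]
print(score_prompt(echo_model, "You are an analyst", cases))  # 1.0
```

Running the same case set against each prompt revision (and each translated variant) would give a comparable number per update, which is roughly what a CI benchmark test needs.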