Create a benchmark test for any prompt updates #1136
Replies: 1 comment
-
@joaomdmoura I'm quite interested in this. After Andrew Ng's recent talk, it seems the uncertainty about needing more training data for better performance is moot, and it's clearly possible to get significant improvements using agents. I'm feeling frustrated hacking away on crews with no clear way to benchmark which prompts work with which LLMs. I'm running a dual 3090 setup with a 14900KS and have 4TB of NVMe for local models. I'm pushing to get DBRX quantized to 2 bits (waiting on some things in llama.cpp). Once that's done we should have a local model better than GPT-3.5, so if we can figure out the prompt and agent architecture, we should be able to exceed GPT-4 locally, which seems like it could be enough for most work. With that said, I'm interested in finding and running the benchmarks locally, and I'm happy to contribute the compute to keep a benchmark updated. Any pointers would be appreciated, and I'll see what I can get set up.
-
I have some local scripts, but it would be nice to have this as a proper test — something to measure quality as prompts get updated and translated into different languages.
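A minimal sketch of what such a prompt-regression harness could look like, assuming a `run_prompt(prompt, input)` callable that wraps whatever local model you benchmark against (e.g. via llama.cpp) — the names, the keyword-matching quality proxy, and the stub model below are all hypothetical, not crewAI APIs:

```python
# Hypothetical sketch: score a prompt variant against a fixed set of
# cases, so regressions show up when the prompt is updated or translated.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    input: str
    expected_keywords: List[str]  # crude quality proxy: required substrings


def score_prompt(run_prompt: Callable[[str, str], str],
                 prompt: str, cases: List[Case]) -> float:
    """Return the fraction of cases whose output contains all keywords."""
    passed = 0
    for case in cases:
        output = run_prompt(prompt, case.input).lower()
        if all(kw.lower() in output for kw in case.expected_keywords):
            passed += 1
    return passed / len(cases)


# Stub model so the harness itself runs offline; swap in a real
# local-model call when wiring this up.
def echo_model(prompt: str, text: str) -> str:
    return f"{prompt}: {text}"


cases = [Case("summarize revenue", ["revenue"]),
         Case("list risks", ["risks"])]
print(score_prompt(echo_model, "You are an analyst", cases))  # 1.0
```

Running the same case set against each prompt revision (and each translated variant) would give a comparable number per update, which is roughly what a CI benchmark test needs.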