Llemval is a tool that helps you compare LLM outputs for your specific needs. Set up an "experiment" that instructs how an LLM should behave, then write testcases pairing prompts with the output you'd expect the LLM to produce.
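For illustration, here's a minimal sketch of how an experiment and its testcases could be evaluated against a Groq-hosted model. The `TestCase` shape, the exact-match comparison, and the model name are all hypothetical stand-ins, not Llemval's actual schema or evaluation logic.

```python
# Hypothetical sketch: an "experiment" (system prompt describing desired
# behavior) plus testcases (input prompt + expected output), evaluated
# against a Groq-hosted model. Names and schema are illustrative only.
from dataclasses import dataclass

from groq import Groq  # pip install groq


@dataclass
class TestCase:
    prompt: str    # user input sent to the model
    expected: str  # output you'd expect the model to produce


client = Groq()  # reads GROQ_API_KEY from the environment

experiment = "You are a terse assistant. Answer in one word."
testcases = [
    TestCase(prompt="What is the capital of France?", expected="Paris"),
    TestCase(prompt="2 + 2 = ?", expected="4"),
]

for case in testcases:
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # any Groq-hosted model works here
        messages=[
            {"role": "system", "content": experiment},
            {"role": "user", "content": case.prompt},
        ],
    )
    actual = response.choices[0].message.content.strip()
    # Naive exact-match comparison; a real evaluator might use fuzzy
    # matching or an LLM judge instead.
    status = "PASS" if actual == case.expected else "FAIL"
    print(f"{case.prompt!r}: {status} ({actual!r})")
```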
The experiment log feature is still under construction; updates coming soon.
Tools used: React, SQLAlchemy, Groq