Running evals on arbitrary data

Thanks for the great work!

What is the best way to use F-eval to evaluate an arbitrary response from an arbitrary model? Could you describe the full pipeline?