Thanks for the great work! What is the best way to use F-eval to evaluate an arbitrary response from an arbitrary model? Could you share the full pipeline?