Update Evaluation Logic to Latest lm_eval (0.4.8) and Support Automatic Benchmark Evals w/o Validation Set

#190