Update Evaluation Logic to Latest lm_eval (0.4.8) and Support Automatic Benchmark Evals w/o Validation Set
          
#186
        
| Job | Run time |
|---|---|
| | 2m 52s |
| | 2m 52s |