# Language Model Evaluations

- Works with `v1.0+`
+ Deprecated in `v2.0+`. Works with `v1.0+`

Spice can be used both to run language models and to evaluate their performance on specific tasks.

@@ -24,12 +24,27 @@ spice run
2. Run an evaluation against `my_model`. This will take a moment to complete.

```shell
- spice eval tetris --model my_model
+ curl -XPOST "http://localhost:8090/v1/evals/tetris" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "my_model"
+   }'
```

- ```text
- ID                               CREATEDAT           DATASET      MODEL    STATUS    SCORERS METRICS
- 15b8c5351cff98d96db28b8c76ad19dc 2024-12-30T06:14:54 small_tetris my_model Completed [match] map[match/mean:0.375]
+ ```json
+ [
+   {
+     "id": "15b8c5351cff98d96db28b8c76ad19dc",
+     "created_at": "2024-12-30T06:14:54",
+     "dataset": "small_tetris",
+     "model": "my_model",
+     "status": "Completed",
+     "scorers": ["match"],
+     "metrics": {
+       "match/mean": 0.375
+     }
+   }
+ ]
```

3. Inspect the results
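
For readers who would rather script this step than call `curl` by hand, the request and response in this change could be handled roughly as follows. This is only a sketch: it assumes the runtime is reachable at `http://localhost:8090` (as in the `curl` command) and that the response is a JSON array shaped like the sample in this diff. The helper names `build_eval_request`, `summarize_runs`, and `run_eval` are hypothetical and not part of Spice.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example in this diff.
BASE_URL = "http://localhost:8090"


def build_eval_request(eval_name, model, base_url=BASE_URL):
    """Build a POST request equivalent to the curl command (hypothetical helper)."""
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/evals/{eval_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_runs(response_text):
    """Map each run id to its metrics.

    Assumes the response is a JSON array of objects with "id" and
    "metrics" keys, as in the sample response in this diff.
    """
    runs = json.loads(response_text)
    return {run["id"]: run["metrics"] for run in runs}


def run_eval(eval_name, model):
    """Send the request and summarize the response (needs a running runtime)."""
    request = build_eval_request(eval_name, model)
    with urllib.request.urlopen(request) as response:
        return summarize_runs(response.read().decode("utf-8"))
```

With a runtime running locally, `run_eval("tetris", "my_model")` would return a dictionary keyed by run id. Keeping request construction and response parsing in separate functions lets both be checked without a live server.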