Tweaks to model grading (#858)

lukekim · web-flow · commit 6dd9c16ccfc9 · 2025-02-10T10:40:19.000-08:00
diff --git a/website/docs/components/models/index.md b/website/docs/components/models/index.md
@@ -25,7 +25,7 @@ Spice supports various model providers for traditional machine learning (ML) mod
 [ant]: ./anthropic.md
 [xai]: ./xai.md
 
-Spice also tests and evaluates common models and grades their ability to integrate with Spice. See the [Models Grade Report](./report.md).
+Spice also tests and evaluates common models and grades their ability to integrate with Spice. See the [Models Grade Report](/docs/reference/models.md).
 
 \*LLM Format(s) may require additional files (e.g., `tokenizer_config.json`).
 
diff --git a/website/docs/components/models/report.md b/website/docs/components/models/report.md
diff --git a/website/docs/reference/models.md b/website/docs/reference/models.md
@@ -0,0 +1,24 @@
+---
+title: 'Models Grade Report'
+description: 'Evaluation report of Large Language Models (LLMs) graded by Spice AI'
+sidebar_label: 'Report'
+sidebar_position: 4
+---
+
+This report presents evaluation results for Large Language Models (LLMs) graded by Spice AI. Each model is assessed on its basic capabilities, the quality of its tool calls, and the accuracy of its output when integrated with Spice.
+
+For details on how Spice assigns model grades, see the [model grading criteria](https://github.com/spiceai/spiceai/blob/f6039123028209e20469b342791fa85d52b7771e/docs/criteria/models/grading.md).
+
+| Model                                           | Spice Grade | Model Provider | Context Window<br />Max Output Tokens | Chat Completion | Response Format<br />(Structured Outputs) | Tools | Recursive<br />Tool Calling | Reasoning | Streaming | Evaluation Date | Spice Version |
+| ----------------------------------------------- | ----------- | -------------- | ------------------------------------- | --------------- | ---------------------------------------- | ----- | --------------------------- | --------- | --------- | --------------- | ------------- |
+| `o3-mini-2025-01-31 (Reasoning effort: high)`   | **A**       | `openai`       | 200k tokens<br />100k tokens          | ✅              | ✅                                       | ✅    | ✅                          | ✅        | ✅        | 2025-01-31      | v1.0.2        |
+| `o3-mini-2025-01-31 (Reasoning effort: medium)` | **B**       | `openai`       | 200k tokens<br />100k tokens          | ✅              | ✅                                       | ✅    | ✅                          | ✅        | ✅        | 2025-01-31      | v1.0.2        |
+| `o3-mini-2025-01-31 (Reasoning effort: low)`    | **C**       | `openai`       | 200k tokens<br />100k tokens          | ✅              | ✅                                       | ✅    | ✅                          | ✅        | ✅        | 2025-01-31      | v1.0.2        |
+| `o1-2024-12-17 (Reasoning effort: high)`        | **A**       | `openai`       | 200k tokens<br />100k tokens          | ✅              | ✅                                       | ✅    | ✅                          | ✅        | ✅        | 2024-12-17      | v1.0.2        |
+| `o1-2024-12-17 (Reasoning effort: medium)`      | **A**       | `openai`       | 200k tokens<br />100k tokens          | ✅              | ✅                                       | ✅    | ✅                          | ✅        | ✅        | 2024-12-17      | v1.0.2        |
+| `o1-2024-12-17 (Reasoning effort: low)`         | **C**       | `openai`       | 200k tokens<br />100k tokens          | ✅              | ✅                                       | ✅    | ✅                          | ✅        | ✅        | 2024-12-17      | v1.0.2        |
+| `gpt-4o-2024-08-06`                             | **B**       | `openai`       | 128k tokens<br />16384 tokens         | ✅              | ✅                                       | ✅    | ✅                          | ❌        | ✅        | 2024-08-06      | v1.0.2        |
+| `claude-3-5-sonnet-20241022`                    | **C**       | `anthropic`    | 200k tokens<br />8192 tokens          | ✅              | ❌                                       | ✅    | ✅                          | ❌        | ✅        | 2024-10-22      | v1.0.2        |
+| `grok-2-1212`                                   | Ungraded    | `xai`          | −                                     | ✅              | −                                        | −     | −                           | ❌        | −         | Not Available   | v1.0.2        |
+| `deepseek-ai/DeepSeek-R1-Distill-Llama-8B`      | Ungraded    | `huggingface`  | −                                     | ✅              | −                                        | −     | −                           | ✅        | −         | Not Available   | v1.0.2        |
+| `meta-llama/Llama-3.2-3B-Instruct`              | Ungraded    | `huggingface`  | −                                     | ✅              | −                                        | −     | −                           | ❌        | −         | Not Available   | v1.0.2        |