Commit 6dd9c16

Tweaks to models grading (#858)
1 parent f0c9a34 commit 6dd9c16

3 files changed

Lines changed: 25 additions & 25 deletions

website/docs/components/models/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,7 +25,7 @@ Spice supports various model providers for traditional machine learning (ML) mod
 [ant]: ./anthropic.md
 [xai]: ./xai.md
 
-Spice also tests and evaluates common models and grades their ability to integrate with Spice. See the [Models Grade Report](./report.md).
+Spice also tests and evaluates common models and grades their ability to integrate with Spice. See the [Models Grade Report](/docs/reference/models.md).
 
 \*LLM Format(s) may require additional files (e.g., `tokenizer_config.json`).
 
```

website/docs/components/models/report.md

Lines changed: 0 additions & 24 deletions
This file was deleted.

website/docs/reference/models.md

Lines changed: 24 additions & 0 deletions
```diff
@@ -0,0 +1,24 @@
+---
+title: 'Models Grade Report'
+description: 'Spice AI graded Large Language Model (LLM) evaluation report'
+sidebar_label: 'Report'
+sidebar_position: 4
+---
+
+This document presents the evaluation report for various Large Language Models (LLMs) graded by Spice AI. The models are assessed based on their basic capabilities, quality of tool calls, and accuracy of output when integrated with Spice.
+
+For more details on how model grades are evaluated in Spice, refer to the [model grading criteria](https://github.com/spiceai/spiceai/blob/f6039123028209e20469b342791fa85d52b7771e/docs/criteria/models/grading.md).
+
+| Model | Spice Grade | Model Provider | Context Window<br />Max Output Tokens | Chat Completion | Response Format<br />(Structured Outputs) | Tools | Recursive<br />Tool Calling | Reasoning | Streaming | Evaluation Date | Spice Version |
+| ----------------------------------------------- | ----------- | -------------- | ------------------------------------- | --------------- | ---------------------------------------- | ----- | --------------------------- | --------- | --------- | --------------- | ------------- |
+| `o3-mini-2025-01-31 (Reasoning effort: high)`   | **A**       | `openai`       | 200k tokens<br />100k tokens          |||||| 2025-01-31 | v1.0.2 |
+| `o3-mini-2025-01-31 (Reasoning effort: medium)` | **B**       | `openai`       | 200k tokens<br />100k tokens          |||||| 2025-01-31 | v1.0.2 |
+| `o3-mini-2025-01-31 (Reasoning effort: low)`    | **C**       | `openai`       | 200k tokens<br />100k tokens          |||||| 2025-01-31 | v1.0.2 |
+| `o1-2024-12-17 (Reasoning effort: high)`        | **A**       | `openai`       | 200k tokens<br />100k tokens          |||||| 2024-12-17 | v1.0.2 |
+| `o1-2024-12-17 (Reasoning effort: medium)`      | **A**       | `openai`       | 200k tokens<br />100k tokens          |||||| 2024-12-17 | v1.0.2 |
+| `o1-2024-12-17 (Reasoning effort: low)`         | **C**       | `openai`       | 200k tokens<br />100k tokens          |||||| 2024-12-17 | v1.0.2 |
+| `gpt-4o-2024-08-06`                             | **B**       | `openai`       | 128k tokens<br />16384 tokens         |||||| 2024-08-06 | v1.0.2 |
+| `claude-3-5-sonnet-20241022`                    | **C**       | `anthropic`    | 200k tokens<br />8192 tokens          |||||| 2024-10-22 | v1.0.2 |
+| `grok-2-1212`                                   | Ungraded    | `xai`          ||||||| Not Available | v1.0.2 |
+| `deepseek-ai/DeepSeek-R1-Distill-Llama-8B`      | Ungraded    | `huggingface`  ||||||| Not Available | v1.0.2 |
+| `meta-llama/Llama-3.2-3B-Instruct`              | Ungraded    | `huggingface`  ||||||| Not Available | v1.0.2 |
```
