The GEval API evaluates model-generated outputs using the G-Eval framework. It assesses how well a model's output matches expected results based on custom criteria, evaluation steps, and other optional parameters.
G-Eval paper: https://arxiv.org/abs/2303.16634
G-Eval Python Implementation: https://github.com/confident-ai/deepeval
`POST /api/geval`
This endpoint evaluates a test case based on an input, the actual output, and optional criteria, evaluation steps, expected output, context, and retrieval context. It uses OpenAI models to calculate a score for the evaluation and to generate an explanation for that score.
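Under the hood, the endpoint's scoring presumably wraps the deepeval implementation linked above; the exact wiring inside this server is an assumption. A minimal sketch of that underlying usage, assuming deepeval is installed and an OpenAI API key is configured:

```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# G-Eval metric built from a name, criteria, and the test-case fields to judge.
metric = GEval(
    name="order_relevance",
    criteria="Check if the course has the correct order for the intended audience",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

test_case = LLMTestCase(
    input="Python course roadmap for beginners first module",
    actual_output="- module 1: python basics",
)

metric.measure(test_case)  # calls an OpenAI model to score the test case
print(metric.score)        # float between 0 and 1
print(metric.reason)       # explanation for the assigned score
```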
The API expects a JSON payload with the following fields (a complete request sketch follows the list):

- name: (String) The name of the evaluation or test case. Example: "order_relevance".
- input: (String) The input or question given to the model. Example: "Python course roadmap for beginners first module".
- actualOutput: (String) The actual output generated by the model. Example: "- module 1: python basics".
- criteria: (String) (Optional) The criteria against which the output is evaluated. Example: "check if the course has the correct order for the intended audience".
- expectedOutput: (String) (Optional) The expected output for comparison. Example: "module 1: python basics".
- evaluationSteps: (Array of Strings) (Optional) A list of step-by-step evaluation criteria. Example: ["Verify the order of modules", "Check if topics follow a logical progression for beginners"].
- context: (String) (Optional) Additional context or background information. Example: "Python is a fundamental programming language, and a roadmap for beginners should start with basics."
- retrievalContext: (String) (Optional) Additional information from a retrieval context. Example: "Python is often taught with a clear progression from basics to advanced topics."
Example request with criteria and evaluation steps:

```bash
curl -X POST http://localhost:3001/api/geval \
  -H "Content-Type: application/json" \
  -d '{
    "name": "order_relevance",
    "input": "Python course roadmap for beginners first module",
    "actualOutput": "- module 1: python basics",
    "criteria": "check if the course has the correct order for the intended audience",
    "evaluationSteps": [
      "Ensure the order of topics follows a logical progression",
      "Check if the content is appropriate for beginners"
    ]
  }'
```

Example request with an expected output and context:

```bash
curl -X POST http://localhost:3001/api/geval \
  -H "Content-Type: application/json" \
  -d '{
    "name": "output_accuracy",
    "input": "What is the capital of France?",
    "actualOutput": "The capital of France is Paris.",
    "expectedOutput": "Paris",
    "evaluationSteps": [
      "Verify if the actual output matches the expected output",
      "Check if the response provides accurate information"
    ],
    "context": "Paris is the capital of France."
  }'
```

The API responds with a JSON object containing the evaluation results. The fields include:
- score: (Number) The evaluation score, ranging from 0 to 1.
- reason: (String) A concise explanation for the assigned score.
Example response:

```json
{
  "score": 0.9,
  "reason": "The output is relevant and follows the expected order for beginners."
}
```

Request parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | Name of the evaluation or test case. Example: "order_relevance". |
| input | String | Yes | The input given to the model. Example: "Python course roadmap for beginners first module". |
| actualOutput | String | Yes | The actual output generated by the model. Example: "- module 1: python basics". |
| criteria | String | No | Specific criteria for evaluation. Example: "check if the course has correct order for beginners". |
| expectedOutput | String | No | The expected output for comparison. Example: "module 1: python basics". |
| evaluationSteps | Array of Strings | No | Step-by-step criteria for evaluation. Example: ["Verify the order of modules", "Check progression"]. |
| context | String | No | Additional background information for evaluation. Example: "Python is a fundamental language". |
| retrievalContext | String | No | Retrieval context information. Example: "Python is often taught progressively from basics". |

Response fields:

| Field | Type | Description |
|---|---|---|
| score | Number | The final evaluation score, between 0 and 1. |
| reason | String | A concise explanation of the evaluation score and key observations. |
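
In an automated test suite, a typical pattern is to gate on `score` and log `reason`. A minimal sketch follows; the helper name and the 0.5 threshold are illustrative choices, not part of the API:

```python
import requests

def assert_geval_passes(payload: dict, threshold: float = 0.5) -> None:
    """Post a test case to the GEval endpoint and fail if the score is too low.

    The 0.5 threshold is an arbitrary example, not defined by the API.
    """
    response = requests.post("http://localhost:3001/api/geval", json=payload)
    response.raise_for_status()
    result = response.json()
    if result["score"] < threshold:
        raise AssertionError(
            f"G-Eval score {result['score']:.2f} below {threshold}: {result['reason']}"
        )
```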
With this API, you can automate the evaluation of model-generated outputs against specific criteria and custom evaluation steps.
Founder of Pype, [email protected]
Licensed under the Apache License, Version 2.0; you may not use this file except in compliance with the License. You may obtain a copy of the License in the LICENSE file. Portions of this project are derived from [Deepeval](https://github.com/confident-ai/deepeval/), licensed under the Apache License, Version 2.0.