> **Warning:** This is an experimental service that does not have persistent storage of state or data. All in-memory state is lost upon restart.
A FastAPI-based microservice for creating and managing model evaluations. This service provides APIs to create evaluations, configure evaluation runs, and process them using various grading mechanisms.
- Evaluation Management: Create, read, update, and delete evaluations
- Evaluation Runs: Execute evaluations with configurable data sources and models
- Background Processing: Asynchronous evaluation run processing
- Multiple Graders: Support for different grading mechanisms
- MCP Integration: Model Context Protocol support for agent-based evaluations
- OpenAI Integration: Support for OpenAI-compatible LLM endpoints
Project structure:

```
app/
├── api/v1/     # API endpoints
├── schemas/    # Pydantic models and schemas
├── services/   # Business logic and services
├── utils.py    # Utility functions
└── main.py     # FastAPI application entry point
```
Prerequisites:

- Python 3.12+
- A virtual environment (recommended)
- Clone the repository and navigate to the project directory:

  ```bash
  git clone https://github.com/menloresearch/jan-incub-evals.git
  cd jan-incub-evals
  ```
- Create and activate a virtual environment. We generally recommend using [uv](https://github.com/astral-sh/uv) to manage virtual environments and install packages:

  ```bash
  uv venv --python=3.12 --managed-python
  source .venv/bin/activate
  ```
- Set up pre-commit (optional):

  ```bash
  uv pip install pre-commit
  pre-commit install          # install the pre-commit hooks
  pre-commit                  # run pre-commit manually
  pre-commit run --all-files  # run on all files, e.g. if the hooks were installed late
  ```
- Install dependencies:

  ```bash
  uv pip install -r requirements.txt
  ```
Create a `.env` file in the `app/` directory with the following variables:

```env
LLM_BASE_URL=your_llm_endpoint_url
LLM_API_KEY=your_llm_api_key
```
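As a rough illustration of how these settings can be consumed, here is a minimal sketch that loads the `.env` file and builds an OpenAI-compatible client from it. The variable names come from the example above; the loading code and the use of `python-dotenv` and the `openai` package are assumptions, not the service's actual implementation.

```python
# Minimal sketch (assumption): load app/.env and build an
# OpenAI-compatible client from LLM_BASE_URL / LLM_API_KEY.
import os

from dotenv import load_dotenv  # pip install python-dotenv
from openai import OpenAI       # pip install openai

load_dotenv("app/.env")  # path assumed from the project structure above

client = OpenAI(
    base_url=os.environ["LLM_BASE_URL"],  # any OpenAI-compatible endpoint
    api_key=os.environ["LLM_API_KEY"],
)
```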
To run the service in development mode (with auto-reload):

```bash
source .venv/bin/activate
python -m uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload
```

To run the service without auto-reload:

```bash
source .venv/bin/activate
python -m uvicorn app.main:app --host 0.0.0.0 --port 8001
```
The service will be available at http://localhost:8001
Once the service is running, you can access:
- Swagger UI: http://localhost:8001/docs
- ReDoc: http://localhost:8001/redoc
The main API endpoints are:

- `POST /v1/evals` - Create a new evaluation
- `GET /v1/evals/{eval_id}` - Retrieve an evaluation
- `POST /v1/evals/{eval_id}` - Update an evaluation
- `DELETE /v1/evals/{eval_id}` - Delete an evaluation
- `POST /v1/evals/{eval_id}/runs` - Create and start an evaluation run
- `GET /v1/evals/{eval_id}/runs/{run_id}` - Get evaluation run status
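A quick sketch of the create/run/poll flow against these endpoints, using `httpx` (swap in `requests` if you prefer). The paths are the ones listed above, but the request and response fields (`name`, `model`, `id`, and so on) are placeholders; the real schemas live in `app/schemas/`.

```python
# Hypothetical walkthrough of the endpoints above. Payload and response
# field names are placeholders, not the service's real schema.
import httpx

BASE = "http://localhost:8001"

with httpx.Client(base_url=BASE) as client:
    # Create an evaluation (payload fields are assumptions).
    eval_resp = client.post("/v1/evals", json={"name": "demo-eval"})
    eval_id = eval_resp.json()["id"]  # assumes the response contains an "id"

    # Create and start a run for that evaluation (fields are assumptions).
    run_resp = client.post(f"/v1/evals/{eval_id}/runs", json={"model": "my-model"})
    run_id = run_resp.json()["id"]

    # Poll the run status while it is processed in the background.
    status = client.get(f"/v1/evals/{eval_id}/runs/{run_id}").json()
    print(status)
```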
Run the tests using pytest:

```bash
pytest
```
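If you want to add your own tests, a minimal sketch using FastAPI's `TestClient` might look like the following. Only `app.main:app` comes from the project layout above; the specific endpoint exercised here is just a smoke test, and real endpoint tests would use the schemas in `app/schemas/`.

```python
# Minimal smoke-test sketch against the FastAPI app.
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_openapi_schema_is_served():
    # The service exposes Swagger UI at /docs, so the OpenAPI schema
    # should be reachable.
    response = client.get("/openapi.json")
    assert response.status_code == 200
```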
Key components:

- EvalService: Main service for managing evaluations and runs
- EvalProcessor: Handles background processing of evaluation runs
- Graders: Various grading mechanisms for evaluation results
- Agent: MCP-based agent integration for complex evaluations
- Schemas: Pydantic models for data validation and serialization
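For orientation, here is a rough sketch of how these pieces could fit together: the API layer delegates to `EvalService`, which hands run processing to `EvalProcessor` via FastAPI's background tasks. The class and method bodies below are illustrative assumptions, not the project's actual interfaces.

```python
# Illustrative sketch (assumption) of the component interaction described
# above; method names and payloads are placeholders.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()


class EvalProcessor:
    def process(self, run_id: str) -> None:
        # Grade the run's results with the configured grader (placeholder).
        ...


class EvalService:
    def __init__(self, processor: EvalProcessor) -> None:
        self.processor = processor

    def create_run(self, eval_id: str) -> str:
        # Record an in-memory run and return its id (placeholder).
        return "run_123"


service = EvalService(EvalProcessor())


@app.post("/v1/evals/{eval_id}/runs")
def create_run(eval_id: str, background_tasks: BackgroundTasks):
    run_id = service.create_run(eval_id)
    # Asynchronous background processing, as described in the features list.
    background_tasks.add_task(service.processor.process, run_id)
    return {"id": run_id, "status": "queued"}
```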