> **Warning:** This is an experimental service that does not have persistent storage of state or data. All in-memory state is lost upon restart.
A FastAPI-based microservice for creating and managing model evaluations. This service provides APIs to create evaluations, configure evaluation runs, and process them using various grading mechanisms.
- Evaluation Management: Create, read, update, and delete evaluations
- Evaluation Runs: Execute evaluations with configurable data sources and models
- Background Processing: Asynchronous evaluation run processing
- Multiple Graders: Support for different grading mechanisms
- MCP Integration: Model Context Protocol support for agent-based evaluations
- OpenAI Integration: Support for OpenAI-compatible LLM endpoints
Project structure:

```
app/
├── api/v1/     # API endpoints
├── schemas/    # Pydantic models and schemas
├── services/   # Business logic and services
├── utils.py    # Utility functions
└── main.py     # FastAPI application entry point
```
Prerequisites:

- Python 3.12+
- A virtual environment (recommended)
- Clone the repository and navigate to the project directory:

  ```bash
  git clone https://github.com/menloresearch/jan-incub-evals.git
  cd jan-incub-evals
  ```
- Create and activate a virtual environment. We generally recommend using [uv](https://github.com/astral-sh/uv) to manage virtual environments and install packages:

  ```bash
  uv venv --python=3.12 --managed-python
  source .venv/bin/activate
  ```
- Set up pre-commit (optional):

  ```bash
  uv pip install pre-commit
  pre-commit install          # install the pre-commit hooks
  pre-commit                  # run pre-commit manually
  pre-commit run --all-files  # run on all files, e.g. if the hooks were installed late
  ```
- Install dependencies:

  ```bash
  uv pip install -r requirements.txt
  ```
Create a `.env` file in the `app/` directory with the following variables:

```env
LLM_BASE_URL=your_llm_endpoint_url
LLM_API_KEY=your_llm_api_key
```
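As a rough illustration of how these settings can be consumed, here is a minimal sketch that loads the `.env` file and builds an OpenAI-compatible client from it. The variable names come from the example above; the loading code and the use of `python-dotenv` and the `openai` package are assumptions, not the service's actual implementation.

```python
# Minimal sketch (assumption): load app/.env and build an
# OpenAI-compatible client from LLM_BASE_URL / LLM_API_KEY.
import os

from dotenv import load_dotenv  # pip install python-dotenv
from openai import OpenAI       # pip install openai

load_dotenv("app/.env")  # path assumed from the project structure above

client = OpenAI(
    base_url=os.environ["LLM_BASE_URL"],  # any OpenAI-compatible endpoint
    api_key=os.environ["LLM_API_KEY"],
)
```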
To run the service in development mode (with auto-reload):

```bash
source .venv/bin/activate
python -m uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload
```

To run the service without auto-reload:

```bash
source .venv/bin/activate
python -m uvicorn app.main:app --host 0.0.0.0 --port 8001
```
The service will be available at http://localhost:8001
Once the service is running, you can access:
- Swagger UI: http://localhost:8001/docs
- ReDoc: http://localhost:8001/redoc
The main API endpoints are:

- `POST /v1/evals` - Create a new evaluation
- `GET /v1/evals/{eval_id}` - Retrieve an evaluation
- `POST /v1/evals/{eval_id}` - Update an evaluation
- `DELETE /v1/evals/{eval_id}` - Delete an evaluation
- `POST /v1/evals/{eval_id}/runs` - Create and start an evaluation run
- `GET /v1/evals/{eval_id}/runs/{run_id}` - Get evaluation run status
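A quick sketch of the create/run/poll flow against these endpoints, using `httpx` (swap in `requests` if you prefer). The paths are the ones listed above, but the request and response fields (`name`, `model`, `id`, and so on) are placeholders; the real schemas live in `app/schemas/`.

```python
# Hypothetical walkthrough of the endpoints above. Payload and response
# field names are placeholders, not the service's real schema.
import httpx

BASE = "http://localhost:8001"

with httpx.Client(base_url=BASE) as client:
    # Create an evaluation (payload fields are assumptions).
    eval_resp = client.post("/v1/evals", json={"name": "demo-eval"})
    eval_id = eval_resp.json()["id"]  # assumes the response contains an "id"

    # Create and start a run for that evaluation (fields are assumptions).
    run_resp = client.post(f"/v1/evals/{eval_id}/runs", json={"model": "my-model"})
    run_id = run_resp.json()["id"]

    # Poll the run status while it is processed in the background.
    status = client.get(f"/v1/evals/{eval_id}/runs/{run_id}").json()
    print(status)
```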
Run the tests using pytest:

```bash
pytest
```
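If you want to add your own tests, a minimal sketch using FastAPI's `TestClient` might look like the following. Only `app.main:app` comes from the project layout above; the specific endpoint exercised here is just a smoke test, and real endpoint tests would use the schemas in `app/schemas/`.

```python
# Minimal smoke-test sketch against the FastAPI app.
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_openapi_schema_is_served():
    # The service exposes Swagger UI at /docs, so the OpenAPI schema
    # should be reachable.
    response = client.get("/openapi.json")
    assert response.status_code == 200
```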
Key components:

- EvalService: Main service for managing evaluations and runs
- EvalProcessor: Handles background processing of evaluation runs
- Graders: Various grading mechanisms for evaluation results
- Agent: MCP-based agent integration for complex evaluations
- Schemas: Pydantic models for data validation and serialization
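For orientation, here is a rough sketch of how these pieces could fit together: the API layer delegates to `EvalService`, which hands run processing to `EvalProcessor` via FastAPI's background tasks. The class and method bodies below are illustrative assumptions, not the project's actual interfaces.

```python
# Illustrative sketch (assumption) of the component interaction described
# above; method names and payloads are placeholders.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()


class EvalProcessor:
    def process(self, run_id: str) -> None:
        # Grade the run's results with the configured grader (placeholder).
        ...


class EvalService:
    def __init__(self, processor: EvalProcessor) -> None:
        self.processor = processor

    def create_run(self, eval_id: str) -> str:
        # Record an in-memory run and return its id (placeholder).
        return "run_123"


service = EvalService(EvalProcessor())


@app.post("/v1/evals/{eval_id}/runs")
def create_run(eval_id: str, background_tasks: BackgroundTasks):
    run_id = service.create_run(eval_id)
    # Asynchronous background processing, as described in the features list.
    background_tasks.add_task(service.processor.process, run_id)
    return {"id": run_id, "status": "queued"}
```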