ONSdigital/survey-assist-eval

Survey Assist Evaluation

Evaluation utilities used as part of Survey Assist

Overview

This repository contains utilities for evaluating how well Survey Assist's Large Language Models (LLMs) classify survey responses into Standard Industrial Classification (SIC) codes. The evaluation framework includes tools for batch processing of datasets and a suite of metrics for analysing and comparing LLM performance against human coders.

Features

  • Batch Processing: Send large datasets to the API for SIC classification.
  • Data Extraction and Processing: Utilities to extract survey response data from a Firestore database, reformat it, and save it in CSV format for analysis.
  • Performance Evaluation: A comprehensive suite of metrics to analyze and compare LLM performance against human coders.
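
As an illustration of the kind of comparison the Performance Evaluation feature describes, the sketch below computes how often LLM-assigned codes agree with human-assigned codes, both on the full 5-digit SIC code and on the 2-digit division. It is a minimal example and not this repository's API: the function name `agreement_rate` and the sample data are hypothetical, and it assumes SIC codes are held as 5-character strings.

```python
from collections.abc import Sequence


def agreement_rate(
    llm_codes: Sequence[str], human_codes: Sequence[str], digits: int = 5
) -> float:
    """Share of records where the first `digits` characters of both codes match."""
    if len(llm_codes) != len(human_codes):
        raise ValueError("Both code lists must have the same length")
    if not llm_codes:
        raise ValueError("At least one record is required")
    matches = sum(
        llm[:digits] == human[:digits]
        for llm, human in zip(llm_codes, human_codes)
    )
    return matches / len(llm_codes)


# Hypothetical example data: one LLM code and one human code per response.
llm = ["86210", "47110", "01110", "62012"]
human = ["86210", "47190", "10110", "62012"]

full_match = agreement_rate(llm, human)         # exact 5-digit agreement
division_match = agreement_rate(llm, human, 2)  # agreement on the 2-digit division
print(f"5-digit: {full_match:.2f}, 2-digit: {division_match:.2f}")
# prints "5-digit: 0.50, 2-digit: 0.75"
```

Comparing at more than one level of the SIC hierarchy separates near misses (right division, wrong class) from outright disagreements.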

Local Development & Setup

The Makefile defines a set of commonly used commands and workflows. Where possible, use the commands defined in the Makefile.

Prerequisites

Ensure you have the following installed on your local machine:

  • Python 3.12 (Recommended: use pyenv to manage versions)
  • poetry (for dependency management)
  • Google Cloud SDK (gcloud) with appropriate permissions
  • Colima (if running locally with containers)
  • Terraform (for infrastructure management)

Setup Instructions

  1. Clone the repository

    git clone https://github.com/ONSdigital/survey-assist-eval.git
    cd survey-assist-eval
  2. Create and activate a virtual environment

    Using Python's built-in venv module (with Python 3.12 installed, e.g. via pyenv):

    python3.12 -m venv .venv
    source .venv/bin/activate
  3. Install Dependencies

    poetry install

    Note that this installs partner repos (e.g. sic-classification-utils) at a pinned version. To evaluate concurrent changes to a partner repo locally, it may be preferable to install it from a local path instead. To do this:

    1. Clone the sic-classification-utils repository at the same directory level as this repository.
    2. From the root of this repository (with the virtual environment activated), run:
      python -m pip install --no-deps --editable ../sic-classification-utils
  4. Generate an API Token

    The API uses Application Default Credentials to generate and authenticate tokens.

    Ensure GOOGLE_APPLICATION_CREDENTIALS is not set in your environment:

    unset GOOGLE_APPLICATION_CREDENTIALS

    Log in to gcloud with Application Default Credentials:

    gcloud auth application-default login

    Set the quota project to the correct GCP project:

    gcloud auth application-default set-quota-project GCP-PROJECT-NAME

    Check the project setting:

    cat ~/.config/gcloud/application_default_credentials.json

    Set the required environment variables (API_GATEWAY is the gateway URL without the leading https://):

    export SA_EMAIL="SERVICE-ACCOUNT-FOR-API-ACCESS"
    export API_GATEWAY="API GATEWAY URL NOT INC https://"

    Then run the make command to generate a token with the default expiry (1h):

    make generate-api-token

    Alternatively, run the command from the CLI and pass in a chosen expiry time:

    poetry run generate-api-token -e 7200
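
The checks in step 4 can be sketched as a small pre-flight script to run before generating a token. This is an illustrative example, not part of the repository: the function name `preflight_problems` is hypothetical, and it assumes that `gcloud auth application-default set-quota-project` records the project under a `quota_project_id` key in the credentials file.

```python
import json
import os
from pathlib import Path

# Credentials file location taken from the "Check the project setting" step.
ADC_PATH = Path.home() / ".config/gcloud/application_default_credentials.json"


def preflight_problems(env, adc_path):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is set; unset it")
    for name in ("SA_EMAIL", "API_GATEWAY"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    # The gateway value is expected without the URL scheme.
    if env.get("API_GATEWAY", "").startswith(("http://", "https://")):
        problems.append("API_GATEWAY should not include https://")
    if not adc_path.is_file():
        problems.append(f"{adc_path} not found; run the gcloud login step")
    # Assumption: the quota project is stored under "quota_project_id".
    elif "quota_project_id" not in json.loads(adc_path.read_text()):
        problems.append("no quota_project_id found; run set-quota-project")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems(os.environ, ADC_PATH):
        print(f"WARNING: {problem}")
```

The function takes the environment and the file path as arguments, so it can be exercised against a temporary file rather than real credentials.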

Code Quality & Testing

Code Quality

Code quality and static analysis are enforced using isort, black, ruff, mypy, pylint, and bandit.

  • To check for errors without auto-fixing:
    make check-python-nofix
  • To check and automatically fix errors:
    make check-python

Testing

Pytest is used for testing.

  • To run unit tests:
    make unit-tests
  • To run all tests:
    make all-tests

Pre-commit Hooks

Pre-commit hooks are set up to run code quality checks before each commit. The hooks call make check-python under the hood. To install the hooks, run:

pre-commit install
