Survey Assist Evaluation

Evaluation utilities used as part of Survey Assist

Overview

This repository contains utilities for evaluating how well Survey Assist's Large Language Models (LLMs) assign Standard Industrial Classification (SIC) codes. The evaluation framework includes tools for batch processing of datasets, as well as a suite of metrics to analyze and compare LLM performance against human coders.
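
To make the idea of comparing LLM output against human coders concrete, the following is a minimal sketch of one possible agreement metric. It is not the repository's implementation: the function name `agreement_rate`, the `digits` parameter, and the sample codes are all hypothetical, and it assumes SIC codes are stored as 5-character strings so that leading zeros are preserved.

```python
from collections.abc import Sequence


def agreement_rate(
    llm_codes: Sequence[str], human_codes: Sequence[str], digits: int = 5
) -> float:
    """Share of records where the first `digits` characters of both codes match.

    Hypothetical helper for illustration only; assumes codes are strings.
    """
    if len(llm_codes) != len(human_codes):
        raise ValueError("Both sequences must have the same length")
    if not llm_codes:
        raise ValueError("At least one record is required")
    matches = sum(
        llm[:digits] == human[:digits]
        for llm, human in zip(llm_codes, human_codes)
    )
    return matches / len(llm_codes)


# Made-up example codes, not real evaluation data.
llm = ["86210", "47110", "62012", "01110"]
human = ["86210", "47190", "62020", "10710"]

full_match = agreement_rate(llm, human)         # agreement on all 5 digits
division_match = agreement_rate(llm, human, 2)  # agreement on the first 2 digits
```

Comparing at several prefix lengths is one way to separate near misses (same first two digits, different full code) from outright disagreements.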

Features

Batch Processing: Send large datasets to the API for SIC classification.
Data Extraction and Processing: Utilities to extract survey response data from a Firestore database, reformat it, and save it in CSV format for analysis.
Performance Evaluation: A comprehensive suite of metrics to analyze and compare LLM performance against human coders.
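
As an illustration of the reformat-and-save step in the data extraction feature, here is a rough sketch that flattens nested records into CSV text. It assumes the records have already been read from the database into plain Python dictionaries; no Firestore client calls are shown, and the helper names (`flatten`, `to_csv`) and field names (`job_title`, `sic_code`) are invented for this example rather than taken from the repository.

```python
import csv
import io
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested dictionaries into one level with dotted column names."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_csv(records: list[dict[str, Any]]) -> str:
    """Render records as CSV text; missing fields become empty cells."""
    rows = [flatten(r) for r in records]
    # Collect columns in first-seen order so the header is stable.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records standing in for extracted survey responses.
sample = [
    {"id": "r1", "response": {"job_title": "nurse", "sic_code": "86210"}},
    {"id": "r2", "response": {"job_title": "baker"}},
]
csv_text = to_csv(sample)
```

Writing to an in-memory buffer keeps the sketch easy to test; the same writer could target a file opened with `newline=""` instead.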

Local Development & Setup

The Makefile defines a set of commonly used commands and workflows. Where possible, use the commands defined in the Makefile.

Prerequisites

Ensure you have the following installed on your local machine:

Python 3.12 (Recommended: use pyenv to manage versions)
poetry (for dependency management)
Google Cloud SDK (gcloud) with appropriate permissions
Colima (if running locally with containers)
Terraform (for infrastructure management)

Setup Instructions

Clone the repository

git clone https://github.com/ONSdigital/survey-assist-eval.git
cd survey-assist-eval

Create and activate a virtual environment

With Python 3.12 available (for example, installed via pyenv), create the environment using the built-in venv module:
```
python3.12 -m venv .venv
source .venv/bin/activate
```
Install Dependencies
```
poetry install
```
Note that this installs partner repositories (e.g. sic-classification-utils) at a pinned version. To evaluate concurrent changes to a partner codebase locally, it may be preferable to install it from a local path instead. To do this:
1. Clone the sic-classification-utils repository at the same directory level as this repository.
2. From the root of this repository (with the virtual environment activated), run:
```
python -m pip install --no-deps --editable ../sic-classification-utils
```
Generate an API Token

The API uses Application Default Credentials to generate and authenticate tokens.

Ensure GOOGLE_APPLICATION_CREDENTIALS is not set in your environment:
```
unset GOOGLE_APPLICATION_CREDENTIALS
```
Log in with gcloud to set up Application Default Credentials:
```
gcloud auth application-default login
```
Set the quota project to the correct GCP project:
```
gcloud auth application-default set-quota-project GCP-PROJECT-NAME
```
Check the project setting:
```
cat ~/.config/gcloud/application_default_credentials.json
```
Set the required environment variables:
```
export SA_EMAIL="SERVICE-ACCOUNT-FOR-API-ACCESS"
export API_GATEWAY="API GATEWAY URL, NOT INCLUDING https://"
```
Then run the make command to generate a token with the default expiry (1 hour):
```
make generate-api-token
```
Alternatively, run the command from the CLI and pass a chosen expiry time with -e:
```
poetry run generate-api-token -e 7200
```
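
The environment conditions described in this section can be checked before generating a token. The sketch below is only an illustration under stated assumptions: `check_token_env` is a hypothetical helper that is not part of this repository, and it treats an empty variable as unset.

```python
from collections.abc import Mapping


def check_token_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems found in the given environment mapping.

    Hypothetical pre-flight check; an empty value is treated as unset.
    """
    problems: list[str] = []
    # Application Default Credentials are used, so this should not be set.
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is set; unset it")
    # Both variables are required before generating a token.
    for name in ("SA_EMAIL", "API_GATEWAY"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    # The gateway value is expected without the https:// prefix.
    gateway = env.get("API_GATEWAY", "")
    if gateway.startswith(("http://", "https://")):
        problems.append("API_GATEWAY should not include the URL scheme")
    return problems


# Placeholder values for illustration only.
example = {
    "SA_EMAIL": "sa@example.invalid",
    "API_GATEWAY": "https://gateway.example.invalid",
}
issues = check_token_env(example)
```

To check the real environment, pass `os.environ` (after `import os`) instead of the example dictionary.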

Code Quality & Testing

Code Quality

Code quality and static analysis are enforced using isort, black, ruff, mypy, pylint, and bandit.

To check for errors without auto-fixing:
```
make check-python-nofix
```
To check and automatically fix errors:
```
make check-python
```

Testing

Pytest is used for testing.

To run unit tests:
```
make unit-tests
```
To run all tests:
```
make all-tests
```

Pre-commit Hooks

Pre-commit hooks are set up to run code quality checks before each commit. They will call make check-python under the hood as well. To install the hooks, run:

pre-commit install
