Laboratory-Scale AI Repository

This is the official code repository for the ACM FAccT'24 paper Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings, available at https://arxiv.org/pdf/2405.16820.

1. Structure

The project is built primarily on the HuggingFace transformers, trl, datasets, and evaluate libraries. We used Bash scripts to run analyses on a GCP cloud instance. Some contributors containerized the repo using Docker; see the Dockerfile in this directory for an example of how this can look. Intermediate results were logged to the Weights & Biases (wandb) accounts of individual contributors.
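For orientation, here is a minimal sketch of what fine-tuning with this stack looks like. The model and dataset names are placeholders rather than the settings used in the paper, and the call signature follows trl's SFTTrainer as of the releases contemporary with the paper (~v0.7):

```python
# Minimal fine-tuning sketch using the transformers/trl/datasets stack.
# All names below are illustrative placeholders, not the paper's settings.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")  # stand-in for a task dataset

trainer = SFTTrainer(
    model="facebook/opt-350m",      # stand-in for an open-weight model
    train_dataset=dataset,
    dataset_text_field="text",      # column holding the training text
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=2,
        report_to="wandb",          # intermediate results go to wandb
    ),
)
trainer.train()
```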

Each contributor cloned the main branch and customized it to some degree for their specific analysis. To make the resulting analyses easier to parse, we've organized the code for each analysis into its own subdirectory for this release, grouped under two primary directories: performance results (entity resolution, clinical dialogue summarization, fact-checking) and values results (bias, privacy, abstention). An illustrative layout is sketched below.
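Directory names here are illustrative; consult the repository tree for the exact names:

```
laboratory-scale-ai/
├── performance/                 # performance results
│   ├── entity_resolution/
│   ├── clinical_summarization/
│   └── fact_checking/
├── values/                      # values results
│   ├── bias/
│   ├── privacy/                 # expects a copy of dp-transformers (Section 2)
│   └── abstention/
├── Dockerfile
└── requirements.txt
```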

2. Requirements

The requirements file includes the libraries needed to run the analyses in each of the subdirectories; using the dp-transformers repo may necessitate installing its dependencies as well. For the privacy task, you'll need to copy the dp-transformers repository from https://github.com/microsoft/dp-transformers; we've left a directory for it to be added. We recommend creating a dedicated environment for running the project and then installing the requirements.
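For example, with conda (the dp-transformers clone destination should be the directory mentioned above):

```bash
# Create and activate a dedicated environment
conda create -n "labscale" python=3.11
conda activate labscale

# Install the project requirements
pip install -r requirements.txt

# For the privacy task: fetch dp-transformers into the directory left for it
git clone https://github.com/microsoft/dp-transformers.git
```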

3. Token Access

A HuggingFace account is needed to use some HuggingFace Hub functionality, including verifying access to gated models like LLaMA-2. You can generate a token in your account settings: https://huggingface.co/settings/tokens
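One way to make the token available to the libraries (the huggingface-cli tool is installed alongside the huggingface_hub dependency) is:

```bash
# Cache the token locally; paste it when prompted
huggingface-cli login

# Alternatively, export it for the current shell session
export HUGGING_FACE_HUB_TOKEN=<your-token>
```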

To log results during fine-tuning and evaluation, you'll also want a Weights & Biases account; see https://wandb.ai. Create a new project there, and it will provide you with an API token.
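As with HuggingFace, you can authenticate once from the shell:

```bash
# Paste the API token from your wandb account when prompted
wandb login
```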

4. Paper & Citation

Please cite the Lab-Scale AI paper, which appears in the ACM FAccT '24 proceedings; a BibTeX entry for the arXiv preprint is below:

@article{wolfe2024lab-scale,
      title={Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings},
      author={Wolfe, Robert and Slaughter, Isaac and Han, Bin and Wen, Bingbing and Yang, Yiwei and Rosenblatt, Lucas and Herman, Bernease and Brown, Eva and Qu, Zening and Weber, Nic and Howe, Bill},
      journal={arXiv preprint arXiv:2405.16820},
      year={2024}
}

5. Other Resources

This repository is primarily a reference implementation intended for reproducibility, insofar as that's possible given the stochasticity of the models and the black-box character of evaluating closed models. This also means, however, that our repo may not be the best starting place for everyone who wants to customize their own open models. There are many great resources for using the technologies employed in the paper (the HuggingFace transformers, trl, datasets, and evaluate libraries among them), some of which are more geared toward newer users.
