ProteinGym2 Benchmark

Getting started

Before you start, create a git-auth.txt file in each of the two folders, supervised and zero_shot, with the following content:

https://username:token@github.com
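As a minimal sketch of this setup step, the helper below writes the same line into both folders. The function name `write_git_auth` and its parameters are hypothetical and not part of the repository; the username and token are placeholders you supply yourself.

```python
from pathlib import Path


def write_git_auth(root: Path, username: str, token: str) -> list[Path]:
    """Write git-auth.txt into the supervised and zero_shot folders under root.

    Hypothetical helper: only the file name, folder names, and line format
    come from the README.
    """
    line = f"https://{username}:{token}@github.com\n"
    written = []
    for game in ("supervised", "zero_shot"):
        folder = root / game
        # In a real checkout the folders already exist; exist_ok keeps this harmless.
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / "git-auth.txt"
        target.write_text(line)
        written.append(target)
    return written
```

Calling `write_git_auth(Path("."), "username", "token")` from the repository root would create both files. Since the file holds a credential, it should stay out of version control.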

Benchmark

Local environment

The benchmark consists of two games: supervised and zero-shot. Each game defines its own list of models and datasets in its dvc.yaml.

  • Supervised game is defined in this dvc.yaml
  • Zero-shot game is defined in this dvc.yaml

The models and datasets are listed under vars at the top of each dvc.yaml. DVC expands these vars into a matrix, which is equivalent to the nested loops in the following pseudo-code:

for dataset in datasets:
    for model in models:
        predict()

for dataset in datasets:
    for model in models:
        calculate_metric()
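The pseudo-code above can be made concrete with a small runnable sketch. It only illustrates the order in which dataset and model combinations are visited; the dataset and model names are hypothetical placeholders, and `expand_matrix` is not a DVC function.

```python
from itertools import product


def expand_matrix(datasets, models):
    """Return every (dataset, model) pair, datasets in the outer loop."""
    return list(product(datasets, models))


# Hypothetical names; the real lists live under vars in each dvc.yaml.
datasets = ["dataset_a", "dataset_b"]
models = ["model_x", "model_y"]

# One pass per stage, mirroring the two loops in the pseudo-code.
for stage in ("predict", "calculate_metric"):
    for dataset, model in expand_matrix(datasets, models):
        print(f"{stage}: dataset={dataset} model={model}")
```

With two datasets and two models, each stage runs four times, once per combination.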

Supervised

You can benchmark a group of supervised models:

dvc repro supervised/dvc.yaml

Zero-shot

You can benchmark a group of zero-shot models:

dvc repro zero_shot/dvc.yaml

AWS environment

The benchmark can run in two environments: the local environment and the AWS environment.

The AWS environment differs from the local one in the following ways:

  • You need to upload the dataset and model files to S3.
  • You need to build and push your Docker image to ECR.
  • You need to use a SageMaker training job to either train or score a model.

Important

To use the AWS environment, set up your AWS profile with the steps below:

  1. Execute aws configure sso.
  2. Fill in the required fields; in particular, "Default client Region" must be "us-east-1":
     a. SSO session name: pg2benchmark.
     b. SSO start URL: https://d-90674355f1.awsapps.com/start
     c. SSO region: us-east-1.
     d. SSO registration scopes: Leave empty.
     e. Login via browser.
  3. Select the account: ifflabdev.
     a. Default client Region: us-east-1.
     b. CLI default output: Leave empty.
     c. Profile name: pg2benchmark.
  4. You can find your account ID and profile by executing cat ~/.aws/config.
  5. Before you run dvc repro, rename dvc.aws.yaml to dvc.yaml.
  6. Finally, run dvc repro with environment variables in each game: AWS_ACCOUNT_ID=xxx AWS_PROFILE=yyy dvc repro
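The account ID lookup in the steps above can also be done programmatically. The sketch below assumes the profile was created with aws configure sso, in which case the AWS CLI stores the account under the `sso_account_id` key of a `[profile <name>]` section. The sample config text, account number, and role name are made up for illustration, and `account_id_for_profile` is a hypothetical helper.

```python
import configparser

# Made-up sample in the shape the AWS CLI writes to ~/.aws/config.
SAMPLE_CONFIG = """
[profile pg2benchmark]
sso_session = pg2benchmark
sso_account_id = 123456789012
sso_role_name = ExampleRole
region = us-east-1

[sso-session pg2benchmark]
sso_start_url = https://example.awsapps.com/start
sso_region = us-east-1
"""


def account_id_for_profile(config_text: str, profile: str) -> str:
    """Return the sso_account_id configured for the given profile name."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Named profiles use a "profile " prefix; the default profile does not.
    section = "default" if profile == "default" else f"profile {profile}"
    return parser[section]["sso_account_id"]


print(account_id_for_profile(SAMPLE_CONFIG, "pg2benchmark"))
```

To read the real file, pass `Path.home().joinpath(".aws", "config").read_text()` (with `from pathlib import Path`) as `config_text`.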

Supervised

You can benchmark a group of supervised models:

AWS_ACCOUNT_ID=xxx AWS_PROFILE=yyy dvc repro supervised/dvc.yaml

Zero-shot

You can benchmark a group of zero-shot models:

AWS_ACCOUNT_ID=xxx AWS_PROFILE=yyy dvc repro zero_shot/dvc.yaml
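If you prefer to launch these runs from a script, the two commands above can be assembled as follows. This is a sketch only: `repro_invocation` is a hypothetical helper, and it builds the command and environment without executing anything.

```python
import os


def repro_invocation(game, account_id, profile, base_env=None):
    """Build the argv and environment for `dvc repro <game>/dvc.yaml`.

    game is "supervised" or "zero_shot"; base_env defaults to the current
    process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_ACCOUNT_ID"] = account_id
    env["AWS_PROFILE"] = profile
    argv = ["dvc", "repro", f"{game}/dvc.yaml"]
    return argv, env


argv, env = repro_invocation("zero_shot", "xxx", "yyy")
print(" ".join(argv))
# To actually run it: subprocess.run(argv, env=env, check=True)
```

Passing the variables through `env` keeps them scoped to the child process, which matches the inline `VAR=value command` form used above.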

Generate dummy data

You can generate dummy data with the following command:

uv run pg2-benchmark dataset generate-dummy-data supervised/data/dummy/charge_ladder.csv --n-rows 5 --sequence-length 100