Before you start, you need to create a `git-auth.txt` file in each of the two folders, `supervised` and `zero_shot`, containing a credential URL of the form:

```
https://username:token@github.com
```
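A minimal sketch of this setup from the repository root (substitute your own GitHub username and token for the placeholders):

```shell
# Write the credential URL into git-auth.txt in both game folders
# (mkdir -p is a no-op if the folders already exist)
for d in supervised zero_shot; do
  mkdir -p "$d"
  printf 'https://username:token@github.com\n' > "$d/git-auth.txt"
done
```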
There are two games to benchmark: supervised and zero-shot. Each game has its selected list of models and datasets defined in `dvc.yaml`. The models and datasets are declared in `vars` at the top of the file, and DVC expands these `vars` into a matrix, which is effectively a nested loop, as in the following pseudo-code:
```
for dataset in datasets:
    for model in models:
        predict()

for dataset in datasets:
    for model in models:
        calculate_metric()
```

You can benchmark a group of supervised models:

```shell
cd supervised && dvc repro
```

You can benchmark a group of zero-shot models:

```shell
cd zero_shot && dvc repro
```

There are two environments in which to run the benchmark: the local environment and the AWS environment.
The AWS environment differs in that:
- You need to upload the data and model TOML files and the actual data to S3.
- You need to build and push your Docker image to ECR.
- You need to use a SageMaker training job to either train or score a model.
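The ECR image URI you build and push to follows a fixed format derived from your account ID and region. A sketch of assembling it, with the actual upload/push commands shown for reference (the account ID, bucket, and image name below are placeholders, not values from this project):

```shell
# Placeholder values -- replace with your own
AWS_ACCOUNT_ID=123456789012
AWS_REGION=us-east-1
IMAGE_NAME=pg2-benchmark

# ECR registry host and image URI follow this standard format
ECR_HOST="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
ECR_URI="${ECR_HOST}/${IMAGE_NAME}:latest"
echo "$ECR_URI"
# prints 123456789012.dkr.ecr.us-east-1.amazonaws.com/pg2-benchmark:latest

# Typical workflow (run these with real values and credentials):
#   aws s3 cp supervised/data "s3://<bucket>/data" --recursive
#   aws ecr get-login-password --region "$AWS_REGION" \
#     | docker login --username AWS --password-stdin "$ECR_HOST"
#   docker build -t "$ECR_URI" .
#   docker push "$ECR_URI"
```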
**Important**

In order to use the AWS environment, you need to set up your AWS profile with the steps below:

- Execute `aws configure sso`.
- Fill in the required fields; in particular, set "Default client Region" to "us-east-1".
- You can find your account ID and profile name by executing `cat ~/.aws/config`.
- Finally, run `dvc repro` with the environment variables set in each game:

```shell
AWS_ACCOUNT_ID=xxx AWS_PROFILE=yyy dvc repro
```
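To see where those two values come from, here is an illustrative `~/.aws/config` of the kind `aws configure sso` writes, placed in a temp file for demonstration (the profile name and account ID are made-up placeholders):

```shell
# Illustrative config -- profile name and account ID are placeholders
cat > /tmp/example-aws-config <<'EOF'
[profile my-sso-profile]
sso_account_id = 123456789012
sso_region = us-east-1
region = us-east-1
EOF

# The profile header and sso_account_id supply AWS_PROFILE and AWS_ACCOUNT_ID
grep -E '^\[profile |sso_account_id' /tmp/example-aws-config
```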
You can generate dummy data with the following command:

```shell
uv run pg2-benchmark dataset generate-dummy-data supervised/data/dummy/charge_ladder.csv --n-rows 5 --sequence-length 100
```