
Conversation

@yuvalreif

[Bias-amplified Splits]

Our work proposes a novel evaluation framework for assessing model robustness: we amplify dataset biases in the training data and challenge models to generalize beyond them. The framework is defined by a bias-amplified training set and a hard, anti-biased test set, both of which we automatically extract from existing datasets using a novel clustering-based approach for identifying minority examples: examples that defy the common statistical patterns found in the rest of the dataset.
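For intuition, one simple way to operationalize the clustering idea is sketched below; this is a toy illustration rather than the exact procedure from our paper, and the embedding model and cluster count are arbitrary placeholders.

```python
# Toy sketch of clustering-based minority detection (not necessarily the
# paper's exact procedure): cluster example embeddings, then flag examples
# whose label disagrees with their cluster's majority label.
# The embedding model and cluster count are arbitrary placeholders.
from collections import Counter

from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer


def find_minority_examples(texts, labels, n_clusters=50):
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)

    # Majority label within each cluster.
    majority = {
        c: Counter(l for l, cid in zip(labels, cluster_ids) if cid == c).most_common(1)[0][0]
        for c in set(cluster_ids)
    }

    # Minority examples defy the dominant label pattern of their cluster.
    return [i for i, (l, cid) in enumerate(zip(labels, cluster_ids)) if l != majority[cid]]
```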

Authors

Implementation

The sub-tasks implement the following methods:

  • format_example: formats sentence-pair tasks into a single input, to match the (input, target) format (see the sketch after this list).
  • get_datasets_raw: this re-implementation exists solely to pass the framework's assertions/tests. Our task re-splits datasets, some of which don't originally have 'validation' or 'test' splits (e.g., MultiNLI only has validation_matched/validation_mismatched), while the task must contain splits named 'validation'/'test'. Naming the splits 'validation' in split.jsonnet is not enough: the splits must already exist in the dataset, so we artificially add them when needed (for MultiNLI and WANLI).
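
As a rough illustration, a format_example-style helper for an NLI sentence-pair subtask might look like the sketch below; the field names (premise/hypothesis/label) are assumptions and depend on the underlying dataset:

```python
# Minimal sketch (not the task's actual code) of a format_example-style helper:
# merge the two sentences into a single "input" string and keep the label as
# "target". Field names are assumptions and depend on the dataset.
def format_example(example):
    return {
        "input": f"premise: {example['premise']} hypothesis: {example['hypothesis']}",
        "target": str(example["label"]),
    }
```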

Usage

If your evaluation function should be run in any way other than the default
(task.evaluate_predictions(predictions, gold)), you can describe this here.
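
For reference, here is a minimal sketch of that default flow, assuming get_datasets_raw returns a dict of splits and predictions are dicts with a "target" key (the task ID is a placeholder):

```python
# Minimal sketch of the default evaluation flow. The task ID is a placeholder
# and the exact prediction format depends on the subtask; see its doc.md.
from genbench import load_task

task = load_task("bias_amplified_splits")   # hypothetical task ID
test_set = task.get_datasets_raw()["test"]  # assumed to return a dict of splits

# Dummy predictions: one dict with a "target" key per test example.
predictions = [{"target": "0"} for _ in range(len(test_set))]
print(task.evaluate_predictions(predictions=predictions, gold=test_set))
```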

Checklist:

  • I and my co-authors agree that, if this PR is merged, the code will be available under the same license as the genbench_cbt repository.
  • Prior to submitting, I have run the GenBench CBT test suite using the genbench-cli test-task tool.
  • I have read the description of what should be in the doc.md of my task, and have added the required arguments.
  • I have submitted or will submit an accompanying paper to the GenBench workshop.

@vernadankers
Contributor

Hello!

We are getting quite close to the deadline (September 1, 11:59PM anywhere on earth), so if your PR needs any final changes, please make them now, and don't forget to submit your accompanying paper to OpenReview via https://openreview.net/group?id=GenBench.org/2023/Workshop by September 1.

Good luck finalising your PR and paper; feel free to tag us if you have questions.
Cheers, Verna
On behalf of the GenBench team

@kazemnejad
Contributor

@yreif We're in the process of merging the tasks into the repo. In order to merge your task, we need the following changes:

  1. Could you please include a single usage_example.py file for each task, showcasing the full pipeline of using the task for finetuning and evaluation, in the way you intend your tasks to be used? Preferably, it should use a pretrained Hugging Face model. Please also include a requirements-usage-example.txt listing the Python dependencies needed to run the example (see the sketch below for the kind of pipeline we mean).
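
For illustration, such a pipeline might look roughly like the sketch below (the task ID, model, field names, and training settings are placeholders, not the submission's actual usage_example.py):

```python
# Rough sketch of a finetune-then-evaluate pipeline on a GenBench task.
# The task ID, model name, field names, and training settings are placeholders.
from genbench import load_task
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

task = load_task("bias_amplified_splits")  # hypothetical task ID
splits = task.get_datasets_raw()           # assumed to return a dict of splits

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

def preprocess(example):
    # Assumes examples carry "input"/"target" fields after formatting.
    enc = tokenizer(example["input"], truncation=True)
    enc["labels"] = int(example["target"])
    return enc

train_ds = splits["train"].map(preprocess)
test_ds = splits["test"].map(preprocess)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=train_ds,
    eval_dataset=test_ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()

# Convert model outputs into the prediction format the task expects,
# then score them with the task's own evaluation.
logits = trainer.predict(test_ds).predictions
predictions = [{"target": str(p)} for p in logits.argmax(-1)]
print(task.evaluate_predictions(predictions=predictions, gold=splits["test"]))
```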

@kazemnejad
Contributor

Hey @yreif. Are there any updates regarding the usage_example?
Thanks.

@yuvalreif
Author

Hey @kazemnejad, I apologize for the delayed response.
I added usage_example + requirements files for each of the fine-tuning subtasks in the submission.
There are also two evaluation-only, prompt-based subtasks; should I add usage examples for these as well?
Thanks & appreciate your understanding.

@kazemnejad
Contributor

@yreif Thanks for your efforts. Yes, adding an example for the prompt-based tasks would be great. You can create a second usage_example file if you don't want to change the finetuning example :)

@kazemnejad
Contributor

@yreif Any updates on the prompt-based tasks example?

@kazemnejad
Contributor

@yreif A kind reminder.
