[Task Submission] Bias-amplified Splits (bias_amplified_splits) #23
base: main
Conversation
Hello! We are getting quite close to the deadline (September 1, 11:59PM anywhere on earth), so if your PR needs any final changes, please make them now. Good luck finalising your PR and paper, and feel free to tag us if you have questions.
@yreif We're in the process of merging the tasks into the repo. In order to merge your task, we need the following changes:
Hey @yreif. Are there any updates regarding the usage_example?
Hey @kazemnejad, I apologize for the delayed response.
@yreif Thanks for your efforts. Yes, adding an example for the prompt-based tasks would be great. You can create a second usage_example file if you don't want to change the finetuning example :)
@yreif Any updates on the prompt-based tasks example?
@yreif A kind reminder. |
[Bias-amplified Splits]
Our work proposes a novel evaluation framework to assess model robustness, by amplifying dataset biases in the training data and challenging models to generalize beyond them. This framework is defined by a bias-amplified training set and a hard, anti-biased test set, which we automatically extract from existing datasets using a novel, clustering-based approach for identifying minority examples—examples that defy common statistical patterns found in the rest of the dataset.
Authors
[email protected]
[email protected]
Implementation
The sub-tasks implement the following methods:
- format_example: formats sentence-pair tasks into a single input, to match the (input, target) format.
- get_datasets_raw: this re-implementation exists solely to pass assertions/tests. Our task re-splits datasets, some of which don't originally have 'validation' or 'test' splits (e.g., MultiNLI only has validation_matched/validation_mismatched), but the task must contain splits named 'validation'/'test'. Naming the splits 'validation' in split.jsonnet is not enough -- they must already exist in the dataset. We therefore artificially add these splits when needed (for MultiNLI and WANLI).
Usage
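A minimal sketch of the two methods described above. This is a hypothetical illustration, not the task's actual code: the field names ('premise', 'hypothesis', 'label') and the split-aliasing logic are assumptions based on the description, using plain dicts in place of a HuggingFace DatasetDict.

```python
def format_example(example):
    """Flatten a sentence-pair (e.g., NLI) example into the (input, target)
    format. Field names here are assumptions for illustration."""
    return {
        "input": f"premise: {example['premise']} hypothesis: {example['hypothesis']}",
        "target": str(example["label"]),
    }


def add_missing_splits(splits):
    """Ensure 'validation'/'test' splits exist by aliasing existing ones
    (e.g., MultiNLI only ships validation_matched/validation_mismatched).
    The choice of which split to alias is an assumption."""
    splits = dict(splits)  # shallow copy; don't mutate the caller's mapping
    if "validation" not in splits and "validation_matched" in splits:
        splits["validation"] = splits["validation_matched"]
    if "test" not in splits and "validation_mismatched" in splits:
        splits["test"] = splits["validation_mismatched"]
    return splits
```

In a real task these helpers would run inside get_datasets_raw before the split.jsonnet mapping is applied, so that the 'validation'/'test' names the task requires already exist in the dataset.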
If your evaluation function should be run in any other way than the default way (task.evaluate_predictions(predictions, gold)), you can describe this here.
Checklist:
genbench-cli test-task tool.