Debiasing (concept editing + dataset biasing) #5

Draft
dariuskia wants to merge 26 commits into xocelyk:feat/caa-steering from dariuskia:darius/debiasing

@dariuskia

Add Biasing and Debiasing Experiments Framework

Implements few-shot biasing experiments and cross-bias ACE debiasing.

Files Added

  • configs/biasing_experiments.yaml - Example biasing config
  • configs/bias_test_quick.yaml - Quick validation config
  • example/sports_understanding_fewshot.json - Sports understanding few-shot examples
  • example/anachronisms_fewshot.json - Anachronism few-shot examples
  • example/social_chemistry_fewshot.json - Social chemistry few-shot examples
  • example/logical_deduction_fewshot.json - Logical deduction few-shot examples
  • scripts/test_biasing.py - Biasing validation tests
  • scripts/test_biasing_config.py - Config loading tests

Files Modified

  • src/data_loading.py - Biased few-shot sampling and dataset creation
  • src/config.py - Added bias parameters to DatasetConfig
  • src/cache_manager.py - Updated hashing for bias experiments
  • src/experiment_runner.py - Bias-aware dataset loading and train/test splitting
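
To make the changes above concrete, here is a minimal sketch of how bias parameters on a dataset config, bias-aware cache hashing, and biased few-shot sampling could fit together. It is not the code in this PR: the field names (`n_shots`, `bias_type`, `seed`), the helpers `cache_key` and `sample_few_shot`, and the reading of "positive" and "negative" bias as "all few-shot examples share one label" are assumptions made for illustration.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DatasetConfig:
    # Hypothetical fields; the real DatasetConfig in src/config.py may differ.
    name: str
    n_shots: int = 4
    bias_type: str = "neutral"  # assumed values: "positive", "negative", "neutral"
    seed: int = 0


def cache_key(cfg: DatasetConfig) -> str:
    """Hash every config field so runs with different bias settings
    never share a cache entry."""
    payload = json.dumps(asdict(cfg), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def sample_few_shot(pool: List[Dict], cfg: DatasetConfig) -> List[Dict]:
    """Draw cfg.n_shots examples from pool, skewed by cfg.bias_type.

    Assumption: "positive" keeps only examples whose label is True,
    "negative" keeps only label False, and "neutral" samples uniformly.
    """
    if cfg.bias_type == "positive":
        candidates = [ex for ex in pool if ex["label"]]
    elif cfg.bias_type == "negative":
        candidates = [ex for ex in pool if not ex["label"]]
    elif cfg.bias_type == "neutral":
        candidates = list(pool)
    else:
        raise ValueError(f"unknown bias_type: {cfg.bias_type!r}")

    if len(candidates) < cfg.n_shots:
        raise ValueError("not enough examples for the requested bias")

    # A seeded generator keeps the sample reproducible for a given config.
    rng = random.Random(cfg.seed)
    return rng.sample(candidates, cfg.n_shots)


if __name__ == "__main__":
    pool = [{"text": f"q{i}", "label": i % 2 == 0} for i in range(10)]
    cfg = DatasetConfig(name="demo", bias_type="positive")
    print(cache_key(cfg), [ex["text"] for ex in sample_few_shot(pool, cfg)])
```

Including the bias fields in the hashed payload is the point of the cache sketch: if they were left out, a neutral run and a biased run of the same dataset would collide on one cache entry.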

This enables experiments under positive, negative, or neutral bias, as well as cross-bias debiasing,
in which ACE interventions trained on one bias type are tested on the opposite bias type.
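
The cross-bias setup described above can be sketched as a pairing of training and test bias types plus a disjoint train/test split. This is an illustrative outline only: `OPPOSITE_BIAS`, `cross_bias_pairs`, and `split_indices` are hypothetical names, and treating "neutral" as having no opposite is an assumption rather than something the PR states.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Assumed mapping: each biased condition is evaluated against its opposite.
OPPOSITE_BIAS: Dict[str, str] = {"positive": "negative", "negative": "positive"}


def cross_bias_pairs(bias_types: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (train_bias, test_bias) pairs, skipping types with no opposite."""
    return [(b, OPPOSITE_BIAS[b]) for b in bias_types if b in OPPOSITE_BIAS]


def split_indices(
    n_examples: int, test_fraction: float = 0.5, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle example indices and return disjoint (train, test) index lists."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_examples * test_fraction)
    return indices[n_test:], indices[:n_test]


if __name__ == "__main__":
    train_idx, test_idx = split_indices(10)
    for train_bias, test_bias in cross_bias_pairs(["positive", "negative", "neutral"]):
        # The intervention would be fit on train_idx under train_bias,
        # then evaluated on test_idx under test_bias.
        print(train_bias, "->", test_bias, len(train_idx), len(test_idx))
```

Keeping the index split separate from the bias pairing means the same held-out examples can be reused across both directions of the cross-bias comparison.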

@dariuskia marked this pull request as draft on August 14, 2025, 21:29