Evaluating Argumentative Strategies in AI Debates

This repository accompanies a Data Science Master's thesis (UBA) that investigates argumentative strategies in debates between AI systems with asymmetric capabilities, evaluated by cognitively limited judges.

🎯 Research Focus

This thesis explores how unrestricted AI agents use deceptive tactics to win debates against honest agents, and analyzes what happens when lower-capacity judges evaluate arguments from more advanced agents.

Key Research Questions

How do asymmetric agent capabilities affect debate outcomes?
What argumentative strategies emerge under different conditions?
How does judge bias impact convergence towards truth?
Can debate serve as an effective AI alignment technique?

🧪 Implementation: AI Safety via Debate on MNIST

The debate_mnist/ directory contains an implementation that replicates the MNIST debate experiment of Irving et al. (2018) and extends it with:

Asymmetric Debates: MCTS vs Greedy agent competitions
Judge Evaluation: 8 different pixel selection strategies
Bias Analysis: Precommit strategies and adversarial evaluation
Logits Tracking: Progressive judge decision-making analysis
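
To make the setup above concrete, here is a minimal sketch of a pixel-revealing debate between two greedy agents. It is not the code in debate_mnist/: the names make_linear_judge, greedy_move, and run_debate are hypothetical, and the linear judge is an assumed stand-in for a classifier trained on sparse pixel masks. Both agents precommit to a label, alternate revealing one pixel each, and the judge's logits are recorded after every reveal.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Revealed = Dict[int, float]                 # pixel index -> revealed intensity
Judge = Callable[[Revealed], List[float]]   # revealed pixels -> per-class logits


def make_linear_judge(weights: List[List[float]]) -> Judge:
    """Toy judge (assumption): logit[c] = sum of weights[c][p] * value over revealed pixels."""
    def judge(revealed: Revealed) -> List[float]:
        return [sum(w[p] * v for p, v in revealed.items()) for w in weights]
    return judge


def greedy_move(image: Sequence[float], revealed: Revealed, judge: Judge,
                own: int, opp: int) -> Optional[int]:
    """Return the unrevealed pixel that maximizes the judge's margin for `own` over `opp`."""
    best_pixel, best_margin = None, float("-inf")
    for p, v in enumerate(image):
        if p in revealed:
            continue
        logits = judge({**revealed, p: v})   # look one reveal ahead
        margin = logits[own] - logits[opp]
        if margin > best_margin:
            best_pixel, best_margin = p, margin
    return best_pixel


def run_debate(image: Sequence[float], judge: Judge, honest_label: int,
               liar_label: int, total_pixels: int = 4
               ) -> Tuple[str, Revealed, List[List[float]]]:
    """Alternate greedy reveals (honest agent moves first) and track logits per turn."""
    revealed: Revealed = {}
    trace: List[List[float]] = []
    claims = [honest_label, liar_label]      # each agent's precommitted label
    for turn in range(total_pixels):
        own, opp = claims[turn % 2], claims[(turn + 1) % 2]
        pixel = greedy_move(image, revealed, judge, own, opp)
        if pixel is None:                    # nothing left to reveal
            break
        revealed[pixel] = image[pixel]
        trace.append(judge(revealed))        # progressive logits tracking
    final = judge(revealed)
    winner = "honest" if final[honest_label] >= final[liar_label] else "liar"
    return winner, revealed, trace


if __name__ == "__main__":
    # Four-pixel toy "image" and a two-class judge; all values are made up.
    toy_image = [1.0, 0.0, 1.0, 1.0]
    toy_judge = make_linear_judge([[2.0, 0.0, -1.0, 0.5],
                                   [-1.0, 0.0, 3.0, 0.0]])
    print(run_debate(toy_image, toy_judge, honest_label=0, liar_label=1,
                     total_pixels=2))
```

In this toy configuration the liar wins a two-pixel debate, because the second reveal (pixel 2) carries more weight for the false label than the first reveal carried for the true one. An MCTS agent would replace greedy_move with a search over future reveals; that part is not sketched here.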

Quick Start

cd debate_mnist
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python run_experiments.py  # Complete automation

📊 Key Contributions

Asymmetric Agent Analysis: First systematic study of mixed-capability debates
Judge Bias Quantification: Novel metrics for evaluating judge robustness
Strategic Behavior Emergence: Documentation of deceptive patterns in pixel selection
Scalable Evaluation Framework: Automated experimentation with comprehensive logging
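
As an illustration of how outcomes from asymmetric matchups could be summarized, the sketch below computes the honest agent's win rate per matchup with a Wilson score interval. This is a generic baseline statistic, not the thesis's judge-robustness metrics; the function names, the record format, and the matchup labels are assumptions made for the example, and the counts in the usage section are made up.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n == 0:
        return (0.0, 1.0)                    # no data: the rate is unconstrained
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)


def honest_win_rates(records: Iterable[Tuple[str, bool]]
                     ) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Map each matchup label to (honest win rate, Wilson interval).

    `records` holds (matchup, honest_won) pairs, one per debate.
    """
    counts = defaultdict(lambda: [0, 0])     # matchup -> [honest wins, debates]
    for matchup, honest_won in records:
        counts[matchup][0] += int(honest_won)
        counts[matchup][1] += 1
    return {m: (w / n, wilson_interval(w, n)) for m, (w, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical outcomes, for illustration only.
    demo = ([("mcts_honest_vs_greedy_liar", True)] * 8
            + [("mcts_honest_vs_greedy_liar", False)] * 2
            + [("greedy_honest_vs_mcts_liar", True)] * 3
            + [("greedy_honest_vs_mcts_liar", False)] * 7)
    for matchup, (rate, (lo, hi)) in honest_win_rates(demo).items():
        print(f"{matchup}: {rate:.2f} [{lo:.2f}, {hi:.2f}]")
```

The Wilson interval is used here instead of the plain normal approximation because it stays within [0, 1] and behaves sensibly for small samples or win rates near 0 or 1.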

📚 Theoretical Foundation

Based on recent advances in AI safety and alignment:

AI Safety via Debate (Irving et al., 2018) - Core methodology
Scalable AI Safety via Doubly-Efficient Debate (Brown-Cohen et al., 2023)
Measuring Progress on Scalable Oversight for Large Language Models (Bowman et al., 2022)
AI Control: Improving Safety Despite Intentional Subversion (Greenblatt et al., 2023)

📈 Expected Outcomes

Quantitative analysis of debate dynamics under capability asymmetry
Taxonomy of emergent argumentative strategies
Framework for evaluating judge bias in AI systems
Recommendations for debate-based alignment techniques

🌐 Interactive Demo

The debate protocol from this thesis is available as a playable web game:

aisafety.joaquinmachulsky.com: play pixel-revealing debates against AI agents and experience the framework firsthand.

📬 Contact

Joaquín Salvador Machulsky
Email: jmachulsky@dc.uba.ar
