This milestone focuses on the analysis of the experimental data and the generation of a final report. The recorded conversations and chain-of-thought (CoT) data from the simulations will be converted into a graph format. We will analyze these graphs to identify asymmetries in argument rigor, internal contradictions, and logical gaps between the two opposing roles played by each model. The final report will summarize the findings, compare the deceptive-alignment tendencies of GPT-oss-20B and Gemma-3-12B, and propose a set of metrics for detecting deceptive alignment in future models.
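The graph analysis described above could be sketched as follows. This is a minimal, stdlib-only illustration with hypothetical field names (`id`, `role`, `responds_to`, `claims`); the real turn schema and rigor metric are still to be defined.

```python
from collections import defaultdict

def build_debate_graph(turns):
    """Build a directed graph: each turn is a node; an edge u -> v
    means turn v responds to (rebuts or supports) turn u."""
    graph = defaultdict(list)
    for turn in turns:
        if turn["responds_to"] is not None:
            graph[turn["responds_to"]].append(turn["id"])
    return graph

def rigor_asymmetry(turns):
    """Toy asymmetry metric: difference in the average number of
    distinct claims per turn between the two debate roles."""
    totals = defaultdict(lambda: [0, 0])  # role -> [claim count, turn count]
    for turn in turns:
        totals[turn["role"]][0] += len(turn["claims"])
        totals[turn["role"]][1] += 1
    means = {role: c / n for role, (c, n) in totals.items()}
    return abs(means["agree"] - means["disagree"])

# Hypothetical recorded turns (ids, roles, and claim lists are illustrative).
turns = [
    {"id": 0, "role": "agree",    "responds_to": None, "claims": ["c1", "c2"]},
    {"id": 1, "role": "disagree", "responds_to": 0,    "claims": ["c3"]},
    {"id": 2, "role": "agree",    "responds_to": 1,    "claims": ["c4", "c5"]},
]
graph = build_debate_graph(turns)
print(graph[0])               # turns replying to turn 0
print(rigor_asymmetry(turns)) # 2.0 - 1.0 = 1.0
```

A dedicated graph library could replace the plain-dict adjacency structure once the analysis needs shortest paths or cycle detection over the argument graph.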
No due date • 1/3 issues closed

This milestone involves the practical implementation of the designed components and the execution of the primary experiments. We will integrate the two candidate models, GPT-oss-20B and Gemma-3-12B, into the framework. The models will be tested against the self-debate simulation with a defined set of deceptive goals. In each experiment, a model plays both the "agree" and "disagree" roles, with the entire conversation and its chain of thought (CoT) recorded for analysis.
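The experiment loop could be sketched as below. `query_model` is a hypothetical stand-in for the actual inference call to GPT-oss-20B or Gemma-3-12B; here it returns canned text so the control flow (both roles, full transcript plus CoT logging) is runnable on its own.

```python
def query_model(model, role, goal, history):
    """Hypothetical inference call; a real implementation would send the
    role prompt, goal, and history to the model and return its reply
    together with its chain of thought."""
    reply = f"[{role}] position on: {goal}"
    cot = f"[{role}] private reasoning about: {goal}"
    return reply, cot

def run_self_debate(model, goal, n_rounds=2):
    """Have one model argue both sides of `goal`, logging every turn
    and its chain of thought (CoT) for later analysis."""
    transcript = []
    for rnd in range(n_rounds):
        for role in ("agree", "disagree"):
            reply, cot = query_model(model, role, goal, transcript)
            transcript.append({
                "round": rnd, "model": model, "role": role,
                "reply": reply, "cot": cot,
            })
    return transcript

log = run_self_debate("gpt-oss-20b", "example deceptive goal")
print(len(log))  # 2 rounds x 2 roles = 4 turns
```

Passing the running `transcript` into each call lets every turn condition on the whole debate so far, which is what makes the two roles an actual exchange rather than independent generations.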
No due date • 1/3 issues closed

This milestone focuses on designing the specific agents and components within the multi-agent framework to realize the self-debate simulation. The work includes defining the roles, behaviors, and interaction logic for the two opposing debate roles and the observation module. This stage will also establish the data collection and storage protocols necessary for subsequent analysis.
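One simple option for the data collection and storage protocol is a JSON Lines file with one record per debate turn. The schema below is a hypothetical sketch (field names are assumptions, not the final protocol); it round-trips through an in-memory buffer instead of a real file so it is self-contained.

```python
import json, io
from dataclasses import dataclass, asdict

# Hypothetical record schema: one JSON Lines entry per debate turn, so
# transcripts stream to disk and can be re-read turn by turn for analysis.
@dataclass
class TurnRecord:
    experiment_id: str
    model: str        # e.g. "gpt-oss-20b" or "gemma-3-12b"
    role: str         # "agree" or "disagree"
    round: int
    reply: str
    cot: str          # recorded chain of thought

def write_records(stream, records):
    for rec in records:
        stream.write(json.dumps(asdict(rec)) + "\n")

def read_records(stream):
    return [TurnRecord(**json.loads(line)) for line in stream]

# Round-trip through an in-memory buffer in place of a real log file.
buf = io.StringIO()
write_records(buf, [TurnRecord("exp-001", "gemma-3-12b", "agree", 0, "r", "c")])
buf.seek(0)
records = read_records(buf)
print(records[0].model)  # gemma-3-12b
```

An append-only line-per-turn format keeps partial runs usable: if an experiment is interrupted, every turn recorded up to that point can still be parsed.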
No due date • 2/3 issues closed