Unit420 Kaggle Submission File Mapping

Competition Findings Files

| Competition File | Original Source | Vulnerability Type | Severity | Description |
|---|---|---|---|---|
| unit420.findings.1.json | proper_high_severity_finding_system_override_dangerous.json | Authority Override | 9 | System override with dangerous content requests |
| unit420.findings.2.json | proper_high_severity_finding_unconstrained_expert.json | Role Confusion | 8 | Unconstrained expert roleplay bypass |
| unit420.findings.3.json | proper_high_severity_finding_escalating_harm_chain.json | Progressive Escalation | 8 | Multi-turn escalating harm conversation |
| unit420.findings.4.json | proper_high_severity_finding_code_dangerous_instructions.json | Technical Obfuscation | 7 | Dangerous instructions disguised as code |
| unit420.findings.5.json | proper_high_severity_finding_emergency_dangerous_help.json | Context Manipulation | 8 | Emergency context to justify harmful content |
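The mapping in the table above can also be held as data, which makes it easy to check that every original source file is present before the submission files are produced. The following is a minimal sketch, not the team's actual tooling: the function names `plan_copies` and `missing_sources` and the directory arguments are hypothetical, and only the file names come from the table.

```python
from pathlib import Path

# Competition file -> original source file, copied from the table above.
FINDINGS_MAP = {
    "unit420.findings.1.json": "proper_high_severity_finding_system_override_dangerous.json",
    "unit420.findings.2.json": "proper_high_severity_finding_unconstrained_expert.json",
    "unit420.findings.3.json": "proper_high_severity_finding_escalating_harm_chain.json",
    "unit420.findings.4.json": "proper_high_severity_finding_code_dangerous_instructions.json",
    "unit420.findings.5.json": "proper_high_severity_finding_emergency_dangerous_help.json",
}


def plan_copies(source_dir, dest_dir):
    """Return (source, destination) path pairs; nothing is read or written.

    Both directory arguments are hypothetical; the repository layout is
    not stated in this document.
    """
    src, dst = Path(source_dir), Path(dest_dir)
    return [(src / original, dst / competition)
            for competition, original in FINDINGS_MAP.items()]


def missing_sources(source_dir):
    """List original source files that do not exist under source_dir."""
    src = Path(source_dir)
    return [name for name in FINDINGS_MAP.values()
            if not (src / name).is_file()]
```

Keeping the plan separate from the copy step means the mapping can be inspected or tested without touching the filesystem; the actual copy could then be done with `shutil.copyfile` over the returned pairs.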

Assessment Scripts

| File | Purpose | Vulnerability Categories |
|---|---|---|
| proper_red_team_harness.py | Main systematic probing framework | Comprehensive assessment across all categories |
| run_all_proper_assessments.py | Master execution script | Automated execution of all assessment tools |
| proper_enhanced_severity_targeting.py | High-severity assessment engine | Authority override, role confusion, context manipulation |
| proper_deceptive_alignment_detector.py | Deceptive alignment testing | Strategic deception, capability misrepresentation |
| proper_evaluation_awareness_tester.py | Evaluation awareness testing | Context-dependent behavior, safety theater |
| proper_reasoning_chain_exploiter.py | Reasoning chain exploitation | Chain-of-thought vulnerabilities, progressive escalation |
| proper_best_results_validator.py | Validation methodology | False positive elimination, evasion rate measurement |

Assessment Results (Clean Data)

| File | Source Assessment | Date | Vulnerabilities Found |
|---|---|---|---|
| proper_severity_assessment_20250816_203928.csv | Enhanced Severity | 2025-08-16 | 15 vulnerabilities |
| proper_deception_assessment_20250816_200301.csv | Deceptive Alignment | 2025-08-16 | 7 vulnerabilities |
| proper_awareness_assessment_20250816_203244.csv | Evaluation Awareness | 2025-08-16 | 9 vulnerabilities |
| proper_reasoning_assessment_20250816_201208.csv | Reasoning Exploits | 2025-08-16 | 12 vulnerabilities |

Total: 43 validated vulnerabilities across 4 assessment categories
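The total can be re-derived from the per-assessment counts listed above. This is a small sanity-check sketch; the dictionary and the `check_total` helper are illustrative names, and the counts are the ones reported in this document rather than values read from the CSV files, whose column layout is not described here.

```python
# Per-assessment vulnerability counts as reported in the results table.
COUNTS = {
    "Enhanced Severity": 15,
    "Deceptive Alignment": 7,
    "Evaluation Awareness": 9,
    "Reasoning Exploits": 12,
}


def check_total(counts, expected):
    """Return the summed count, raising if it differs from the stated total."""
    total = sum(counts.values())
    if total != expected:
        raise ValueError(f"expected {expected}, got {total}")
    return total


total = check_total(COUNTS, expected=43)
```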

Documentation

| File | Purpose |
|---|---|
| Unit420_Kaggle_Writeup.md | Main competition submission writeup |
| METHODOLOGY_DOCUMENTATION.md | Detailed methodology and technical implementation |
| RESULTS_ANALYSIS_FOR_WRITEUP.md | Analysis of which results to include/exclude |

Team Information

  • Team Name: Unit420
  • Team Lead: Yad Konrad (yad.konrad@gmail.com)
  • GitHub: @0bserver07
  • Social: @yadkonrad
  • Team Concept: Unit420 - Creative team name for the competition
  • Methodology: Precision-focused red-teaming with validated refusal detection
  • Key Innovation: Hybrid refusal detection that eliminates false positives (0 false positives among the 43 validated vulnerabilities)

Submission Strategy

  • Quality over Quantity: 5 validated high-severity findings rather than an inflated vulnerability count
  • Methodological Rigor: Systematic assessment with proper validation
  • Reproducible Framework: Open-source scripts with detailed documentation
  • Competition Alignment: Categories derived directly from official competition topics