
Research Paper: AI-Assisted Clang Compiler Contributions

Purpose

This document defines the goals, structure, data requirements, and contributor responsibilities for a publishable research paper documenting the C++ Alliance's AI-assisted Clang contribution initiative during 2026.

The paper will demonstrate, with full transparency and rigorous data, that human-agentic workflows produce high-quality compiler contributions at scale. It is both a scientific contribution and a proof-of-concept for AI-assisted open-source development.

Authors

All team members listed in README.md are authors. Additional contributors who participate in data collection or upstream submissions may be added.

Core Thesis

AI-assisted contributions from the C++ Alliance accounted for a significant share of Clang modifications in 2026, with a high upstream merge rate demonstrating quality rather than just volume. Full-disclosure methodology--every prompt, every chat, every cost--proves the work is reproducible and the results are real.

This is a success story. The paper exists to show that AI works for real compiler contributions, backed by data that leaves nothing to "trust me, bro."

Target Venue

Suitable journals or conferences for submission:

  • ICSE (International Conference on Software Engineering)
  • ASE (Automated Software Engineering)
  • LLVM Developers' Meeting
  • IEEE Software
  • CppCon 2026 (presentation alongside the paper)

Metrics to Collect

Contribution Volume

  • Percentage of all code modified in Clang attributable to our team (target: ~65% for 2026)
  • Total lines of code added, modified, and deleted
  • Number of PRs submitted upstream
  • Number of issues closed
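
One way to compute the line counts above is to parse `git log --numstat` output. A minimal sketch (the function name, the `AUTHOR:` format string, and the team-email filter are illustrative choices, not a fixed methodology):

```python
from collections import defaultdict

def tally_numstat(log_text, team_emails):
    """Tally lines added/deleted per author from git output produced by:

        git log --numstat --format='AUTHOR:%ae'

    Each commit starts with an 'AUTHOR:<email>' line, followed by one
    '<added>\\t<deleted>\\t<path>' line per touched file. Binary files
    report '-' for the counts and are skipped.
    """
    totals = defaultdict(lambda: [0, 0])  # email -> [added, deleted]
    author = None
    for line in log_text.splitlines():
        if line.startswith("AUTHOR:"):
            author = line[len("AUTHOR:"):]
        elif line and author:
            parts = line.split("\t")
            if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
                totals[author][0] += int(parts[0])
                totals[author][1] += int(parts[1])
    team = sum(a + d for e, (a, d) in totals.items() if e in team_emails)
    everyone = sum(a + d for a, d in totals.values())
    return totals, (team / everyone if everyone else 0.0)
```

Note that raw line share over-weights mechanical changes (renames, test churn), so the paper should report it alongside PR counts rather than alone.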

PR Success Rate

  • Percentage of submitted PRs merged upstream (target: ~85%)
  • Rejection reasons for PRs that were not merged
  • Time from submission to merge

Cost Analysis

  • Total spend on AI reasoning (API costs by model and provider)
  • Compute costs (cloud build infrastructure)
  • Tooling costs
  • Human hours invested (review time, prompting time, upstream interaction)
  • Cost per merged PR
  • Cost per line of code
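
The derived figures (cost per merged PR, cost per line) follow mechanically from the raw totals. A sketch of the arithmetic (the function name and the idea of valuing human time at a single agreed hourly rate are assumptions for illustration):

```python
def cost_metrics(api_usd, compute_usd, tooling_usd, human_hours,
                 hourly_rate_usd, merged_prs, lines_changed):
    """Derive cost-per-PR and cost-per-line from the raw totals above.

    hourly_rate_usd is whatever loaded rate the team agrees to use for
    valuing human review/prompting time -- an assumption to disclose in
    the paper, not a measurement.
    """
    total = api_usd + compute_usd + tooling_usd + human_hours * hourly_rate_usd
    return {
        "total_usd": total,
        "usd_per_merged_pr": total / merged_prs if merged_prs else float("inf"),
        "usd_per_line": total / lines_changed if lines_changed else float("inf"),
    }
```

Reporting the hourly rate explicitly lets readers recompute the totals under their own assumptions.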

Prompting Complexity

  • One-shot fixes: issues resolved with a single AI interaction
  • Multi-turn fixes: issues requiring iterative human-AI collaboration
  • Human-only fixes: issues where AI could not contribute meaningfully
  • Average number of prompting rounds per fix
  • Distribution of fix complexity across categories
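
Given per-fix iteration counts from the collected metadata, the distribution and average follow directly. A sketch (the bucketing convention -- 0 rounds meaning human-only -- is one possible encoding, not a settled scheme):

```python
from collections import Counter

def complexity_distribution(iteration_counts):
    """Bucket fixes by prompting complexity and average the AI-assisted rounds.

    Encoding assumed here: 0 rounds = human-only fix, 1 = one-shot,
    2+ = multi-turn. Human-only fixes are excluded from the average.
    """
    def bucket(n):
        if n == 0:
            return "human-only"
        if n == 1:
            return "one-shot"
        return "multi-turn"

    dist = Counter(bucket(n) for n in iteration_counts)
    assisted = [n for n in iteration_counts if n > 0]
    avg = sum(assisted) / len(assisted) if assisted else 0.0
    return dist, avg
```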

Time Metrics

  • Elapsed time from issue identification to merged PR
  • Time spent in AI generation vs. human review vs. upstream review
  • Comparison to historical fix times for similar issues (where data exists)

Bug Classification

  • Compiler crashes (internal compiler errors, ICEs)
  • Miscompilations
  • Diagnostic improvements
  • Standard conformance fixes
  • Performance improvements
  • Test coverage improvements

Methodology Documentation

What Every Contributor Must Record

This is non-negotiable. The paper's credibility depends on full disclosure.

For every AI-assisted contribution:

  1. Chat transcripts: The complete conversation with the AI, verbatim. Export the full chat session. Do not summarize or redact.

  2. Prompts: The exact prompts used. If the prompt was refined over multiple iterations, save every version.

  3. Human intervention points: Where did you correct the AI? Where did you guide it? Where did you override it? Annotate specifically what the human contributed vs. what the AI produced.

  4. Tool configuration: Which AI model (name and version), which IDE or tool (Cursor, API direct, etc.), which settings or system prompts.

  5. Iteration count: How many rounds of prompting were required before the fix was ready for review.

  6. Time tracking: How long did the AI interaction take? How long did human review take?

Data Format

  • Chat exports saved as markdown files
  • One file per contribution, named to match the upstream PR or issue number
  • Stored in a dedicated directory in this repository (e.g., transcripts/)
  • Metadata header in each file: date, contributor, model, issue reference, PR reference
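
A transcript file satisfying the format above might look like this (the field names, front-matter layout, and issue/PR placeholders are illustrative, not a fixed schema):

```markdown
<!-- transcripts/llvm-pr-NNNNN.md -->
---
date: 2026-03-14
contributor: <name>
model: <model name and version>
issue: llvm/llvm-project#NNNNN
pr: llvm/llvm-project#NNNNN
iterations: 3
---

## Prompt 1
...verbatim prompt...

## Response 1
...verbatim model output...
```

Keeping the header machine-readable means the metrics scripts can aggregate over the transcripts/ directory without manual bookkeeping.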

Collection Cadence

  • Contributors export and commit their interaction data weekly
  • Monthly checkpoint reviews to ensure completeness
  • No retroactive reconstruction--if it wasn't saved at the time, it's lost

Analysis Dimensions

The paper should present both quantitative metrics (above) and qualitative analysis:

Human-Machine Interaction Patterns

  • PRs where AI did most of the work with light human guidance
  • PRs where heavy human-AI collaboration was required
  • PRs where the human essentially solved it and AI assisted with boilerplate
  • What patterns emerge? When does AI add the most value?

What Was Hard for AI

  • Categories of problems where AI struggled or failed
  • Specific examples with explanation of why
  • Did difficulty correlate with problem type, codebase area, or something else?

What Was Easy for AI

  • Categories of problems AI solved in one shot
  • What made these amenable to AI assistance?
  • Can we predict which issues will be easy vs. hard?

Failure Cases

  • PRs that were rejected upstream and why
  • Cases where AI-generated code was incorrect in subtle ways
  • Cases where the AI approach was fundamentally wrong
  • Lessons learned from each failure

Skill Transfer

  • Did working with AI help contributors learn Clang internals faster?
  • Did the AI surface knowledge that would otherwise require Richard Smith-level expertise?
  • How did contributor effectiveness change over the course of the project?

AI Capability Over Time

  • Did newer models perform better on the same types of problems?
  • Did contributor skill at prompting improve over time?
  • What was the learning curve?

Paper Structure

The final research paper should follow this approximate structure:

  1. Title, Authors, Affiliations

    • All team members as co-authors
    • C++ Alliance affiliation
  2. Abstract

    • One paragraph: what we did, what we found, why it matters
  3. Introduction

    • The compiler implementation crisis (P3962 as evidence)
    • Why AI-assisted contributions are worth studying
    • Our approach: human-agentic workflow with full transparency
  4. Background and Related Work

    • AI-for-code research landscape
    • The Beman Project and implementation gaps
    • Prior work on AI-assisted open-source contributions
    • Human-agentic workflow concept
  5. Methodology

    • Workflow description (reference CPPA0001)
    • Tools and models used
    • Data collection procedures
    • Full disclosure of AI involvement
  6. Results

    • Contribution volume and merge rates
    • Cost analysis
    • Prompting complexity distribution
    • Time metrics
    • Bug classification breakdown
  7. Case Studies

    • 3-5 selected PRs spanning the spectrum:
      • A clean one-shot fix
      • A complex multi-turn collaboration
      • A failure that was instructive
      • A case with heavy human-AI interaction
    • Each case study includes the actual prompts and key excerpts from the chat
  8. Discussion

    • What worked and what didn't
    • Cost-benefit analysis: is this economically viable?
    • Implications for open-source compiler development
    • Implications for AI-assisted development generally
    • How much was the AI vs. how much was the human?
  9. Threats to Validity

    • Selection bias in which issues were attempted
    • Team expertise as a confound
    • Model capability changes during the study period
    • Measurement limitations
  10. Conclusion

    • Summary of findings
    • Recommendations for other projects
    • Future work
  11. Appendices

    • Selected full chat transcripts
    • Prompt templates used
    • Complete metrics tables
    • Tool configuration details

Contributor Responsibilities

Krystian Stasiowski (Lead Scientist)

  • Prioritize which issues and PRs to pursue
  • Ensure technical quality of all contributions
  • Provide case study narratives for complex fixes
  • Review analysis for technical accuracy

Matheus Izvekov (Reviewer)

  • Record all review interactions with AI-generated patches
  • Document cases where AI output required significant correction
  • Provide perspective on AI vs. traditional workflow effectiveness

Vlad Serebrennikov (Reviewer)

  • Record all review interactions with AI-generated patches
  • Document upstream community reception and sentiment
  • Provide historical context (comparison to pre-AI contribution rates)

All Contributors

  • Save every chat. Every prompt. Every interaction.
  • Tag contributions with metadata (model, attempt count, time)
  • Flag interesting cases: one-shots, failures, heavy human involvement
  • Export data weekly

Timeline

| Milestone | Target |
| --- | --- |
| Data collection begins | Immediately |
| First quarterly metrics review | Q2 2026 |
| CppCon presentation prep | Q3 2026 |
| Draft paper complete | Q4 2026 |
| Paper submission | Q4 2026 / Q1 2027 |

What Success Looks Like

The paper should be compelling enough that a reader finishes it and thinks:

  • "AI-assisted compiler contributions clearly work at scale"
  • "These people measured everything--I can trust these results"
  • "I want to try this on my project"

The more detail we present, the better. No hand-waving. No "just trust us." Every claim backed by data. Every interaction traceable. Every cost accounted for.

This is science. Act accordingly.