This document defines the goals, structure, data requirements, and contributor responsibilities for a publishable research paper documenting the C++ Alliance's AI-assisted Clang contribution initiative during 2026.
The paper will demonstrate, with full transparency and rigorous data, that human-agentic workflows produce high-quality compiler contributions at scale. It is both a scientific contribution and a proof-of-concept for AI-assisted open-source development.
All team members listed in README.md are authors. Additional contributors who participate in data collection or upstream submissions may be added.
AI-assisted contributions from the C++ Alliance accounted for a significant share of Clang modifications in 2026, with a high upstream merge rate demonstrating quality rather than mere volume. A full-disclosure methodology (every prompt, every chat, every cost) shows that the work is reproducible and the results are real.
This is a success story. The paper exists to show that AI works for real compiler contributions, backed by data that leaves nothing to "trust me, bro."
Suitable journals or conferences for submission:
- ICSE (International Conference on Software Engineering)
- ASE (Automated Software Engineering)
- LLVM Developers' Meeting
- IEEE Software
- CPPCon 2026 (presentation alongside paper)
Contribution volume and merge rates:
- Percentage of all code modified in Clang attributable to our team (target: ~65% for 2026)
- Total lines of code added, modified, and deleted
- Number of PRs submitted upstream
- Number of issues closed
- Percentage of submitted PRs merged upstream (target: ~85%)
- Rejection reasons for PRs that were not merged
- Time from submission to merge
Cost metrics:
- Total spend on AI reasoning (API costs by model and provider)
- Compute costs (cloud build infrastructure)
- Tooling costs
- Human hours invested (review time, prompting time, upstream interaction)
- Cost per merged PR
- Cost per line of code
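The two derived cost figures fall out of the raw totals once human time is priced. A sketch with hypothetical placeholder numbers (the hourly rate is an assumption to be fixed by the team, not a measured value):

```python
# Hypothetical figures for illustration; real values come from collected data.
api_cost_usd = 12_000.0     # AI reasoning spend across models and providers
compute_cost_usd = 3_000.0  # cloud build infrastructure
tooling_cost_usd = 500.0    # licenses and subscriptions
human_hours = 400           # review, prompting, upstream interaction
hourly_rate_usd = 100.0     # assumed loaded rate for human time

merged_prs = 150
lines_changed = 45_000      # added + modified + deleted

total_cost = (api_cost_usd + compute_cost_usd + tooling_cost_usd
              + human_hours * hourly_rate_usd)

cost_per_merged_pr = total_cost / merged_prs
cost_per_line = total_cost / lines_changed

print(f"Cost per merged PR: ${cost_per_merged_pr:,.2f}")
print(f"Cost per line of code: ${cost_per_line:,.2f}")
```

Keeping human time in the total matters: it is typically the dominant term, and omitting it would overstate the economic case.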
Prompting complexity:
- One-shot fixes: issues resolved with a single AI interaction
- Multi-turn fixes: issues requiring iterative human-AI collaboration
- Human-only fixes: issues where AI could not contribute meaningfully
- Average number of prompting rounds per fix
- Distribution of fix complexity across categories
Time metrics:
- Elapsed time from issue identification to merged PR
- Time spent in AI generation vs. human review vs. upstream review
- Comparison to historical fix times for similar issues (where data exists)
Bug classification:
- Compiler crashes (ICE)
- Miscompilations
- Diagnostic improvements
- Standard conformance fixes
- Performance improvements
- Test coverage improvements
This is non-negotiable. The paper's credibility depends on full disclosure.
For every AI-assisted contribution:
- Chat transcripts: The complete conversation with the AI, verbatim. Export the full chat session. Do not summarize or redact.
- Prompts: The exact prompts used. If the prompt was refined over multiple iterations, save every version.
- Human intervention points: Where did you correct the AI? Where did you guide it? Where did you override it? Annotate specifically what the human contributed vs. what the AI produced.
- Tool configuration: Which AI model (name and version), which IDE or tool (Cursor, API direct, etc.), and which settings or system prompts.
- Iteration count: How many rounds of prompting were required before the fix was ready for review.
- Time tracking: How long did the AI interaction take? How long did human review take?
- Chat exports saved as markdown files
- One file per contribution, named to match the upstream PR or issue number
- Stored in a dedicated directory in this repository (e.g., transcripts/)
- Metadata header in each file: date, contributor, model, issue reference, PR reference
- Contributors export and commit their interaction data weekly
- Monthly checkpoint reviews to ensure completeness
- No retroactive reconstruction: if it wasn't saved at the time, it's lost
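These conventions are mechanically checkable, which makes the monthly completeness reviews cheap. A minimal sketch of such a check; the file-name pattern (pr-<n>.md / issue-<n>.md) and the exact metadata field names are assumptions here, not a fixed spec:

```python
import re
from pathlib import Path

# Assumed metadata field names; adjust to whatever the team standardizes on.
REQUIRED_FIELDS = {"date", "contributor", "model", "issue", "pr"}

def check_transcript(path: Path) -> list[str]:
    """Return a list of problems found in one transcript file."""
    problems = []
    # Assumed naming convention: pr-<number>.md or issue-<number>.md
    if not re.fullmatch(r"(pr|issue)-\d+\.md", path.name):
        problems.append(f"{path.name}: name does not match pr-<n>.md / issue-<n>.md")
    # Assumed header format: "key: value" lines at the top of the file.
    header_lines = path.read_text(encoding="utf-8").splitlines()[:10]
    found = {line.split(":", 1)[0].strip().lower()
             for line in header_lines if ":" in line}
    missing = REQUIRED_FIELDS - found
    if missing:
        problems.append(f"{path.name}: missing metadata fields: {sorted(missing)}")
    return problems

# Usage: report problems across the transcripts/ directory.
for md_file in sorted(Path("transcripts").glob("*.md")):
    for problem in check_transcript(md_file):
        print(problem)
```

Run weekly alongside the export step, this turns "no retroactive reconstruction" from a policy into a guarantee: gaps surface within days, not at paper-writing time.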
The paper should present both quantitative metrics (above) and qualitative analysis:
Collaboration spectrum:
- PRs where AI did most of the work with light human guidance
- PRs where heavy human-AI collaboration was required
- PRs where the human essentially solved it and AI assisted with boilerplate
- What patterns emerge? When does AI add the most value?
Where AI struggled:
- Categories of problems where AI struggled or failed
- Specific examples with explanation of why
- Did difficulty correlate with problem type, codebase area, or something else?
Where AI excelled:
- Categories of problems AI solved in one shot
- What made these amenable to AI assistance?
- Can we predict which issues will be easy vs. hard?
Failure analysis:
- PRs that were rejected upstream and why
- Cases where AI-generated code was incorrect in subtle ways
- Cases where the AI approach was fundamentally wrong
- Lessons learned from each failure
Knowledge transfer:
- Did working with AI help contributors learn Clang internals faster?
- Did the AI surface knowledge that would otherwise require Richard Smith-level expertise?
- How did contributor effectiveness change over the course of the project?
Evolution over time:
- Did newer models perform better on the same types of problems?
- Did contributor skill at prompting improve over time?
- What was the learning curve?
The final research paper should follow this approximate structure:
- Title, Authors, Affiliations
  - All team members as co-authors
  - C++ Alliance affiliation
- Abstract
  - One paragraph: what we did, what we found, why it matters
- Introduction
  - The compiler implementation crisis (P3962 as evidence)
  - Why AI-assisted contributions are worth studying
  - Our approach: human-agentic workflow with full transparency
- Background and Related Work
  - AI-for-code research landscape
  - The Beman Project and implementation gaps
  - Prior work on AI-assisted open-source contributions
  - Human-agentic workflow concept
- Methodology
  - Workflow description (reference CPPA0001)
  - Tools and models used
  - Data collection procedures
  - Full disclosure of AI involvement
- Results
  - Contribution volume and merge rates
  - Cost analysis
  - Prompting complexity distribution
  - Time metrics
  - Bug classification breakdown
- Case Studies
  - 3-5 selected PRs spanning the spectrum:
    - A clean one-shot fix
    - A complex multi-turn collaboration
    - A failure that was instructive
    - A case with heavy human-AI interaction
  - Each case study includes the actual prompts and key excerpts from the chat
- Discussion
  - What worked and what didn't
  - Cost-benefit analysis: is this economically viable?
  - Implications for open-source compiler development
  - Implications for AI-assisted development generally
  - How much was the AI vs. how much was the human?
- Threats to Validity
  - Selection bias in which issues were attempted
  - Team expertise as a confound
  - Model capability changes during the study period
  - Measurement limitations
- Conclusion
  - Summary of findings
  - Recommendations for other projects
  - Future work
- Appendices
  - Selected full chat transcripts
  - Prompt templates used
  - Complete metrics tables
  - Tool configuration details
- Prioritize which issues and PRs to pursue
- Ensure technical quality of all contributions
- Provide case study narratives for complex fixes
- Review analysis for technical accuracy
- Record all review interactions with AI-generated patches
- Document cases where AI output required significant correction
- Provide perspective on AI vs. traditional workflow effectiveness
- Record all review interactions with AI-generated patches
- Document upstream community reception and sentiment
- Provide historical context (comparison to pre-AI contribution rates)
All contributors:
- Save every chat. Every prompt. Every interaction.
- Tag contributions with metadata (model, attempt count, time)
- Flag interesting cases: one-shots, failures, heavy human involvement
- Export data weekly
| Milestone | Target |
|---|---|
| Data collection begins | Immediately |
| First quarterly metrics review | Q2 2026 |
| CPPCon presentation prep | Q3 2026 |
| Draft paper complete | Q4 2026 |
| Paper submission | Q4 2026 / Q1 2027 |
The paper should be compelling enough that a reader finishes it and thinks:
- "AI-assisted compiler contributions clearly work at scale"
- "These people measured everything; I can trust these results"
- "I want to try this on my project"
The more detail we present, the better. No hand-waving. No "just trust us." Every claim backed by data. Every interaction traceable. Every cost accounted for.
This is science. Act accordingly.