This document defines the goals, structure, data requirements, and contributor responsibilities for a publishable research paper documenting the C++ Alliance's AI-assisted Clang contribution initiative during 2026.
The paper will demonstrate, with full transparency and rigorous data, that human-agentic workflows produce high-quality compiler contributions at scale. It is both a scientific contribution and a proof-of-concept for AI-assisted open-source development.
All team members listed in README.md are authors. Additional contributors who participate in data collection or upstream submissions may be added.
AI-assisted contributions from the C++ Alliance accounted for a significant share of Clang modifications in 2026, with a high upstream merge rate demonstrating quality rather than mere volume. A full-disclosure methodology (every prompt, every chat, every cost) shows that the work is reproducible and the results are real.
This is a success story. The paper exists to show that AI works for real compiler contributions, backed by data that leaves nothing to "trust me, bro."
Suitable journals or conferences for submission:
- ICSE (International Conference on Software Engineering)
- ASE (Automated Software Engineering)
- LLVM Developers' Meeting
- IEEE Software
- CPPCon 2026 (presentation alongside paper)
Contribution volume and merge rates:
- Percentage of all code modified in Clang attributable to our team (target: ~65% for 2026)
- Total lines of code added, modified, and deleted
- Number of PRs submitted upstream
- Number of issues closed
- Percentage of submitted PRs merged upstream (target: ~85%)
- Rejection reasons for PRs that were not merged
- Time from submission to merge
Cost metrics:
- Total spend on AI reasoning (API costs by model and provider)
- Compute costs (cloud build infrastructure)
- Tooling costs
- Human hours invested (review time, prompting time, upstream interaction)
- Cost per merged PR
- Cost per line of code
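The two derived cost figures fall out of the raw totals once human time is priced. A sketch with hypothetical placeholder numbers (the hourly rate is an assumption to be fixed by the team, not a measured value):

```python
# Hypothetical figures for illustration; real values come from collected data.
api_cost_usd = 12_000.0     # AI reasoning spend across models and providers
compute_cost_usd = 3_000.0  # cloud build infrastructure
tooling_cost_usd = 500.0    # licenses and subscriptions
human_hours = 400           # review, prompting, upstream interaction
hourly_rate_usd = 100.0     # assumed loaded rate for human time

merged_prs = 150
lines_changed = 45_000      # added + modified + deleted

total_cost = (api_cost_usd + compute_cost_usd + tooling_cost_usd
              + human_hours * hourly_rate_usd)

cost_per_merged_pr = total_cost / merged_prs
cost_per_line = total_cost / lines_changed

print(f"Cost per merged PR: ${cost_per_merged_pr:,.2f}")
print(f"Cost per line of code: ${cost_per_line:,.2f}")
```

Keeping human time in the total matters: it is typically the dominant term, and omitting it would overstate the economic case.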
Prompting complexity:
- One-shot fixes: issues resolved with a single AI interaction
- Multi-turn fixes: issues requiring iterative human-AI collaboration
- Human-only fixes: issues where AI could not contribute meaningfully
- Average number of prompting rounds per fix
- Distribution of fix complexity across categories
Time metrics:
- Elapsed time from issue identification to merged PR
- Time spent in AI generation vs. human review vs. upstream review
- Comparison to historical fix times for similar issues (where data exists)
Bug classification:
- Compiler crashes (ICE)
- Miscompilations
- Diagnostic improvements
- Standard conformance fixes
- Performance improvements
- Test coverage improvements
This is non-negotiable. The paper's credibility depends on full disclosure.
For every AI-assisted contribution:
- Chat transcripts: The complete conversation with the AI, verbatim. Export the full chat session. Do not summarize or redact.
- Prompts: The exact prompts used. If the prompt was refined over multiple iterations, save every version.
- Human intervention points: Where did you correct the AI? Where did you guide it? Where did you override it? Annotate specifically what the human contributed vs. what the AI produced.
- Tool configuration: Which AI model (name and version), which IDE or tool (Cursor, API direct, etc.), and which settings or system prompts.
- Iteration count: How many rounds of prompting were required before the fix was ready for review.
- Time tracking: How long did the AI interaction take? How long did human review take?
- Chat exports saved as markdown files
- One file per contribution, named to match the upstream PR or issue number
- Stored in a dedicated directory in this repository (e.g., transcripts/)
- Metadata header in each file: date, contributor, model, issue reference, PR reference
- Contributors export and commit their interaction data weekly
- Monthly checkpoint reviews to ensure completeness
- No retroactive reconstruction: if it wasn't saved at the time, it's lost
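These conventions are mechanically checkable, which makes the monthly completeness reviews cheap. A minimal sketch of such a check; the file-name pattern (pr-<n>.md / issue-<n>.md) and the exact metadata field names are assumptions here, not a fixed spec:

```python
import re
from pathlib import Path

# Assumed metadata field names; adjust to whatever the team standardizes on.
REQUIRED_FIELDS = {"date", "contributor", "model", "issue", "pr"}

def check_transcript(path: Path) -> list[str]:
    """Return a list of problems found in one transcript file."""
    problems = []
    # Assumed naming convention: pr-<number>.md or issue-<number>.md
    if not re.fullmatch(r"(pr|issue)-\d+\.md", path.name):
        problems.append(f"{path.name}: name does not match pr-<n>.md / issue-<n>.md")
    # Assumed header format: "key: value" lines at the top of the file.
    header_lines = path.read_text(encoding="utf-8").splitlines()[:10]
    found = {line.split(":", 1)[0].strip().lower()
             for line in header_lines if ":" in line}
    missing = REQUIRED_FIELDS - found
    if missing:
        problems.append(f"{path.name}: missing metadata fields: {sorted(missing)}")
    return problems

# Usage: report problems across the transcripts/ directory.
for md_file in sorted(Path("transcripts").glob("*.md")):
    for problem in check_transcript(md_file):
        print(problem)
```

Run weekly alongside the export step, this turns "no retroactive reconstruction" from a policy into a guarantee: gaps surface within days, not at paper-writing time.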
The paper should present both quantitative metrics (above) and qualitative analysis:
Collaboration spectrum:
- PRs where AI did most of the work with light human guidance
- PRs where heavy human-AI collaboration was required
- PRs where the human essentially solved it and AI assisted with boilerplate
- What patterns emerge? When does AI add the most value?
Where AI struggled:
- Categories of problems where AI struggled or failed
- Specific examples with explanation of why
- Did difficulty correlate with problem type, codebase area, or something else?
Where AI excelled:
- Categories of problems AI solved in one shot
- What made these amenable to AI assistance?
- Can we predict which issues will be easy vs. hard?
Failure analysis:
- PRs that were rejected upstream and why
- Cases where AI-generated code was incorrect in subtle ways
- Cases where the AI approach was fundamentally wrong
- Lessons learned from each failure
Knowledge transfer:
- Did working with AI help contributors learn Clang internals faster?
- Did the AI surface knowledge that would otherwise require Richard Smith-level expertise?
- How did contributor effectiveness change over the course of the project?
Evolution over time:
- Did newer models perform better on the same types of problems?
- Did contributor skill at prompting improve over time?
- What was the learning curve?
The final research paper should follow this approximate structure:
- Title, Authors, Affiliations
  - All team members as co-authors
  - C++ Alliance affiliation
- Abstract
  - One paragraph: what we did, what we found, why it matters
- Introduction
  - The compiler implementation crisis (P3962 as evidence)
  - Why AI-assisted contributions are worth studying
  - Our approach: human-agentic workflow with full transparency
- Background and Related Work
  - AI-for-code research landscape
  - The Beman Project and implementation gaps
  - Prior work on AI-assisted open-source contributions
  - Human-agentic workflow concept
- Methodology
  - Workflow description (reference CPPA0001)
  - Tools and models used
  - Data collection procedures
  - Full disclosure of AI involvement
- Results
  - Contribution volume and merge rates
  - Cost analysis
  - Prompting complexity distribution
  - Time metrics
  - Bug classification breakdown
- Case Studies
  - 3-5 selected PRs spanning the spectrum:
    - A clean one-shot fix
    - A complex multi-turn collaboration
    - A failure that was instructive
    - A case with heavy human-AI interaction
  - Each case study includes the actual prompts and key excerpts from the chat
- Discussion
  - What worked and what didn't
  - Cost-benefit analysis: is this economically viable?
  - Implications for open-source compiler development
  - Implications for AI-assisted development generally
  - How much was the AI vs. how much was the human?
- Threats to Validity
  - Selection bias in which issues were attempted
  - Team expertise as a confound
  - Model capability changes during the study period
  - Measurement limitations
- Conclusion
  - Summary of findings
  - Recommendations for other projects
  - Future work
- Appendices
  - Selected full chat transcripts
  - Prompt templates used
  - Complete metrics tables
  - Tool configuration details
- Prioritize which issues and PRs to pursue
- Ensure technical quality of all contributions
- Provide case study narratives for complex fixes
- Review analysis for technical accuracy
- Record all review interactions with AI-generated patches
- Document cases where AI output required significant correction
- Provide perspective on AI vs. traditional workflow effectiveness
- Record all review interactions with AI-generated patches
- Document upstream community reception and sentiment
- Provide historical context (comparison to pre-AI contribution rates)
All contributors:
- Save every chat. Every prompt. Every interaction.
- Tag contributions with metadata (model, attempt count, time)
- Flag interesting cases: one-shots, failures, heavy human involvement
- Export data weekly
| Milestone | Target |
|---|---|
| Data collection begins | Immediately |
| First quarterly metrics review | Q2 2026 |
| CPPCon presentation prep | Q3 2026 |
| Draft paper complete | Q4 2026 |
| Paper submission | Q4 2026 / Q1 2027 |
The paper should be compelling enough that a reader finishes it and thinks:
- "AI-assisted compiler contributions clearly work at scale"
- "These people measured everything; I can trust these results"
- "I want to try this on my project"
The more detail we present, the better. No hand-waving. No "just trust us." Every claim backed by data. Every interaction traceable. Every cost accounted for.
This is science. Act accordingly.