source/d4034-wg21-sage.md
Lines changed: 21 additions & 8 deletions
@@ -13,6 +13,8 @@ audience: WG21
WG21 participants have accumulated deep expertise over three decades of C++ standardization. This expertise includes not just technical knowledge but judgment - the ability to evaluate proposals, recognize patterns, and make good decisions in novel situations. Much of this judgment is tacit: easier to demonstrate than to write down.
+[P4023R0](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p4023r0.pdf)<sup>[18]</sup> (Directions Group, "Strategic Direction for AI in C++") identifies a critical gap in AI training data and calls on the ecosystem to build an "ImageNet for C++." That paper focuses on code quality. This paper addresses a complementary dimension: the institutional judgment that experienced practitioners apply when evaluating whether a proposal belongs in the standard at all.
+
This paper presents a method for capturing expert judgment through structured interviews, AI-assisted transcription, and knowledge synthesis. We conducted interviews with experienced committee members and processed the results through an agentic workflow. The output is a structured collection of principles and experiences that can be shared, reviewed, and applied.
The technology exists. The methodology is demonstrated. Participation is voluntary.
@@ -31,7 +33,9 @@ The technology exists. The methodology is demonstrated. Participation is volunta
## 1. Disclosure
-**This paper uses AI at every stage.** Interview transcripts were produced by AI transcription. Knowledge synthesis was produced by AI processing. The paper itself was drafted with AI assistance. Every stage involves machine output.
+**The author is the intelligence of record.** P4023R0<sup>[18]</sup> establishes that "the ultimate responsibility for accuracy, logic, and normative quality rests entirely with the human author." This paper follows that principle. AI tools assist with transcription, synthesis, and drafting. The author curates, verifies, and takes responsibility for every claim.
+
+**This paper uses AI at every stage.** Interview transcripts were produced by AI transcription. Knowledge synthesis was produced by AI processing. The paper itself was drafted with AI assistance. Every stage involves machine output. P4023R0 identifies research, summarization, and consistency checking as permitted uses of AI in the committee process<sup>[18]</sup>. This paper's use of AI falls within that scope.
**Human curation is required at every stage.** AI transcription introduces errors. AI synthesis can misattribute, compress, or distort meaning. No output in this paper should be treated as a faithful representation of any interviewee's views without that interviewee's explicit review and approval.
@@ -106,7 +110,7 @@ Consider SD-9, which says things like "use `[[nodiscard]]` for functions where i
SD-10 comes closest to real knowledge transfer by referencing "Design and Evolution of C++" principles. But the references are brief, newcomers may not have read D&E, and there is no explanation of how to apply principles to novel cases.
-P2000 articulates the right philosophy and goals. The methodology presented in this paper complements that work by capturing the evaluative judgment that experienced participants apply when assessing whether a proposal meets those goals.
+P2000 articulates the right philosophy and goals. The Directions Group's P4023R0<sup>[18]</sup> identifies the same gap from the AI perspective: current models are trained on legacy code and unsafe patterns, and the ecosystem needs "a curated, human validated collection" of high-quality C++ knowledge. P4023R0 focuses on code; the methodology presented in this paper addresses the complementary dimension - the evaluative judgment that experienced participants apply when assessing whether a proposal meets those goals.
The generating principles - how to *think* about API design, how to recognize patterns of failure, how to evaluate whether a proposal belongs in the standard at all - are held by experienced participants. These principles can be captured. The next sections describe a method for doing so.
@@ -345,7 +349,7 @@ Start with a general question, then use the response to drill down into a relata
### 4.4 How AI Enables This Now
-Modern AI capabilities make this project feasible in ways that were not possible even a few years ago:
+Modern AI capabilities make this project feasible in ways that were not possible even a few years ago. P4023R0<sup>[18]</sup> identifies research, summarizing unfamiliar domains, and checking consistency as appropriate uses of AI within the committee process. The methodology described here uses AI for exactly those purposes - transcription, synthesis, and structured extraction - with human experts providing the source material and reviewing the output:
- **High-quality transcription**: Accurate speech-to-text for technical conversations
- **Synthesis across interviews**: Identifying common themes and principles from multiple sources
@@ -376,7 +380,7 @@ The inversion reframes any concern about displacement:
- **Comparative advantage shifts**: Experts focus on judgment rather than production. Howard Hinnant's value lies in knowing which library proposals lack sufficient field experience, not in typing out his reasoning. The AI handles transcription and synthesis; the expert provides the irreplaceable judgment.
- **Capability expansion**: More people can contribute meaningfully. An expert who might never write a paper can share insights through a one-hour interview. The total knowledge captured increases even as individual time requirements decrease.

-The economics are clear: judgment is the bottleneck owned by experts. This methodology amplifies their role.
+The economics are clear: judgment is the bottleneck owned by experts. This methodology amplifies their role. P4023R0's governance principle - "the author is the intelligence of record" - arrives at the same conclusion from the policy direction: human judgment is irreplaceable, and AI is a tool in its service<sup>[18]</sup>.
---
@@ -497,9 +501,9 @@ They describe carrying these lessons forward to Swift. Chris Lattner deliberatel
### 5.6 From Interviews to Corroborated Principles
-Individual knowledge files capture one expert's perspective. Greater value emerges when these files are combined and distilled into an evaluation instrument. We built a three-stage agentic pipeline that transforms interview transcripts into a paper-scoring model. Each stage is driven by a rule file - an AI prompt that defines the transformation. All three rule files were themselves generated by prompting an AI agent.
+Individual knowledge files capture one expert's perspective. Greater value emerges when these files are combined and distilled into a shared set of corroborated principles - statements that multiple independent experts arrived at from different experiences. We built a three-stage agentic pipeline that transforms interview transcripts into structured principles. Each stage is driven by a rule file - an AI prompt that defines the transformation. All three rule files were themselves generated by prompting an AI agent.
**Stage 1 - Capture.** [WG21_CAPTURE.md](https://github.com/cppalliance/wg21-sage/blob/master/rules/WG21_CAPTURE.md)<sup>[16]</sup> is a knowledge extraction agent. Given an interview transcript, it produces a structured knowledge file containing principles (actionable rules with "When to Apply" conditions and "Red Flags" for violations) and experiences (supporting stories that illustrate and validate the principles). Each principle carries metadata: category, confidence level, and whether it applies to library proposals, language proposals, or both. We applied `WG21_CAPTURE` to each of the five transcripts in `inputs/`, producing five knowledge files in `knowledge/`.
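As a rough illustration of the structure Stage 1 describes, a knowledge file could be modeled along the following lines. This is a sketch only: the class and field names (`Scope`, `Experience`, `Principle`, `KnowledgeFile`, `when_to_apply`, `red_flags`) and the confidence labels are hypothetical and are not taken from `WG21_CAPTURE.md`, which may organize the data differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    """Which kind of proposal a principle applies to."""
    LIBRARY = "library"
    LANGUAGE = "language"
    BOTH = "both"


@dataclass
class Experience:
    """A supporting story that illustrates and validates a principle."""
    summary: str


@dataclass
class Principle:
    """An actionable rule plus the metadata listed in the text."""
    statement: str
    when_to_apply: list[str]
    red_flags: list[str]
    category: str
    confidence: str  # hypothetical labels, e.g. "high" / "medium" / "low"
    scope: Scope
    experiences: list[Experience] = field(default_factory=list)


@dataclass
class KnowledgeFile:
    """One expert's captured principles (one file per transcript)."""
    source: str
    principles: list[Principle] = field(default_factory=list)

    def for_scope(self, scope: Scope) -> list[Principle]:
        """Principles relevant to `scope`; BOTH-scoped principles always match."""
        return [p for p in self.principles
                if p.scope is scope or p.scope is Scope.BOTH]
```

Keeping the "applies to" metadata as an enumeration rather than free text makes it straightforward to select, for example, only the principles relevant to a library proposal.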
@@ -527,6 +531,8 @@ flowchart TD
| [`WG21_MERGE.md`](https://github.com/cppalliance/wg21-sage/blob/master/rules/WG21_MERGE.md) | Multiple `*.know.md` files | `merged.know.md` | Retain only principles corroborated by 2+ independent sources |
| [`WG21_JUDGE.md`](https://github.com/cppalliance/wg21-sage/blob/master/rules/WG21_JUDGE.md) | `merged.know.md` + focus | `WG21_EVAL_*.md` | Generate a paper-scoring model from merged principles |
+The primary contribution of this pipeline is the merged knowledge file - 11 principles corroborated by two or more independent experts. The evaluation model generated in Stage 3 is one illustrative downstream application, presented in Section 6 as a demonstration. It is experimental, requires human judgment to apply, and is not intended as an automated scoring system for committee papers.
+
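The corroboration rule (keep only principles supported by two or more independent sources) can be sketched as a simple filter. The sketch assumes that principles from different knowledge files have already been matched to a shared identifier; the function name `corroborated`, the identifiers, and the file names below are hypothetical, and how `WG21_MERGE.md` actually decides that two experts stated the same principle is not shown here.

```python
from collections import defaultdict


def corroborated(files: dict[str, set[str]],
                 min_sources: int = 2) -> dict[str, set[str]]:
    """Map each principle id to its supporting sources, keeping only
    principles backed by at least `min_sources` distinct sources.

    `files` maps a source name (one knowledge file per expert) to the set
    of principle identifiers found in that file. Identifiers are hypothetical.
    """
    support: dict[str, set[str]] = defaultdict(set)
    for source, principle_ids in files.items():
        for pid in principle_ids:
            support[pid].add(source)  # a set counts each source at most once
    return {pid: srcs for pid, srcs in support.items()
            if len(srcs) >= min_sources}


if __name__ == "__main__":
    # Made-up example data, not the contents of any real knowledge file.
    demo = {
        "expert_a.know.md": {"field-experience", "small-surface"},
        "expert_b.know.md": {"field-experience"},
        "expert_c.know.md": {"small-surface", "one-off-idea"},
    }
    for pid, srcs in sorted(corroborated(demo).items()):
        print(pid, sorted(srcs))
```

Counting distinct sources rather than mentions means a principle repeated several times by one expert still counts as a single source.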
---
## 6. Application: Self-Evaluation
@@ -535,7 +541,7 @@ To demonstrate the evaluation model in practice, the lead author applied `WG21_E
The paper scored **17/22** (passing threshold: 14/22). Six criteria received full marks: complexity awareness, implementation validation, external incubation, knowledge capture, enabling previously-impossible capabilities, and principled design. Five criteria scored partial: political fragility, proven practice (limited independent adoption), consensus collaboration (single-organization development), language-library boundary tensions, and licensing documentation.
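The arithmetic behind the score can be reproduced with a short sketch. It assumes each of the eleven criteria is worth 2 points for full marks and 1 point for partial marks, and that a paper passes when its total meets or exceeds the threshold. Both assumptions are inferred from the 17/22 figure rather than stated above; the authoritative scale is whatever `WG21_EVAL_GENERAL.md` defines.

```python
# Assumed per-criterion points, inferred from "17/22" with 6 full and 5 partial.
FULL, PARTIAL = 2, 1

SCORES = {
    # Criteria reported as receiving full marks
    "complexity awareness": FULL,
    "implementation validation": FULL,
    "external incubation": FULL,
    "knowledge capture": FULL,
    "enabling previously-impossible capabilities": FULL,
    "principled design": FULL,
    # Criteria reported as partial
    "political fragility": PARTIAL,
    "proven practice": PARTIAL,
    "consensus collaboration": PARTIAL,
    "language-library boundary tensions": PARTIAL,
    "licensing documentation": PARTIAL,
}


def total(scores: dict[str, int]) -> int:
    """Sum of the points awarded across all criteria."""
    return sum(scores.values())


def max_score(scores: dict[str, int]) -> int:
    """Best possible total if every criterion received full marks."""
    return FULL * len(scores)


def passes(scores: dict[str, int], threshold: int = 14) -> bool:
    """Assumes 'passing' means total >= threshold."""
    return total(scores) >= threshold


if __name__ == "__main__":
    print(f"{total(SCORES)}/{max_score(SCORES)}, passes: {passes(SCORES)}")
```

Under these assumptions the six full-mark criteria contribute 12 points and the five partial criteria contribute 5, which gives the reported 17 out of 22.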
-Self-evaluation is inherently limited - the author cannot be objective about his own work. The purpose here is not to claim objectivity but to demonstrate the tool's operation. The evaluation model surfaces specific, actionable feedback (e.g., "document independent adoption", "state the license explicitly") that a self-evaluating author can act on before committee review. The real value of the tool will emerge when it is applied by others.
+Self-evaluation is inherently limited - the author cannot be objective about his own work. The purpose here is not to claim objectivity but to demonstrate the tool's operation and, crucially, to show that the model identifies weaknesses in its creator's own paper. The evaluation model surfaces specific, actionable feedback (e.g., "document independent adoption", "state the license explicitly") that a self-evaluating author can act on before committee review. Consistent with P4023R0's governance principle, the model assists human judgment - the author remains the intelligence of record who decides which feedback to act on<sup>[18]</sup>. The real value of the tool will emerge when it is applied by others.
### 6.1 Reproducibility and Iteration
@@ -610,6 +616,8 @@ WG21 is a voluntary organization. No one can compel participation, enforce paper
Every institution accumulates tacit knowledge in the minds of experienced practitioners. Every institution benefits from making that knowledge explicit. WG21 is not unusual in facing this challenge. It is unusual in the depth of expertise available to capture.
+The Directions Group's P4023R0<sup>[18]</sup> calls on the ecosystem to build an "ImageNet for C++" - a curated, human-validated knowledge base. That paper focuses on code quality. This paper demonstrates that the same approach applies to institutional judgment: the principles experienced practitioners use to evaluate proposals, recognize patterns of failure, and make good decisions in novel situations. The knowledge capture workflow presented here is one answer to the Directions Group's challenge.
+
**What you can do:**
- **Experienced WG21 participants**: Contact the paper authors to share your knowledge through an interview. Your insights about design principles, historical decisions, and evaluation frameworks are the raw material.
@@ -657,11 +665,14 @@ Thanks to all interview participants for sharing their expertise.
16. WG21-SAGE: Transcripts, knowledge files, and agentic rules. https://github.com/cppalliance/wg21-sage
17. Falco, Gerbino, Gill. P4003R0: Coroutines for I/O. https://wg21.link/p4003r0
+18. Garland, McKenney, Orr, Stroustrup, Vandevoorde, Wong. [P4023R0](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p4023r0.pdf): "Strategic Direction for AI in C++: Governance, and Ecosystem" (Directions Group, 2026). https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p4023r0.pdf
---
## Appendix A: WG21 General Evaluation Model
+> **Note:** This model is presented as an illustrative output of the knowledge capture methodology. It is experimental, reflects one run of a preliminary pipeline, and requires human judgment to apply. It is not a recommendation that the committee adopt automated paper scoring.
+
This model evaluates C++ standardization proposals against principles of proven practice, political viability, design coherence, implementation validation, complexity control, institutional knowledge, and ecosystem fit derived from experienced WG21 practitioners.
## Appendix B: Evaluation of P4003R0 "Coroutines for I/O"
+> **Note:** This evaluation demonstrates the model's operation on the lead author's own paper. The model identified weaknesses (political fragility, single-organization development, missing license documentation) that the author can act on. The evaluation is illustrative. The author is the intelligence of record.
+
**Model**: `rules/WG21_EVAL_GENERAL.md`
**Paper**: P4003R0 (D4003, 2026-02-22)
**Authors**: Vinnie Falco, Steve Gerbino, Mungo Gill