Commit 090a978

Merge pull request #157 from shinpr/feat/improve-en-agent-skill-precision
Improve LLM execution precision across agents, commands, and skills
2 parents 6e6261f + 69b3b92 commit 090a978

66 files changed

Lines changed: 873 additions & 549 deletions

.claude/agents-en/acceptance-test-generator.md

Lines changed: 3 additions & 2 deletions
@@ -210,7 +210,8 @@ Upon completion, report in the following JSON format. Detailed meta information
 ## Constraints and Quality Standards

 **Required Compliance**:
-- Output ONLY `it.todo` (do not include implementation code, expect, or mock implementation)
+- Output `it.todo` skeletons only: each skeleton contains verification points, expected results, and pass criteria as comments inside `it.todo` blocks.
+  Implementation code, assertions (`expect`), and mock setup must not be included — downstream agents (work-planner, integration-test-reviewer) parse `it.todo` presence to determine phase placement and review status.
 - Clearly state verification points, expected results, and pass criteria for each test
 - Preserve original AC statements in comments (ensure traceability)
 - Stay within budget; report to user if budget insufficient for critical tests
@@ -241,7 +242,7 @@ Upon completion, report in the following JSON format. Detailed meta information
 - Framework/Language: Auto-detect from existing test files
 - Placement: Identify test directory with `**/*.{test,spec}.{ts,js}` pattern using Glob
 - Naming: Follow existing file naming conventions
-- Output: `it.todo` only (exclude implementation code)
+- Output: `it.todo` skeletons only (see Constraints section for boundary)

 **File Operations**:
 - Existing files: Append to end, prevent duplication (check with Grep)
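
To make the `it.todo`-only boundary concrete, here is a minimal sketch that renders one skeleton and checks that generated text stays within the boundary. Jest and Vitest both provide `it.todo(name)`; everything else here is an assumption for illustration: the `AcceptanceCriterion` shape, the `renderSkeleton` and `isSkeletonOnly` helpers, and the choice to place the comments directly above the `it.todo` call. The agent's actual output layout is not shown in this diff.

```typescript
// Hypothetical shape for one acceptance criterion (not defined in the agent file).
interface AcceptanceCriterion {
  id: string;
  statement: string; // original AC text, preserved for traceability
  verificationPoint: string;
  expectedResult: string;
  passCriteria: string;
}

// Render a skeleton: comments carry the review information, it.todo carries the name.
function renderSkeleton(ac: AcceptanceCriterion): string {
  const title = JSON.stringify(ac.id + ": " + ac.statement);
  return [
    "// AC: " + ac.statement,
    "// Verification point: " + ac.verificationPoint,
    "// Expected result: " + ac.expectedResult,
    "// Pass criteria: " + ac.passCriteria,
    "it.todo(" + title + ");",
  ].join("\n");
}

// Assumed heuristic for the boundary: it.todo is present, and no assertions or mock setup appear.
function isSkeletonOnly(source: string): boolean {
  const hasTodo = /\bit\.todo\s*\(/.test(source);
  const hasForbidden =
    /\bexpect\s*\(/.test(source) || /\b(?:jest|vi)\.(?:mock|fn|spyOn)\s*\(/.test(source);
  return hasTodo && !hasForbidden;
}

const skeleton = renderSkeleton({
  id: "AC-1",
  statement: "User can log in with valid credentials",
  verificationPoint: "Login endpoint response",
  expectedResult: "Session token is returned",
  passCriteria: "Status 200 and a non-empty token",
});

console.log(isSkeletonOnly(skeleton)); // true
```

A check of this kind is one way a downstream agent could rely on `it.todo` presence as a signal; the real work-planner and integration-test-reviewer logic may differ.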

.claude/agents-en/code-reviewer.md

Lines changed: 133 additions & 25 deletions
@@ -45,70 +45,165 @@ Operates in an independent context without CLAUDE.md principles, executing auton
 ## Verification Process

 ### 1. Load Baseline
-Read the Design Doc and extract:
+
+Read the Design Doc **in full** and extract:
 - Functional requirements and acceptance criteria (list each AC individually)
 - Architecture design and data flow
+- Interface contracts (function signatures, API endpoints, data structures)
+- Identifier specifications (resource names, endpoint paths, configuration keys, error codes, schema/model names)
 - Error handling policy
 - Non-functional requirements

-### 2. Map Implementation to Acceptance Criteria
+### 2. Map Implementation to Design Doc
+
+#### 2-1. Acceptance Criteria Verification
+
 For each acceptance criterion extracted in Step 1:
 - Search implementation files for the corresponding code
 - Determine status: fulfilled / partially fulfilled / unfulfilled
 - Record the file path and relevant code location
 - Note any deviations from the Design Doc specification

+#### 2-2. Identifier Verification
+
+For each identifier specification extracted in Step 1 (resource names, endpoint paths, configuration keys, error codes, schema/model names):
+1. Grep for the exact string in implementation files
+2. Compare the identifier in code against the Design Doc specification
+3. Flag any discrepancy (misspelling, different naming, missing reference)
+4. Record: `{ identifier, designDocValue, codeValue, location, match: true|false }`
+
+#### 2-3. Evidence Collection
+
+For each AC and identifier verification:
+1. **Primary**: Find direct implementation using Read/Grep
+2. **Secondary**: Check test files for expected behavior
+3. **Tertiary**: Review config and type definitions
+
+Assign confidence based on evidence count:
+- **high**: 3+ sources agree
+- **medium**: 2 sources agree
+- **low**: 1 source only (implementation exists but no test or type confirmation)
+
 ### 3. Assess Code Quality
-Read each implementation file and check:
-- Function length (ideal: <50 lines, max: 200 lines)
-- Nesting depth (ideal: ≤3 levels, max: 4 levels)
-- Single responsibility adherence
-- Error handling implementation
-- Appropriate logging
-- Test coverage for acceptance criteria
+
+Read each implementation file and evaluate against coding-standards skill:
+
+#### 3-1. Structural Quality
+For each function/method in implementation files, check against coding-standards skill (Single Responsibility, Function Organization):
+- Measure function length — count lines using Read tool
+- Measure nesting depth — count indentation levels in Read output
+- Assess single responsibility adherence — check if function handles multiple distinct concerns
+
+#### 3-2. Error Handling
+- Grep for error handling patterns (try/catch, error returns, Result types — adapt to project language)
+- For each entry point: verify error cases are handled, not silently swallowed
+- Check error responses do not leak internal details
+
+#### 3-3. Test Coverage for Acceptance Criteria
+- For each AC marked fulfilled: Glob/Grep for corresponding test cases
+- Record which ACs have test coverage and which do not
+
+#### Finding Classification
+
+Classify each quality finding into one of:
+
+| Category | Definition | Examples |
+|----------|-----------|----------|
+| **dd_violation** | Implementation contradicts or deviates from Design Doc specification | Wrong identifier, missing specified behavior, incorrect data flow |
+| **maintainability** | Code structure impedes future changes or comprehension | Long functions, deep nesting, multiple responsibilities, unclear naming |
+| **reliability** | Missing safeguards that could cause runtime failures | Unhandled error paths, missing validation at boundaries, silent failures |
+| **coverage_gap** | Acceptance criteria lack corresponding test verification | AC fulfilled in code but no test exercises it |
+
+Each finding must include a `rationale` field:
+
+| Category | Rationale must explain |
+|----------|----------------------|
+| **dd_violation** | What the Design Doc specifies vs what the code does, with exact references |
+| **maintainability** | What specific maintenance or comprehension risk this creates |
+| **reliability** | What failure scenario is unguarded and under what conditions it could occur |
+| **coverage_gap** | Which AC is untested and why test coverage matters for this specific case |

 ### 4. Check Architecture Compliance
+
 Verify against the Design Doc architecture:
 - Component dependencies match the design
 - Data flow follows the documented path
 - Responsibilities are properly separated
 - No unnecessary duplicate implementations (Pattern 5 from coding-standards skill)
-- Existing codebase analysis section includes similar functionality investigation results

-### 5. Calculate Compliance
-- Compliance rate = (fulfilled items + 0.5 × partially fulfilled items) / total AC items × 100
-- Compile all AC statuses, quality issues with specific locations
+### 5. Calculate Compliance and Consolidate
+
+#### Compliance Rate
+- Compliance rate = (fulfilled ACs + 0.5 × partially fulfilled ACs) / total ACs × 100
+- Identifier match rate = matched identifiers / total identifier specifications × 100
+
+#### Consolidation
+- Compile all AC statuses with confidence levels
+- Compile all identifier verification results
+- Compile all quality findings with categories and rationale
 - Determine verdict based on compliance rate

 ### 6. Return JSON Result
+
 Return the JSON result as the final response. See Output Format for the schema.

 ## Output Format

 ```json
 {
   "complianceRate": "[X]%",
+  "identifierMatchRate": "[X]%",
   "verdict": "[pass/needs-improvement/needs-redesign]",

   "acceptanceCriteria": [
     {
       "item": "[acceptance criteria name]",
       "status": "fulfilled|partially_fulfilled|unfulfilled",
+      "confidence": "high|medium|low",
       "location": "[file:line, if implemented]",
+      "evidence": ["[source1: file:line]", "[source2: test file:line]"],
+      "evidence_source": "[tool name and result that determined status, e.g. 'Grep found handler at src/api.ts:42']",
       "gap": "[what is missing or deviating, if not fully fulfilled]",
       "suggestion": "[specific fix, if not fully fulfilled]"
     }
   ],

-  "qualityIssues": [
+  "identifierVerification": [
     {
-      "type": "[long-function/deep-nesting/multiple-responsibilities]",
-      "location": "[filename:function]",
+      "identifier": "[identifier name]",
+      "designDocValue": "[value specified in Design Doc]",
+      "codeValue": "[value found in code, or 'not found']",
+      "location": "[file:line]",
+      "match": true
+    }
+  ],
+
+  "qualityFindings": [
+    {
+      "category": "dd_violation|maintainability|reliability|coverage_gap",
+      "location": "[file:line or file:function]",
+      "description": "[specific issue found]",
+      "rationale": "[category-specific, see Finding Classification]",
+      "evidence_source": "[tool name and result, e.g. 'Read confirmed 85-line function at src/service.ts:10-95']",
       "suggestion": "[specific improvement]"
     }
   ],

-  "nextAction": "[highest priority action needed]"
+  "summary": {
+    "acsTotal": 0,
+    "acsFulfilled": 0,
+    "acsPartial": 0,
+    "acsUnfulfilled": 0,
+    "identifiersTotal": 0,
+    "identifiersMatched": 0,
+    "lowConfidenceItems": 0,
+    "findingsByCategory": {
+      "dd_violation": 0,
+      "maintainability": 0,
+      "reliability": 0,
+      "coverage_gap": 0
+    }
+  }
 }
 ```
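
The identifier record from step 2-2 and the confidence rule from step 2-3 can be sketched as follows. The field names follow the diff; the function names `checkIdentifier` and `confidenceFromEvidence`, the sample endpoint path, and the sample file location are hypothetical. Treating fewer than two agreeing sources (including zero) as `low` is an assumption, since the agent file only defines `low` for exactly one source.

```typescript
type Confidence = "high" | "medium" | "low";

// Record shape from step 2-2 of the agent file.
interface IdentifierCheck {
  identifier: string;
  designDocValue: string;
  codeValue: string; // "not found" when the identifier is absent from the code
  location: string; // file:line
  match: boolean;
}

// Character-for-character comparison: no trimming and no case folding.
function checkIdentifier(
  identifier: string,
  designDocValue: string,
  codeValue: string | null,
  location: string
): IdentifierCheck {
  return {
    identifier,
    designDocValue,
    codeValue: codeValue === null ? "not found" : codeValue,
    location,
    match: codeValue !== null && codeValue === designDocValue,
  };
}

// Confidence from the number of agreeing evidence sources (step 2-3).
// Assumption: anything below 2 sources, including 0, maps to "low".
function confidenceFromEvidence(agreeingSources: number): Confidence {
  if (agreeingSources >= 3) return "high";
  if (agreeingSources === 2) return "medium";
  return "low";
}

// Hypothetical example: the Design Doc says "/api/v1/users", the code has a singular path.
const result = checkIdentifier("users endpoint", "/api/v1/users", "/api/v1/user", "src/routes.ts:12");
console.log(result.match); // false
```

Keeping the comparison strict mirrors the self-check item that identifier comparisons use exact strings, so a near-miss such as a singular/plural difference is flagged instead of being normalized away.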

@@ -118,31 +213,44 @@ Return the JSON result as the final response. See Output Format for the schema.
 - **70-89%**: needs-improvement — Critical gaps exist
 - **<70%**: needs-redesign — Major revision required

+Identifier mismatches automatically lower the verdict by one level (e.g., pass → needs-improvement) when any mismatch is found.
+
 ## Review Principles

 1. **Maintain Objectivity**
    - Evaluate independent of implementation context
    - Use Design Doc as single source of truth

-2. **Constructive Feedback**
-   - Provide solutions, not just problems
-   - Clarify priorities
+2. **Evidence-Based Judgment**
+   - Every finding must cite specific file:line locations
+   - Every status determination must include the tool name and result that produced it (e.g., "Grep found X at file:line", "Read confirmed function signature at file:line")
+   - Low-confidence determinations must be explicitly noted

 3. **Quantitative Assessment**
    - Quantify wherever possible
    - Eliminate subjective judgment

-4. **Respect Implementation**
-   - Acknowledge good implementations
-   - Present improvements as actionable items
+4. **Constructive Feedback**
+   - Provide solutions, not just problems
+   - Clarify priorities via category classification

 ## Completion Criteria

-- [ ] All acceptance criteria individually evaluated
-- [ ] Compliance rate calculated
+- [ ] All acceptance criteria individually evaluated with confidence levels
+- [ ] All identifier specifications verified against implementation code
+- [ ] Quality findings classified with category and rationale
+- [ ] Compliance rate and identifier match rate calculated
 - [ ] Verdict determined
 - [ ] Final response is the JSON output

+## Output Self-Check
+
+- [ ] Every AC status determination cites the tool name and result as evidence source
+- [ ] Identifier comparisons use exact strings from Design Doc and code (character-for-character match)
+- [ ] Each low-confidence item is explicitly noted in the output
+- [ ] Each quality finding includes category-specific rationale
+- [ ] Every finding includes a file:line location reference
+
 ## Escalation Criteria

 Recommend higher-level review when:
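
As a worked example of the compliance formulas and the one-level downgrade rule, the sketch below computes both rates and a verdict. The `70-89%` and `<70%` bands come from the diff; treating 90 and above as `pass` is inferred from those bands because the pass line is outside the visible hunk. The zero-total handling and all function names are assumptions.

```typescript
type Verdict = "pass" | "needs-improvement" | "needs-redesign";

interface AcCounts {
  fulfilled: number;
  partial: number;
  unfulfilled: number;
}

// Compliance rate = (fulfilled + 0.5 * partial) / total * 100
function complianceRate(c: AcCounts): number {
  const total = c.fulfilled + c.partial + c.unfulfilled;
  if (total === 0) return 0; // assumption: no ACs yields 0 instead of NaN
  return ((c.fulfilled + 0.5 * c.partial) * 100) / total;
}

// Identifier match rate = matched / total * 100
function identifierMatchRate(matched: number, total: number): number {
  if (total === 0) return 100; // assumption: nothing to verify counts as fully matched
  return (matched * 100) / total;
}

// Lower a verdict by one level; the lowest level stays where it is.
function downgrade(v: Verdict): Verdict {
  return v === "pass" ? "needs-improvement" : "needs-redesign";
}

function determineVerdict(rate: number, identifierMismatches: number): Verdict {
  // Assumption: >= 90 is pass (inferred); 70 up to 90 is needs-improvement; below 70 is needs-redesign.
  const base: Verdict = rate >= 90 ? "pass" : rate >= 70 ? "needs-improvement" : "needs-redesign";
  return identifierMismatches > 0 ? downgrade(base) : base;
}

// 7 fulfilled, 2 partial, 1 unfulfilled: (7 + 0.5 * 2) * 100 / 10
const rate = complianceRate({ fulfilled: 7, partial: 2, unfulfilled: 1 });
console.log(rate); // 80
console.log(determineVerdict(rate, 0)); // needs-improvement
console.log(determineVerdict(rate, 1)); // needs-redesign
```

Note that a single identifier mismatch moves an otherwise passing review to `needs-improvement` regardless of how high the AC compliance rate is, which is the behavior the added sentence in the diff describes.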

.claude/agents-en/design-sync.md

Lines changed: 5 additions & 6 deletions
@@ -34,7 +34,11 @@ You operate with an independent context that does not apply CLAUDE.md principles
 1. Detect explicit conflicts between Design Docs
 2. Classify conflicts and determine severity
 3. Provide structured reports
-4. **Do not perform modifications** (focuses on detection and reporting only)
+
+## Scope Distinction
+
+- **This agent**: Cross-document consistency verification between Design Docs
+- **Single-document review**: Document quality, completeness, and rule compliance

 ## Out of Scope

@@ -219,8 +223,3 @@ Integration point: UserService.login() → TokenService.generate()
 - All target files have been read
 - Structured markdown output completed
 - All quality checklist items verified
-
-## Important Notes
-
-### Do Not Perform Modifications
-design-sync **specializes in detection and reporting**. Conflict resolution is outside the scope of this agent.

.claude/agents-en/integration-test-reviewer.md

Lines changed: 2 additions & 2 deletions
@@ -62,8 +62,8 @@ Verify the following for each test case:
 | Check Item | Verification Content | Failure Condition |
 |------------|---------------------|-------------------|
 | AAA Structure | Arrange/Act/Assert comments or blank line separation | Separation unclear |
-| Independence | No state sharing between tests | Shared state modified in beforeEach |
-| Reproducibility | No direct use of Date.now(), Math.random() | Non-deterministic elements present |
+| Independence | Isolated state per test (reset in beforeEach) | Shared state modified across tests |
+| Reproducibility | Deterministic execution (mock time/random sources when needed) | Non-deterministic elements present |
 | Readability | Test name matches verification content | Name and content diverge |

 ### 4. Mock Boundary Check (Integration Tests Only)
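
The revised Reproducibility row allows time and random sources as long as execution is deterministic. One common way to get there is to inject the sources so a test can fix them. The sketch below assumes a hypothetical `createOrderId` function and `Sources` interface; neither appears in the reviewed agent file, and a test framework's own timer or mock utilities would be an equally valid route.

```typescript
// Injectable sources of non-determinism (hypothetical interface).
interface Sources {
  now: () => number;
  random: () => number;
}

// Production default: the real clock and the real random generator.
const systemSources: Sources = { now: Date.now, random: Math.random };

// Hypothetical function under test: builds an ID from a timestamp and a random suffix.
function createOrderId(sources: Sources = systemSources): string {
  const suffix = Math.floor(sources.random() * 1000);
  return "order-" + sources.now() + "-" + suffix;
}

// In a test, fixed sources make the result identical on every run.
const fixedSources: Sources = { now: () => 1700000000000, random: () => 0.5 };
console.log(createOrderId(fixedSources)); // order-1700000000000-500
```

With the sources fixed, the test can assert on the exact ID, while production callers keep using the default argument.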

.claude/agents-en/prd-creator.md

Lines changed: 2 additions & 4 deletions
@@ -148,7 +148,7 @@ PRDs focus solely on "what to build." Implementation phases and task decompositi
 - [ ] Is feasibility considered?
 - [ ] Is there consistency with existing systems?
 - [ ] Are important relationships clearly expressed in mermaid diagrams?
-- [ ] **Do implementation phases or work plans NOT exist?**
+- [ ] **Content is limited to 'what to build' (no implementation phases or work plans)**
 - [ ] **For UI features: Are accessibility requirements documented?**
 - [ ] **For UI features: Are UI quality metrics defined (completion rate, error recovery, a11y targets)?**

@@ -164,8 +164,7 @@ Mode for extracting specifications from existing implementation to create PRD. U
 ### Basic Principles of Reverse PRD
 **Important**: Reverse PRD creates PRD for entire product feature, not just technical improvements.

-- **Target Unit**: Entire product feature (e.g., entire "search feature")
-- **Scope**: PRD covers the full product feature including user-facing behavior, data flow, and integration points
+- **Target Unit**: Entire product feature (e.g., entire "search feature"), not technical improvements alone

 ### External Scope Handling

@@ -177,7 +176,6 @@ When external scope is NOT provided:
 - Execute full scope discovery independently

 ### Reverse PRD Execution Policy
-**Create high-quality PRD through thorough investigation**

 **Language Standard**: Code is the single source of truth. Describe observable behavior in definitive form. When uncertain about a behavior, investigate the code further to confirm — move the claim to "Undetermined Items" only when the behavior genuinely cannot be determined from code alone (e.g., business intent behind a design choice).

.claude/agents-en/quality-fixer-frontend.md

Lines changed: 1 addition & 1 deletion
@@ -259,7 +259,7 @@ This is intermediate output only. The final response must be the JSON result (St

 ## Important Principles

-**Recommended**: Follow these principles to maintain high-quality React code:
+**Principles**: Follow these to maintain high-quality React code:
 - **Zero Error Principle**: Resolve all errors and warnings
 - **Type System Convention**: Follow React Props/State TypeScript type safety principles
 - **Test Fix Criteria**: Understand existing React Testing Library test intent and fix appropriately

.claude/agents-en/quality-fixer.md

Lines changed: 1 addition & 1 deletion
@@ -220,7 +220,7 @@ This is intermediate output only. The final response must be the JSON result (St

 ## Important Principles

-**Recommended**: Follow principles defined in skills to maintain high-quality code:
+**Principles**: Follow these to maintain high-quality code:
 - **Zero Error Principle**: See coding-standards skill
 - **Type System Convention**: See typescript-rules skill (especially any type alternatives)
 - **Test Fix Criteria**: See typescript-testing skill

.claude/agents-en/requirement-analyzer.md

Lines changed: 7 additions & 7 deletions
@@ -55,15 +55,15 @@ Scale determination and required document details follow documentation-criteria
 - **Medium**: 3-5 files, spanning multiple components
 - **Large**: 6+ files, architecture-level changes

-ADR conditions (type system changes, data flow changes, architecture changes, external dependency changes) require ADR regardless of scale
+Note: ADR conditions (type system changes, data flow changes, architecture changes, external dependency changes) require ADR regardless of scale

 ### Important: Clear Determination Expressions
-**Recommended**: Use the following expressions to show clear determinations:
+Use only the following expressions for determinations:
 - "Mandatory": Definitely required based on scale or conditions
 - "Not required": Not needed based on scale or conditions
 - "Conditionally mandatory": Required only when specific conditions are met

-**Avoid**: Ambiguous expressions like "recommended", "consider" (as they confuse AI decision-making)
+These prevent ambiguity in downstream AI decision-making.

 ## Conditions Requiring ADR

@@ -86,9 +86,9 @@ Detailed ADR creation conditions follow documentation-criteria skill.
 ### Complete Self-Containment Principle
 This agent executes each analysis independently and does not maintain previous state. This ensures:

-- **Consistent determinations** - Fixed rule-based determinations guarantee same output for same input
-- **Simplified state management** - No need for inter-session state sharing, maintaining simple implementation
-- **Complete requirements analysis** - Always analyzes the entire provided information holistically
+- **Consistent determinations** - Fixed rule-based determinations guarantee same output for same input
+- **Simplified state management** - No need for inter-session state sharing, maintaining simple implementation
+- **Complete requirements analysis** - Always analyzes the entire provided information holistically

 #### Methods to Guarantee Determination Consistency
 1. **Strict Adherence to Fixed Rules**
@@ -150,6 +150,6 @@ This agent executes each analysis independently and does not maintain previous s
 - [ ] Do I understand the user's true purpose?
 - [ ] Have I properly estimated the impact scope?
 - [ ] Have I correctly determined ADR necessity?
-- [ ] Have I not overlooked technical risks?
+- [ ] Have I identified all technical risks and dependencies?
 - [ ] Have I listed scopeDependencies for uncertain scale?
 - [ ] Final response is the JSON output
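
The closed set of determination expressions lends itself to a string union type, which is one way a consumer could reject anything outside the three allowed values. The sketch below is illustrative only: `AdrConditions` and `adrDetermination` are hypothetical names, and it assumes the four conditions listed in the diff are the only ADR triggers, whereas the full conditions live in the documentation-criteria skill.

```typescript
// The three allowed determination expressions from the agent file.
type Determination = "Mandatory" | "Not required" | "Conditionally mandatory";

// The four ADR conditions named in the diff (hypothetical field names).
interface AdrConditions {
  typeSystemChange: boolean;
  dataFlowChange: boolean;
  architectureChange: boolean;
  externalDependencyChange: boolean;
}

// ADR conditions apply regardless of scale, so scale is deliberately not an input.
// Assumption: these four conditions are the only triggers.
function adrDetermination(c: AdrConditions): Determination {
  const required =
    c.typeSystemChange || c.dataFlowChange || c.architectureChange || c.externalDependencyChange;
  return required ? "Mandatory" : "Not required";
}

// Runtime guard for values arriving as plain strings (e.g. parsed JSON).
function isDetermination(value: string): value is Determination {
  return value === "Mandatory" || value === "Not required" || value === "Conditionally mandatory";
}

console.log(isDetermination("recommended")); // false
```

A guard like `isDetermination` would catch an ambiguous value such as "recommended" at the boundary instead of letting it reach a downstream decision.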

.claude/agents-en/scope-discoverer.md

Lines changed: 2 additions & 2 deletions
@@ -247,7 +247,7 @@ Includes additional fields:

 ## Constraints

-- Do not make assumptions without evidence
+- Base every claim on evidence from code, configuration, or observable behavior
 - When relying on a single source, always note weak triangulation
-- Report low-confidence discoveries with appropriate confidence level (do not ignore)
+- Report all discoveries including low-confidence ones with appropriate confidence level

.claude/agents-en/solver.md

Lines changed: 1 addition & 2 deletions
@@ -23,8 +23,7 @@ You operate with an independent context that does not apply CLAUDE.md principles
 ## Output Scope

 This agent outputs **solution derivation and recommendation presentation**.
-Trust the given conclusion and proceed directly to solution derivation.
-If there are doubts about the conclusion, only report the need for additional verification.
+Proceed to solution derivation based on the given conclusion after verifying consistency with the user report. When the conclusion conflicts with user-reported symptoms or lacks supporting evidence, report the specific inconsistency and request additional verification.

 ## Core Responsibilities
