shinpr
diff --git a/‎.claude/agents-en/acceptance-test-generator.md‎
Lines changed: 3 additions & 2 deletions b/‎.claude/agents-en/acceptance-test-generator.md‎
Lines changed: 3 additions & 2 deletions
diff --git a/‎.claude/agents-en/code-reviewer.md‎
Lines changed: 133 additions & 25 deletions b/‎.claude/agents-en/code-reviewer.md‎
Lines changed: 133 additions & 25 deletions
diff --git a/‎.claude/agents-en/design-sync.md‎
Lines changed: 5 additions & 6 deletions b/‎.claude/agents-en/design-sync.md‎
Lines changed: 5 additions & 6 deletions
diff --git a/‎.claude/agents-en/integration-test-reviewer.md‎
Lines changed: 2 additions & 2 deletions b/‎.claude/agents-en/integration-test-reviewer.md‎
Lines changed: 2 additions & 2 deletions
diff --git a/‎.claude/agents-en/prd-creator.md‎
Lines changed: 2 additions & 4 deletions b/‎.claude/agents-en/prd-creator.md‎
Lines changed: 2 additions & 4 deletions
diff --git a/‎.claude/agents-en/quality-fixer-frontend.md‎
Lines changed: 1 addition & 1 deletion b/‎.claude/agents-en/quality-fixer-frontend.md‎
Lines changed: 1 addition & 1 deletion
diff --git a/‎.claude/agents-en/quality-fixer.md‎
Lines changed: 1 addition & 1 deletion b/‎.claude/agents-en/quality-fixer.md‎
Lines changed: 1 addition & 1 deletion
diff --git a/‎.claude/agents-en/requirement-analyzer.md‎
Lines changed: 7 additions & 7 deletions b/‎.claude/agents-en/requirement-analyzer.md‎
Lines changed: 7 additions & 7 deletions
diff --git a/‎.claude/agents-en/scope-discoverer.md‎
Lines changed: 2 additions & 2 deletions b/‎.claude/agents-en/scope-discoverer.md‎
Lines changed: 2 additions & 2 deletions
diff --git a/‎.claude/agents-en/solver.md‎
Lines changed: 1 addition & 2 deletions b/‎.claude/agents-en/solver.md‎
Lines changed: 1 addition & 2 deletions
@@ -210,7 +210,8 @@ Upon completion, report in the following JSON format. Detailed meta information
 ## Constraints and Quality Standards
 
 **Required Compliance**:
-- Output ONLY `it.todo` (do not include implementation code, expect, or mock implementation)
+- Output `it.todo` skeletons only: each skeleton contains verification points, expected results, and pass criteria as comments inside `it.todo` blocks.
+  Implementation code, assertions (`expect`), and mock setup must not be included — downstream agents (work-planner, integration-test-reviewer) parse `it.todo` presence to determine phase placement and review status.
 - Clearly state verification points, expected results, and pass criteria for each test
 - Preserve original AC statements in comments (ensure traceability)
 - Stay within budget; report to user if budget insufficient for critical tests
@@ -241,7 +242,7 @@ Upon completion, report in the following JSON format. Detailed meta information
 - Framework/Language: Auto-detect from existing test files
 - Placement: Identify test directory with `**/*.{test,spec}.{ts,js}` pattern using Glob
 - Naming: Follow existing file naming conventions
-- Output: `it.todo` only (exclude implementation code)
+- Output: `it.todo` skeletons only (see Constraints section for boundary)
 
 **File Operations**:
 - Existing files: Append to end, prevent duplication (check with Grep)
 
@@ -45,70 +45,165 @@ Operates in an independent context without CLAUDE.md principles, executing auton
 ## Verification Process
 
 ### 1. Load Baseline
-Read the Design Doc and extract:
+
+Read the Design Doc **in full** and extract:
 - Functional requirements and acceptance criteria (list each AC individually)
 - Architecture design and data flow
+- Interface contracts (function signatures, API endpoints, data structures)
+- Identifier specifications (resource names, endpoint paths, configuration keys, error codes, schema/model names)
 - Error handling policy
 - Non-functional requirements
 
-### 2. Map Implementation to Acceptance Criteria
+### 2. Map Implementation to Design Doc
+
+#### 2-1. Acceptance Criteria Verification
+
 For each acceptance criterion extracted in Step 1:
 - Search implementation files for the corresponding code
 - Determine status: fulfilled / partially fulfilled / unfulfilled
 - Record the file path and relevant code location
 - Note any deviations from the Design Doc specification
 
+#### 2-2. Identifier Verification
+
+For each identifier specification extracted in Step 1 (resource names, endpoint paths, configuration keys, error codes, schema/model names):
+1. Grep for the exact string in implementation files
+2. Compare the identifier in code against the Design Doc specification
+3. Flag any discrepancy (misspelling, different naming, missing reference)
+4. Record: `{ identifier, designDocValue, codeValue, location, match: true|false }`
+
+#### 2-3. Evidence Collection
+
+For each AC and identifier verification:
+1. **Primary**: Find direct implementation using Read/Grep
+2. **Secondary**: Check test files for expected behavior
+3. **Tertiary**: Review config and type definitions
+
+Assign confidence based on evidence count:
+- **high**: 3+ sources agree
+- **medium**: 2 sources agree
+- **low**: 1 source only (implementation exists but no test or type confirmation)
+
 ### 3. Assess Code Quality
-Read each implementation file and check:
-- Function length (ideal: <50 lines, max: 200 lines)
-- Nesting depth (ideal: ≤3 levels, max: 4 levels)
-- Single responsibility adherence
-- Error handling implementation
-- Appropriate logging
-- Test coverage for acceptance criteria
+
+Read each implementation file and evaluate against coding-standards skill:
+
+#### 3-1. Structural Quality
+For each function/method in implementation files, check against coding-standards skill (Single Responsibility, Function Organization):
+- Measure function length — count lines using Read tool
+- Measure nesting depth — count indentation levels in Read output
+- Assess single responsibility adherence — check if function handles multiple distinct concerns
+
+#### 3-2. Error Handling
+- Grep for error handling patterns (try/catch, error returns, Result types — adapt to project language)
+- For each entry point: verify error cases are handled, not silently swallowed
+- Check error responses do not leak internal details
+
+#### 3-3. Test Coverage for Acceptance Criteria
+- For each AC marked fulfilled: Glob/Grep for corresponding test cases
+- Record which ACs have test coverage and which do not
+
+#### Finding Classification
+
+Classify each quality finding into one of:
+
+| Category | Definition | Examples |
+|----------|-----------|----------|
+| **dd_violation** | Implementation contradicts or deviates from Design Doc specification | Wrong identifier, missing specified behavior, incorrect data flow |
+| **maintainability** | Code structure impedes future changes or comprehension | Long functions, deep nesting, multiple responsibilities, unclear naming |
+| **reliability** | Missing safeguards that could cause runtime failures | Unhandled error paths, missing validation at boundaries, silent failures |
+| **coverage_gap** | Acceptance criteria lack corresponding test verification | AC fulfilled in code but no test exercises it |
+
+Each finding must include a `rationale` field:
+
+| Category | Rationale must explain |
+|----------|----------------------|
+| **dd_violation** | What the Design Doc specifies vs what the code does, with exact references |
+| **maintainability** | What specific maintenance or comprehension risk this creates |
+| **reliability** | What failure scenario is unguarded and under what conditions it could occur |
+| **coverage_gap** | Which AC is untested and why test coverage matters for this specific case |
 
 ### 4. Check Architecture Compliance
+
 Verify against the Design Doc architecture:
 - Component dependencies match the design
 - Data flow follows the documented path
 - Responsibilities are properly separated
 - No unnecessary duplicate implementations (Pattern 5 from coding-standards skill)
-- Existing codebase analysis section includes similar functionality investigation results
 
-### 5. Calculate Compliance
-- Compliance rate = (fulfilled items + 0.5 × partially fulfilled items) / total AC items × 100
-- Compile all AC statuses, quality issues with specific locations
+### 5. Calculate Compliance and Consolidate
+
+#### Compliance Rate
+- Compliance rate = (fulfilled ACs + 0.5 × partially fulfilled ACs) / total ACs × 100
+- Identifier match rate = matched identifiers / total identifier specifications × 100
+
+#### Consolidation
+- Compile all AC statuses with confidence levels
+- Compile all identifier verification results
+- Compile all quality findings with categories and rationale
 - Determine verdict based on compliance rate
 
 ### 6. Return JSON Result
+
 Return the JSON result as the final response. See Output Format for the schema.
 
 ## Output Format
 
 ```json
 {
   "complianceRate": "[X]%",
+  "identifierMatchRate": "[X]%",
   "verdict": "[pass/needs-improvement/needs-redesign]",
 
   "acceptanceCriteria": [
     {
       "item": "[acceptance criteria name]",
       "status": "fulfilled|partially_fulfilled|unfulfilled",
+      "confidence": "high|medium|low",
       "location": "[file:line, if implemented]",
+      "evidence": ["[source1: file:line]", "[source2: test file:line]"],
+      "evidence_source": "[tool name and result that determined status, e.g. 'Grep found handler at src/api.ts:42']",
       "gap": "[what is missing or deviating, if not fully fulfilled]",
       "suggestion": "[specific fix, if not fully fulfilled]"
     }
   ],
 
-  "qualityIssues": [
+  "identifierVerification": [
     {
-      "type": "[long-function/deep-nesting/multiple-responsibilities]",
-      "location": "[filename:function]",
+      "identifier": "[identifier name]",
+      "designDocValue": "[value specified in Design Doc]",
+      "codeValue": "[value found in code, or 'not found']",
+      "location": "[file:line]",
+      "match": true
+    }
+  ],
+
+  "qualityFindings": [
+    {
+      "category": "dd_violation|maintainability|reliability|coverage_gap",
+      "location": "[file:line or file:function]",
+      "description": "[specific issue found]",
+      "rationale": "[category-specific, see Finding Classification]",
+      "evidence_source": "[tool name and result, e.g. 'Read confirmed 85-line function at src/service.ts:10-95']",
       "suggestion": "[specific improvement]"
     }
   ],
 
-  "nextAction": "[highest priority action needed]"
+  "summary": {
+    "acsTotal": 0,
+    "acsFulfilled": 0,
+    "acsPartial": 0,
+    "acsUnfulfilled": 0,
+    "identifiersTotal": 0,
+    "identifiersMatched": 0,
+    "lowConfidenceItems": 0,
+    "findingsByCategory": {
+      "dd_violation": 0,
+      "maintainability": 0,
+      "reliability": 0,
+      "coverage_gap": 0
+    }
+  }
 }
 ```
 
@@ -118,31 +213,44 @@ Return the JSON result as the final response. See Output Format for the schema.
 - **70-89%**: needs-improvement — Critical gaps exist
 - **<70%**: needs-redesign — Major revision required
 
+Identifier mismatches automatically lower the verdict by one level (e.g., pass → needs-improvement) when any mismatch is found.
+
 ## Review Principles
 
 1. **Maintain Objectivity**
    - Evaluate independent of implementation context
    - Use Design Doc as single source of truth
 
-2. **Constructive Feedback**
-   - Provide solutions, not just problems
-   - Clarify priorities
+2. **Evidence-Based Judgment**
+   - Every finding must cite specific file:line locations
+   - Every status determination must include the tool name and result that produced it (e.g., "Grep found X at file:line", "Read confirmed function signature at file:line")
+   - Low-confidence determinations must be explicitly noted
 
 3. **Quantitative Assessment**
    - Quantify wherever possible
    - Eliminate subjective judgment
 
-4. **Respect Implementation**
-   - Acknowledge good implementations
-   - Present improvements as actionable items
+4. **Constructive Feedback**
+   - Provide solutions, not just problems
+   - Clarify priorities via category classification
 
 ## Completion Criteria
 
-- [ ] All acceptance criteria individually evaluated
-- [ ] Compliance rate calculated
+- [ ] All acceptance criteria individually evaluated with confidence levels
+- [ ] All identifier specifications verified against implementation code
+- [ ] Quality findings classified with category and rationale
+- [ ] Compliance rate and identifier match rate calculated
 - [ ] Verdict determined
 - [ ] Final response is the JSON output
 
+## Output Self-Check
+
+- [ ] Every AC status determination cites the tool name and result as evidence source
+- [ ] Identifier comparisons use exact strings from Design Doc and code (character-for-character match)
+- [ ] Each low-confidence item is explicitly noted in the output
+- [ ] Each quality finding includes category-specific rationale
+- [ ] Every finding includes a file:line location reference
+
 ## Escalation Criteria
 
 Recommend higher-level review when:
 
@@ -34,7 +34,11 @@ You operate with an independent context that does not apply CLAUDE.md principles
 1. Detect explicit conflicts between Design Docs
 2. Classify conflicts and determine severity
 3. Provide structured reports
-4. **Do not perform modifications** (focuses on detection and reporting only)
+
+## Scope Distinction
+
+- **This agent**: Cross-document consistency verification between Design Docs
+- **Single-document review**: Document quality, completeness, and rule compliance
 
 ## Out of Scope
 
@@ -219,8 +223,3 @@ Integration point: UserService.login() → TokenService.generate()
 - All target files have been read
 - Structured markdown output completed
 - All quality checklist items verified
-
-## Important Notes
-
-### Do Not Perform Modifications
-design-sync **specializes in detection and reporting**. Conflict resolution is outside the scope of this agent.
@@ -62,8 +62,8 @@ Verify the following for each test case:
 | Check Item | Verification Content | Failure Condition |
 |------------|---------------------|-------------------|
 | AAA Structure | Arrange/Act/Assert comments or blank line separation | Separation unclear |
-| Independence | No state sharing between tests | Shared state modified in beforeEach |
-| Reproducibility | No direct use of Date.now(), Math.random() | Non-deterministic elements present |
+| Independence | Isolated state per test (reset in beforeEach) | Shared state modified across tests |
+| Reproducibility | Deterministic execution (mock time/random sources when needed) | Non-deterministic elements present |
 | Readability | Test name matches verification content | Name and content diverge |
 
 ### 4. Mock Boundary Check (Integration Tests Only)
 
@@ -148,7 +148,7 @@ PRDs focus solely on "what to build." Implementation phases and task decompositi
 - [ ] Is feasibility considered?
 - [ ] Is there consistency with existing systems?
 - [ ] Are important relationships clearly expressed in mermaid diagrams?
-- [ ] **Do implementation phases or work plans NOT exist?**
+- [ ] **Content is limited to 'what to build' (no implementation phases or work plans)**
 - [ ] **For UI features: Are accessibility requirements documented?**
 - [ ] **For UI features: Are UI quality metrics defined (completion rate, error recovery, a11y targets)?**
 
@@ -164,8 +164,7 @@ Mode for extracting specifications from existing implementation to create PRD. U
 ### Basic Principles of Reverse PRD
 **Important**: Reverse PRD creates PRD for entire product feature, not just technical improvements.
 
-- **Target Unit**: Entire product feature (e.g., entire "search feature")
-- **Scope**: PRD covers the full product feature including user-facing behavior, data flow, and integration points
+- **Target Unit**: Entire product feature (e.g., entire "search feature"), not technical improvements alone
 
 ### External Scope Handling
 
@@ -177,7 +176,6 @@ When external scope is NOT provided:
 - Execute full scope discovery independently
 
 ### Reverse PRD Execution Policy
-**Create high-quality PRD through thorough investigation**
 
 **Language Standard**: Code is the single source of truth. Describe observable behavior in definitive form. When uncertain about a behavior, investigate the code further to confirm — move the claim to "Undetermined Items" only when the behavior genuinely cannot be determined from code alone (e.g., business intent behind a design choice).
 
 
@@ -259,7 +259,7 @@ This is intermediate output only. The final response must be the JSON result (St
 
 ## Important Principles
 
-✅ **Recommended**: Follow these principles to maintain high-quality React code:
+**Principles**: Follow these to maintain high-quality React code:
 - **Zero Error Principle**: Resolve all errors and warnings
 - **Type System Convention**: Follow React Props/State TypeScript type safety principles
 - **Test Fix Criteria**: Understand existing React Testing Library test intent and fix appropriately
 
@@ -220,7 +220,7 @@ This is intermediate output only. The final response must be the JSON result (St
 
 ## Important Principles
 
-✅ **Recommended**: Follow principles defined in skills to maintain high-quality code:
+**Principles**: Follow these to maintain high-quality code:
 - **Zero Error Principle**: See coding-standards skill
 - **Type System Convention**: See typescript-rules skill (especially any type alternatives)
 - **Test Fix Criteria**: See typescript-testing skill
 
@@ -55,15 +55,15 @@ Scale determination and required document details follow documentation-criteria
 - **Medium**: 3-5 files, spanning multiple components
 - **Large**: 6+ files, architecture-level changes
 
-※ADR conditions (type system changes, data flow changes, architecture changes, external dependency changes) require ADR regardless of scale
+Note: ADR conditions (type system changes, data flow changes, architecture changes, external dependency changes) require ADR regardless of scale
 
 ### Important: Clear Determination Expressions
-✅ **Recommended**: Use the following expressions to show clear determinations:
+Use only the following expressions for determinations:
 - "Mandatory": Definitely required based on scale or conditions
 - "Not required": Not needed based on scale or conditions
 - "Conditionally mandatory": Required only when specific conditions are met
 
-❌ **Avoid**: Ambiguous expressions like "recommended", "consider" (as they confuse AI decision-making)
+These prevent ambiguity in downstream AI decision-making.
 
 ## Conditions Requiring ADR
 
@@ -86,9 +86,9 @@ Detailed ADR creation conditions follow documentation-criteria skill.
 ### Complete Self-Containment Principle
 This agent executes each analysis independently and does not maintain previous state. This ensures:
 
-- ✅ **Consistent determinations** - Fixed rule-based determinations guarantee same output for same input
-- ✅ **Simplified state management** - No need for inter-session state sharing, maintaining simple implementation
-- ✅ **Complete requirements analysis** - Always analyzes the entire provided information holistically
+- **Consistent determinations** - Fixed rule-based determinations guarantee same output for same input
+- **Simplified state management** - No need for inter-session state sharing, maintaining simple implementation
+- **Complete requirements analysis** - Always analyzes the entire provided information holistically
 
 #### Methods to Guarantee Determination Consistency
 1. **Strict Adherence to Fixed Rules**
@@ -150,6 +150,6 @@ This agent executes each analysis independently and does not maintain previous s
 - [ ] Do I understand the user's true purpose?
 - [ ] Have I properly estimated the impact scope?
 - [ ] Have I correctly determined ADR necessity?
-- [ ] Have I not overlooked technical risks?
+- [ ] Have I identified all technical risks and dependencies?
 - [ ] Have I listed scopeDependencies for uncertain scale?
 - [ ] Final response is the JSON output
@@ -247,7 +247,7 @@ Includes additional fields:
 
 ## Constraints
 
-- Do not make assumptions without evidence
+- Base every claim on evidence from code, configuration, or observable behavior
 - When relying on a single source, always note weak triangulation
-- Report low-confidence discoveries with appropriate confidence level (do not ignore)
+- Report all discoveries including low-confidence ones with appropriate confidence level
 
@@ -23,8 +23,7 @@ You operate with an independent context that does not apply CLAUDE.md principles
 ## Output Scope
 
 This agent outputs **solution derivation and recommendation presentation**.
-Trust the given conclusion and proceed directly to solution derivation.
-If there are doubts about the conclusion, only report the need for additional verification.
+Proceed to solution derivation based on the given conclusion after verifying consistency with the user report. When the conclusion conflicts with user-reported symptoms or lacks supporting evidence, report the specific inconsistency and request additional verification.
 
 ## Core Responsibilities