Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
135 changes: 135 additions & 0 deletions experiments/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ These experimental patterns extend the core AI development patterns with advance
| **[Release Synthesis](#release-synthesis)** | Beginner | Operations | Automatically generate structured release notes by analyzing git commit history | Pipeline Synthesis |
| **[Incident Automation](#incident-automation)** | Advanced | Operations | Generate actionable incident response playbooks from historical incident data | Baseline Management |
| **[Suite Health](#suite-health)** | Intermediate | Operations | Analyze build history to identify and remediate flaky tests automatically | Testing Orchestration |
| **[Test Promotion](#test-promotion)** | Intermediate | Development | Separate AI-generated tests from immutable golden tests to prevent AI from weakening test assertions | Testing Orchestration, Spec-Driven Development |
| **[Upgrade Advisor](#upgrade-advisor)** | Intermediate | Operations | Intelligently manage dependency upgrades with compatibility analysis and risk assessment | Debt Forecasting |
| **[Handoff Automation](#handoff-automation)** | Intermediate | Operations | Generate comprehensive handoff briefs that summarize system state and active issues | Incident Automation |
| **[Chaos Engineering](#chaos-engineering)** | Advanced | Operations | Generate targeted chaos experiments based on system architecture and dependencies | Baseline Management |
Expand Down Expand Up @@ -1325,6 +1326,140 @@ Accepting unreliable tests as normal instead of systematically identifying and f

---

### Test Promotion

**Maturity**: Intermediate
**Description**: Separate AI-generated tests from immutable golden tests to prevent AI from weakening test assertions, with human-approved promotion ensuring only validated tests become behavioral contracts.

**Related Patterns**: [Testing Orchestration](#testing-orchestration), [Spec-Driven Development](../README.md#spec-driven-development), [Suite Health](#suite-health)

**Core Problem**

When AI generates both code AND tests, it can make tests pass by weakening them—the "self-grading student" problem. This applies to all AI code generation: new features, bug fixes, refactoring, or any implementation task.

**Test Separation Architecture**

```
tests/
├── golden/ # Immutable (444 permissions) - AI blocked
│ ├── auth/
│ │ └── test_jwt_validation.py
│ └── api/
│ └── test_payment.py
└── generated/ # Mutable - AI can freely generate/modify
├── test_edge_cases.py
└── test_new_feature.py
```

**Defense-in-Depth Enforcement**

The pattern uses multiple enforcement layers because **file permissions alone are insufficient** - AI with bash access could bypass them with `chmod`.

```bash
# Layer 1: File permissions (prevents accidental edits)
chmod 444 tests/golden/**/*.py
# ⚠️ NOT SUFFICIENT: AI can run "chmod 644" via Bash to bypass

# Layer 2: AI hooks (blocks Edit/Write tools)
# .ai/hooks/protect-golden.sh
[[ "$TOOL_INPUT_FILE_PATH" =~ ^tests/golden/ ]] && exit 2 # BLOCK
# ⚠️ NOT SUFFICIENT: AI can still modify via Bash commands

# Layer 3: CI/CD enforcement (detects ANY git diff)
git diff --name-only origin/main...HEAD | grep '^tests/golden/' && {
echo "❌ BLOCKED: Golden tests cannot be modified"
exit 1
}
# ✅ RELIABLE: Catches all modifications regardless of method

# Layer 4: CODEOWNERS (requires human approval)
# .github/CODEOWNERS
tests/golden/** @tech-leads @qa-leads
# ✅ RELIABLE: Human gate prevents merge even if AI commits changes
```

**Threat Model:**
- **Accidental Edit**: Blocked by file permissions (444)
- **AI Edit/Write Tool**: Blocked by AI hooks
- **AI Bash Bypass**: Detected by CI/CD git diff check
- **Committed Changes**: Blocked by CODEOWNERS requiring human approval

**Primary Enforcement**: CI/CD + CODEOWNERS, not file permissions.

**Promotion Workflow**

```bash
# AI generates test freely in tests/generated/
ai "Write payment idempotency test in tests/generated/test_payment.py"

# Human reviews and promotes
./scripts/promote-test.sh tests/generated/test_payment.py
# → Runs pytest validation
# → Interactive quality checklist
# → Copies to tests/golden/ with 444 permissions
# → Creates promotion PR requiring 2+ approvals
```

**Example: Golden Test Protection**

```python
# AI generates test freely
# tests/generated/test_new_feature.py
def test_payment_idempotency():
"""Payment processing should prevent duplicate charges."""
process_payment(id="123", amount=100)
with pytest.raises(DuplicateTransactionError):
process_payment(id="123", amount=100)

# Human reviews → promotes to golden
# tests/golden/test_payment.py (444 perms, AI blocked)
```

**Complete Implementation**

See [examples/test-promotion/](examples/test-promotion/) for:
- Complete promotion workflow scripts
- CI/CD enforcement configuration
- AI protection hooks
- Example application demonstrating the pattern

**Anti-pattern: Mutable Baselines**

Allowing AI to modify existing tests to make its code pass, removing critical assertions.

```python
# BEFORE (correct test):
def test_payment_idempotency():
process_payment(id="123", amount=100)
with pytest.raises(DuplicateTransactionError):
process_payment(id="123", amount=100)

# AFTER AI weakens test to pass buggy code:
def test_payment_idempotency():
process_payment(id="123", amount=100)
process_payment(id="123", amount=100) # No error check!
# BUG: Allows double-charging customers in production
```

Without immutable golden tests, AI can weaken assertions to make failing tests pass, eliminating regression protection.

**Anti-pattern: Permission-Only Protection**

Relying solely on file permissions (444) without CI/CD enforcement.

```bash
# INSUFFICIENT: AI can bypass via Bash
chmod 444 tests/golden/** # AI runs: chmod 644 && edit && chmod 444

# REQUIRED: CI/CD + CODEOWNERS as primary enforcement
git diff tests/golden/ → CI blocks merge
tests/golden/** → CODEOWNERS requires human approval
```

File permissions provide defense-in-depth but are not sufficient alone. CI/CD git diff detection and CODEOWNERS are the primary enforcement mechanisms.

---

### Upgrade Advisor

**Maturity**: Intermediate
Expand Down
31 changes: 31 additions & 0 deletions experiments/examples/test-promotion/.ai/hooks/protect-golden.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
#!/bin/bash
# AI Protection Hook for Golden Tests
# Blocks AI tools from modifying immutable golden tests

# This hook executes before Edit/Write tool use
# Exit code 0 = ALLOW
# Exit code 2 = BLOCK

FILE="$TOOL_INPUT_FILE_PATH"
TOOL="$TOOL_NAME"

# Block any Edit or Write operations on tests/golden/**
if [[ "$FILE" =~ ^tests/golden/ ]] && [[ "$TOOL" =~ (Edit|Write) ]]; then
echo "❌ BLOCKED: Golden tests are immutable"
echo ""
echo " File: $FILE"
echo " Tool: $TOOL"
echo ""
echo "Golden tests are read-only behavioral contracts."
echo "AI cannot modify these files to prevent weakening assertions."
echo ""
echo "Instead:"
echo " 1. Create test in tests/generated/$( basename "$FILE")"
echo " 2. Run and validate the test"
echo " 3. Ask human to promote: ./scripts/promote-test.sh tests/generated/$(basename "$FILE")"
echo ""
exit 2 # BLOCK
fi

# Allow all other operations
exit 0 # ALLOW
17 changes: 17 additions & 0 deletions experiments/examples/test-promotion/.github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Golden Test Protection via CODEOWNERS
#
# This is the PRIMARY enforcement mechanism for immutable golden tests.
# File permissions (444) and AI hooks provide defense-in-depth but can be
# bypassed by AI using bash commands like chmod.
#
# CODEOWNERS ensures that ANY modification to tests/golden/** requires
# explicit human approval, regardless of how the change was made.

# Golden tests require approval from tech leads and QA leads
tests/golden/** @tech-leads @qa-leads

# Promotion workflow changes also require approval
scripts/promote-test.sh @tech-leads

# Note: Replace @tech-leads and @qa-leads with actual GitHub team names
# or individual usernames (e.g., @alice @bob)
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
name: Golden Test Protection

on:
pull_request:
types: [opened, synchronize, reopened]
push:
branches:
- main
- master

jobs:
protect-golden-tests:
runs-on: ubuntu-latest

steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for diff

- name: Detect golden test modifications
run: |
# Get list of modified files in this PR/push
if [ "${{ github.event_name }}" = "pull_request" ]; then
BASE="${{ github.event.pull_request.base.sha }}"
HEAD="${{ github.event.pull_request.head.sha }}"
else
BASE="${{ github.event.before }}"
HEAD="${{ github.sha }}"
fi

echo "Checking for modifications to tests/golden/..."
MODIFIED_GOLDEN=$(git diff --name-only "$BASE...$HEAD" | grep '^tests/golden/' || true)

if [ -n "$MODIFIED_GOLDEN" ]; then
echo "❌ BLOCKED: Golden tests cannot be modified directly"
echo ""
echo "Modified golden tests:"
echo "$MODIFIED_GOLDEN"
echo ""
echo "Golden tests are immutable behavioral contracts."
echo "To update tests, use the promotion workflow:"
echo " 1. Create/modify test in tests/generated/"
echo " 2. Run: ./scripts/promote-test.sh tests/generated/<test-file>"
echo " 3. Create PR with 'test-promotion' label"
echo " 4. Require 2+ approvals"
echo ""
echo "For test removal or modification, consult team lead."
exit 1
fi

echo "✅ No golden test modifications detected"

- name: Validate promotion PRs
if: github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'test-promotion')
run: |
echo "🔍 Validating test promotion PR..."

# Check that ONLY golden tests were added (not modified)
ADDED=$(git diff --name-status "${{ github.event.pull_request.base.sha }}...${{ github.event.pull_request.head.sha }}" | grep '^A' | grep 'tests/golden/' || true)
MODIFIED=$(git diff --name-status "${{ github.event.pull_request.base.sha }}...${{ github.event.pull_request.head.sha }}" | grep '^M' | grep 'tests/golden/' || true)

if [ -n "$MODIFIED" ]; then
echo "❌ Promotion PRs should only ADD tests, not MODIFY"
echo "Modified files:"
echo "$MODIFIED"
exit 1
fi

if [ -z "$ADDED" ]; then
echo "⚠️ No tests added in promotion PR"
else
echo "✅ Test promotion validated:"
echo "$ADDED"
fi

- name: Check golden test permissions
run: |
echo "🔍 Verifying golden test permissions..."

# Check that all golden tests have 444 permissions
INCORRECT_PERMS=0
while IFS= read -r -d '' file; do
PERMS=$(stat -f "%OLp" "$file" 2>/dev/null || stat -c "%a" "$file" 2>/dev/null)
if [ "$PERMS" != "444" ]; then
echo "❌ Incorrect permissions: $file ($PERMS, should be 444)"
((INCORRECT_PERMS++))
fi
done < <(find tests/golden -type f -name "*.py" -print0)

if [ "$INCORRECT_PERMS" -gt 0 ]; then
echo ""
echo "Run: ./scripts/enforce-permissions.sh"
exit 1
fi

echo "✅ All golden tests have correct permissions (444)"
Loading
Loading