Commit 3d7e47a

GATE-004: Enable Namespace Deduplication Check (BLOCKING) (#81)
* Initial plan

* Enable GATE-004: Namespace Deduplication Check

- Updated CI workflow to enable GATE-004 (changed continue-on-error from true to false)
- Updated CI governance gates index to mark GATE-004 as Active
- Script check_ata99_registry.py already fully implemented
- Database schema already in place (plc_ontology_schema.sql)
- Tested locally: script runs successfully and checks for duplicate namespace IDs

Co-authored-by: AmedeoPelliccia <164860269+AmedeoPelliccia@users.noreply.github.com>

* Fix GATE-004 duplicate detection logic

- Fixed namespace ID pattern regex (removed line anchors ^ and $)
- Fixed duplicate detection to check scanned entries before registration
- Fixed report generation to use tracked conflicts
- Fixed output display to show conflicts from report
- Tested with duplicate and non-duplicate scenarios
- Exit code 0 for pass, 1 for fail with duplicates

Co-authored-by: AmedeoPelliccia <164860269+AmedeoPelliccia@users.noreply.github.com>

* Address code review feedback for GATE-004

- Removed duplicate conflict tracking (use only checker.conflicts)
- Record all conflict pairs for 3+ duplicates (not just first two)
- Improved finding message to include occurrence count
- Tested with 3-way duplicate scenario

Co-authored-by: AmedeoPelliccia <164860269+AmedeoPelliccia@users.noreply.github.com>

* Fix conflict pair recording and validation consistency

- Record all unique pairs for N duplicates (N*(N-1)/2 pairs)
  Previously: 3 duplicates → 2 pairs (A-B, B-C)
  Now: 3 duplicates → 3 pairs (A-B, A-C, B-C)
- Changed validation from re.match() to re.search() for consistency
  Both scanning and validation now use same matching behavior

Addresses feedback in PR review comment #2674204672 and #2674204685

Co-authored-by: AmedeoPelliccia <164860269+AmedeoPelliccia@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: AmedeoPelliccia <164860269+AmedeoPelliccia@users.noreply.github.com>
1 parent 57d4113 commit 3d7e47a

3 files changed: +70 -31 lines changed

.github/workflows/governance-gates.yml

Lines changed: 4 additions & 8 deletions
@@ -189,17 +189,13 @@ The CSV should include the following columns:
           exit $EXIT_CODE
         continue-on-error: false

-      # GATE-004: Namespace Deduplication Check (BLOCKING - when script exists)
+      # GATE-004: Namespace Deduplication Check (BLOCKING)
       - name: GATE-004 - Namespace Deduplication Check
         id: namespace_dedup
         run: |
           echo "🔍 Checking for duplicate namespace IDs..."
-          if [ -f "scripts/check_ata99_registry.py" ]; then
-            python scripts/check_ata99_registry.py --deduplicate
-          else
-            echo "⚠️ Script not yet implemented (planned)"
-          fi
-        continue-on-error: true
+          python scripts/check_ata99_registry.py --deduplicate
+        continue-on-error: false

       # GATE-006: Detect Governance Changes (LABELING)
       - name: GATE-006 - Detect Governance Changes
@@ -319,7 +315,7 @@ This PR modifies governance-impacting files and requires **CM WG approval** befo
           echo "| GATE-002 | ${{ steps.schema_registration.outcome == 'success' && '✅ PASS' || '❌ FAIL' }} | Schema Registration Check |"
           echo "| GATE-003 | ${{ steps.trace_integrity.outcome == 'success' && (steps.trace_integrity.outputs.broken_count > 0 && '⚠️ PASS (warnings)' || '✅ PASS') || '❌ FAIL' }} | Trace Link Integrity (${{ steps.trace_integrity.outputs.broken_count || 0 }} links to planned content) |"
           echo "| GATE-LINK-001 | ${{ steps.link_integrity.outcome == 'success' && '✅ PASS' || '❌ FAIL' }} | Internal Link Integrity (mode: ${{ steps.link_integrity.outputs.link_mode }}) |"
-          echo "| GATE-004 | ⏭️ PLANNED | Namespace Deduplication |"
+          echo "| GATE-004 | ${{ steps.namespace_dedup.outcome == 'success' && '✅ PASS' || '❌ FAIL' }} | Namespace Deduplication |"
           echo "| GATE-006 | ${{ steps.governance_changes.outputs.governance_change == 'true' && '⚠️ REVIEW' || '✅ PASS' }} | Governance Change Detection |"
           echo "| GATE-007 | ⏭️ PLANNED | Breaking Schema Detection |"
           echo "| GATE-008 | ⏭️ PLANNED | Evidence Link Validation |"

00_AMPEL360_SPACET_Q10_GEN_PLUS_BB_GEN_LC01_K05_DATA__ci-governance-gates_IDX_I01-R01_ACTIVE.md

Lines changed: 6 additions & 6 deletions
@@ -28,7 +28,7 @@ This index catalogs all **CI/CD governance gates** implemented to enforce nomenc
 | **GATE-001** | Nomenclature Validation | `validate_nomenclature.py` | BLOCKING | Active | Validates all files against v3.0 standard |
 | **GATE-002** | Schema Registration Check | `scripts/validate_schema_registry.py` | BLOCKING | Active | Verifies schema refs exist in ATA 91 |
 | **GATE-003** | Trace Link Integrity | `scripts/validate_trace_links.py --skip-templates` | WARNING (during PORTAL build-out) | Active | Validates trace link targets exist, skips template placeholders and planned structure. Currently allows ~490 links to planned content. |
-| **GATE-004** | Namespace Deduplication | `scripts/check_ata99_registry.py` | BLOCKING | Planned | Prevents duplicate IDs across namespaces |
+| **GATE-004** | Namespace Deduplication | `scripts/check_ata99_registry.py` | BLOCKING | Active | Prevents duplicate IDs across namespaces |
 | **GATE-005** | Identifier Grammar Check | `scripts/validate_identifiers.py` | BLOCKING | Planned | Validates canonical ID format |

 ### Category B: Pull Request Review Gates
@@ -71,8 +71,8 @@ This index catalogs all **CI/CD governance gates** implemented to enforce nomenc

 ### By Status

-- **Active (implemented):** GATE-001, GATE-002, GATE-003, GATE-006, GATE-009, GATE-010
-- **Planned (to be implemented):** GATE-004, GATE-005, GATE-007, GATE-008, GATE-011, GATE-012, GATE-013, GATE-014, GATE-015, GATE-016, GATE-017, GATE-018
+- **Active (implemented):** GATE-001, GATE-002, GATE-003, GATE-004, GATE-006, GATE-009, GATE-010
+- **Planned (to be implemented):** GATE-005, GATE-007, GATE-008, GATE-011, GATE-012, GATE-013, GATE-014, GATE-015, GATE-016, GATE-017, GATE-018

 ### By Responsible Team

@@ -90,7 +90,7 @@ This index catalogs all **CI/CD governance gates** implemented to enforce nomenc
 | GATE-001 | Nomenclature Standard v3.0 | `.github/workflows/nomenclature-validation.yml` | `validate_nomenclature.py` |
 | GATE-002 | Governance Reference Policy §4.2 | `.github/workflows/governance-gates.yml` | ATA 91 schema registry |
 | GATE-003 | Governance Reference Policy §5.3 | `.github/workflows/governance-gates.yml` | `scripts/validate_trace_links.py`, `docs/GATE-003-TRACE-LINK-VALIDATION.md` |
-| GATE-004 | Identifier Grammar §4.5.2 | Planned: `.github/workflows/governance-gates.yml` | ATA 99 namespace registry |
+| GATE-004 | Identifier Grammar §4.5.2 | `.github/workflows/governance-gates.yml` | ATA 99 namespace registry, `scripts/check_ata99_registry.py` |
 | GATE-005 | Identifier Grammar §4.1 | Planned: `.github/workflows/governance-gates.yml` | None |
 | GATE-006 | Governance Reference Policy §6.3 | `.github/workflows/governance-gates.yml` | CM WG approval list |
 | GATE-007 | Governance Reference Policy §4.3 | Planned: `.github/workflows/governance-gates.yml` | ATA 91 schema versioning |
@@ -447,9 +447,9 @@ jobs:

 ### Implementation Priority

-**Phase 1 (Immediate - Complete):** GATE-001, GATE-002, GATE-003, GATE-006, GATE-009, GATE-010 (active)
+**Phase 1 (Immediate - Complete):** GATE-001, GATE-002, GATE-003, GATE-004, GATE-006, GATE-009, GATE-010 (active)

-**Phase 2 (Q1 2026):** GATE-004, GATE-005 (critical for governance enforcement)
+**Phase 2 (Q1 2026):** GATE-005 (critical for governance enforcement)

 **Phase 3 (Q2 2026):** GATE-007, GATE-008, GATE-015, GATE-016, GATE-017 (audit and staleness detection)

scripts/check_ata99_registry.py

Lines changed: 60 additions & 17 deletions
@@ -61,10 +61,10 @@ class ATA99RegistryChecker:

     # Namespace ID patterns for validation
     NAMESPACE_PATTERNS = {
-        'ata99_namespace': r'^NS-ATA99-[A-Z0-9-]+$',
-        'schema_namespace': r'^NS-SCH-[A-Z0-9-]+$',
-        'trace_namespace': r'^NS-TR-[A-Z0-9-]+$',
-        'identifier_namespace': r'^NS-ID-[A-Z0-9-]+$',
+        'ata99_namespace': r'NS-ATA99-[A-Z0-9-]+',
+        'schema_namespace': r'NS-SCH-[A-Z0-9-]+',
+        'trace_namespace': r'NS-TR-[A-Z0-9-]+',
+        'identifier_namespace': r'NS-ID-[A-Z0-9-]+',
     }

     def __init__(self, db_path: str = "plc_ontology.db"):
@@ -195,7 +195,7 @@ def validate_namespace_id_format(self, namespace_id: str) -> Tuple[bool, Optiona
             (is_valid, error_message)
         """
         for ns_type, pattern in self.NAMESPACE_PATTERNS.items():
-            if re.match(pattern, namespace_id):
+            if re.search(pattern, namespace_id):
                 return (True, None)

         return (False, f"Namespace ID '{namespace_id}' does not match any known pattern")
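
Reviewer note: the two regex changes in this commit (dropping `^`/`$` and switching `re.match()` to `re.search()`) interact, so a small standalone sketch may help show what each combination accepts. The pattern strings are copied from the diff; the two input strings are hypothetical and chosen only for illustration.

```python
import re

# Patterns copied from the diff: anchored (before) and unanchored (after).
ANCHORED = r'^NS-ID-[A-Z0-9-]+$'
UNANCHORED = r'NS-ID-[A-Z0-9-]+'

# Hypothetical inputs (not taken from the repository).
bare_id = "NS-ID-PART-001"
embedded_id = "namespace_id: NS-ID-PART-001  # registered"


def matches(func, pattern, text):
    """Return True when func (re.match, re.search, ...) finds the pattern."""
    return func(pattern, text) is not None


results = {
    # Old behaviour: only a string that is exactly one ID passes.
    "anchored_match_bare": matches(re.match, ANCHORED, bare_id),
    "anchored_match_embedded": matches(re.match, ANCHORED, embedded_id),
    # re.match still starts at position 0, even without '^'.
    "unanchored_match_embedded": matches(re.match, UNANCHORED, embedded_id),
    # New behaviour: an ID anywhere in the string passes.
    "unanchored_search_embedded": matches(re.search, UNANCHORED, embedded_id),
}

# re.fullmatch is the standard-library way to require that the whole
# string is an ID without writing anchors into the pattern.
strict_bare = matches(re.fullmatch, UNANCHORED, bare_id)
strict_embedded = matches(re.fullmatch, UNANCHORED, embedded_id)
```

With the unanchored patterns and `re.search()`, validation accepts any string that contains an ID, which matches how scanning works but is looser than the previous check. Whether that looseness is intended for `validate_namespace_id_format` is a question for the author, not something this sketch settles.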
@@ -213,12 +213,17 @@ def _compute_file_hash(self, file_path: Path) -> str:

     def generate_report(self) -> Dict[str, Any]:
         """Generate deduplication report."""
-        duplicates = self.db.check_namespace_duplicates()
-
         report = {
-            'total_duplicates': len(duplicates),
+            'total_duplicates': len(self.conflicts),
             'conflicts': self.conflicts,
-            'duplicates_detail': duplicates
+            'duplicates_detail': [
+                {
+                    'namespace_id': c['namespace_id'],
+                    'count': c['count'],
+                    'paths': ', '.join(c['paths'])
+                }
+                for c in self.conflicts
+            ]
         }

         return report
@@ -248,41 +253,66 @@ def run_gate_004(db_path: str = "plc_ontology.db", directory: Path = Path('.'))
         (passed, report)
     """
     import time
+    from collections import defaultdict
     start_time = time.time()

     checker = ATA99RegistryChecker(db_path)

     # Scan and register namespaces
     entries = checker.scan_repository(directory)
+
+    # Check for duplicates in scanned entries BEFORE registering
+    namespace_to_paths = defaultdict(list)
+    for entry in entries:
+        namespace_to_paths[entry.namespace_id].append(entry.artifact_path)
+
+    for namespace_id, paths in namespace_to_paths.items():
+        if len(paths) > 1:
+            # Record all unique conflict pairs in the database
+            # For N duplicates, record all N*(N-1)/2 unique pairs
+            for i in range(len(paths)):
+                for j in range(i + 1, len(paths)):
+                    checker.db.record_namespace_conflict(
+                        namespace_id=namespace_id,
+                        artifact_path_1=paths[i],
+                        artifact_path_2=paths[j],
+                        conflict_type='DUPLICATE_ID'
+                    )
+
+            # Add to checker's conflict list (used for reporting)
+            checker.conflicts.append({
+                'namespace_id': namespace_id,
+                'count': len(paths),
+                'paths': paths
+            })
+
+    # Register namespaces (INSERT OR REPLACE will update existing ones)
     if entries:
         checker.register_namespaces(entries)

-    # Check for duplicates
-    conflicts = checker.check_duplicates()
-
     # Generate report
     report = checker.generate_report()

     execution_time_ms = int((time.time() - start_time) * 1000)

     # Record gate run in database
-    passed = len(conflicts) == 0
+    passed = len(checker.conflicts) == 0
     run_id = checker.db.record_gate_run(
         gate_code='GATE-004',
         passed=passed,
-        error_count=len(conflicts),
+        error_count=len(checker.conflicts),
         warning_count=0,
         execution_time_ms=execution_time_ms,
         metadata=report
     )

     # Record findings
-    for conflict in conflicts:
+    for conflict in checker.conflicts:
         checker.db.record_gate_finding(
             run_id=run_id,
             gate_code='GATE-004',
             severity='ERROR',
-            message=f"Duplicate namespace ID: {conflict['namespace_id']}",
+            message=f"Duplicate namespace ID: {conflict['namespace_id']} ({conflict['count']} occurrences)",
             finding_code='NAMESPACE_DUPLICATE',
             details=conflict
         )
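
Reviewer note: the group-then-pair logic added in this hunk can be tried in isolation. The sketch below is a minimal stand-in, not the script's code: it assumes scan results are plain `(namespace_id, artifact_path)` tuples (the real entries are objects with those attributes), the sample IDs and paths are hypothetical, and the `pairs` key is added here only to make the pair enumeration visible, since the script writes pairs to the database instead. `itertools.combinations` stands in for the nested `i`/`j` loops.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical scan results as (namespace_id, artifact_path) tuples.
entries = [
    ("NS-SCH-CORE", "a.md"),
    ("NS-SCH-CORE", "b.md"),
    ("NS-SCH-CORE", "c.md"),
    ("NS-TR-LINKS", "d.md"),
]


def find_conflicts(scanned):
    """Group paths by namespace ID and report every ID seen more than once."""
    namespace_to_paths = defaultdict(list)
    for namespace_id, path in scanned:
        namespace_to_paths[namespace_id].append(path)

    conflicts = []
    for namespace_id, paths in namespace_to_paths.items():
        if len(paths) > 1:
            conflicts.append({
                "namespace_id": namespace_id,
                "count": len(paths),
                "paths": paths,
                # All N*(N-1)/2 unordered pairs, in the same order the
                # nested range(i + 1, ...) loops would produce them.
                "pairs": list(combinations(paths, 2)),
            })
    return conflicts


conflicts = find_conflicts(entries)
```

For the three-way duplicate above, `conflicts` holds one entry with `count` 3 and three pairs (a-b, a-c, b-c), which is the behaviour the final commit in this squash describes.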
@@ -388,7 +418,20 @@ def main():
         if args.json:
             print(json.dumps(report, indent=2))
         else:
-            checker.print_conflicts()
+            # Print conflicts from report
+            conflicts = report.get('conflicts', [])
+            if not conflicts:
+                print("\n✅ No namespace conflicts detected")
+            else:
+                print(f"\n❌ Found {len(conflicts)} namespace conflict(s):\n")
+
+                for conflict in conflicts:
+                    print(f"Namespace ID: {conflict['namespace_id']}")
+                    print(f"  Occurrences: {conflict['count']}")
+                    print(f"  Conflicting files:")
+                    for path in conflict['paths']:
+                        print(f"    - {path}")
+                    print()

         print(f"\n{'═'*70}")
         print(f"GATE-004: Namespace Deduplication Check")
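
Reviewer note: the commit message states "Exit code 0 for pass, 1 for fail with duplicates", and the workflow now sets `continue-on-error: false`, so a non-zero exit fails the step. The part of `main()` that sets the exit code is outside this hunk. The sketch below only illustrates the stated mapping; `exit_code_for` is a hypothetical helper, and the report dictionaries are made-up samples shaped like the output of `generate_report()`.

```python
def exit_code_for(report):
    """Map a GATE-004 style report to a process exit code (0 = pass, 1 = fail)."""
    return 0 if not report.get("conflicts") else 1


# Hypothetical reports shaped like generate_report() output in the diff.
clean_report = {"total_duplicates": 0, "conflicts": [], "duplicates_detail": []}
failing_report = {
    "total_duplicates": 1,
    "conflicts": [
        {"namespace_id": "NS-ID-EXAMPLE", "count": 2, "paths": ["a.md", "b.md"]}
    ],
    "duplicates_detail": [
        {"namespace_id": "NS-ID-EXAMPLE", "count": 2, "paths": "a.md, b.md"}
    ],
}

clean_code = exit_code_for(clean_report)
failing_code = exit_code_for(failing_report)

# In a CLI entry point this would typically end with:
#     sys.exit(exit_code_for(report))
```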
