Skip to content

Latest commit

 

History

History
156 lines (122 loc) · 6.9 KB

File metadata and controls

156 lines (122 loc) · 6.9 KB

Changelog - ZTP Deployment Timeline Tracker

Version 2.1 (November 20, 2025)

🎯 Major Enhancements

1. ArgoCD Application Starting Point

  • Feature: Captures ArgoCD Application creation timestamp (true GitOps deployment trigger)
  • Benefit: Provides earliest possible deployment start (3-10 seconds before ManagedCluster)
  • Detection: Searches for Applications with siteconfig path in openshift-gitops/argocd namespaces
  • API: Uses applications.argoproj.io for full API resource access
  • Universal: Works for both SiteConfig v1 and v2 deployments
  • Parent App Support: Detects parent apps like "clusters" that manage multiple cluster deployments
  • Use Case: Accurate KPI measurement from true GitOps trigger

2. ClusterInstance Tracking

  • Feature: Captures ClusterInstance creation and condition events
  • Benefit: Tracks SiteConfig v2 operator reconciliation events
  • Requirement: SiteConfig v2 operator (not available in SiteConfig v1 deployments)
  • Fallback: Gracefully ignored if not present, doesn't fail
  • Future-Proof: Kept for potential verbose events and enrichment
  • Use Case: SiteConfig v2-based ZTP deployments

3. TALM CGU Completion Tracking

  • Feature: Captures TALM ClusterGroupUpgrade workflow and completion
  • Benefit: Provides accurate policy completion timestamps
  • Events Captured:
    • CGU Condition: ClustersSelected
    • CGU Condition: Validated
    • CGU Condition: Progressing
    • CGU Condition: Succeeded
    • Managed Policies status
  • Fallback: Uses individual policy compliance events if CGU not present
  • Use Case: Accurate KPI measurement for policy application phase

📊 Test Results (spree-02 Cluster)

Metric Before (v2.0) After (v2.1) Improvement
Events Captured 61 82 +34%
Deployment Start Accuracy ManagedCluster ArgoCD App (3s earlier) Improved precision
Policy Completion Visibility ❌ None ✅ TALM CGU workflow New capability
Milestones Tracked 7 11 +4 milestones
SiteConfig v2 Support ❌ No ✅ Yes (detected) New capability
ArgoCD GitOps Tracking ❌ No ✅ Yes New capability
ClusterInstance Detection N/A ✅ Present (+6 events) New capability

🔍 Key Discovery from Testing

Testing on spree-02 revealed:

  • ArgoCD Application Created: 16:09:04Z (true deployment start)
  • ClusterInstance Created: 16:09:06Z (+2s - SiteConfig v2 operator reconciliation)
  • ManagedCluster Created: 16:09:07Z (+3s)
  • Installation: 25 minutes ✅ (normal)
  • TALM CGU Completed: 1h 10m ✅ (normal - cluster ready for workloads)
  • ACM Cluster Available: 18h 29m ⚠️ (abnormal)
  • Root Cause Identified: 17+ hour delay AFTER policy completion, not during installation or policies

This demonstrates the value of v2.1 in:

  1. Precise KPI measurement: ArgoCD Application provides true deployment start
  2. Workload-ready vs ACM availability: TALM CGU completion clearly marks when cluster is ready for workloads
  3. Root cause isolation: Easily identify delays between deployment phases
  4. Real-time monitoring: Workload readiness status shows how long cluster has been ready (calculated from NOW)

📝 Changes

Scripts Modified

  • get-ztp-deployment-timeline.sh - Added ArgoCD Application, ClusterInstance, and TALM CGU queries
    • ClusterInstance Detection Fix: Searches by namespace instead of exact cluster name (e.g., finds site-plan-spree-02 in spree-02 namespace)
  • summarize-ztp-deployment.sh - Enhanced output with new sections:
    • WORKLOAD READINESS STATUS: Shows when cluster became ready for workloads and time elapsed since then
    • DEPLOYMENT SUMMARY: Shows total deployment time from Agent Bound to Ready for Workloads
    • FEATURE STATUS: Displays ArgoCD, ClusterInstance, and TALM CGU detection status
    • Removed misleading "ACM: Cluster Joined and Available" milestone from KEY MILESTONES

Output Improvements

  • WORKLOAD READINESS STATUS: New section showing cluster readiness timestamp and elapsed time since NOW
  • DEPLOYMENT SUMMARY: Concise summary showing total time from Agent Bound to Ready for Workloads
  • Removed misleading milestone: "ACM: Cluster Joined and Available" no longer in KEY MILESTONES (moved to detailed breakdown)
  • Feature detection reporting: Clear status for ArgoCD, ClusterInstance, and TALM CGU availability

Documentation Updated

  • README.md - Updated sample output with new sections and features
  • ✅ Added milestone breakdown (0-10 phases)
  • ✅ Added "Why TALM CGU Completion Matters" section
  • ✅ Updated changelog with version 2.1 details and output improvements

🎓 Credits

Enhancements based on feedback from: Ian Miller (@imiller)

Implemented and tested by: Carlos Cardenosa (@ccardenosa)

Test Environment: spree-02 (hub-kni-qe-71)

Test Date: November 20, 2025

🔄 Backward Compatibility

Version 2.1 maintains full backward compatibility:

  • ✅ Works with or without ArgoCD Application (falls back to ClusterInstance or ManagedCluster)
  • ✅ Works with SiteConfig v1 deployments (no ClusterInstance resource)
  • ✅ Works with SiteConfig v2 deployments (ClusterInstance resource present)
  • ✅ Works without TALM CGU (falls back to policy events)
  • ✅ All v2.0 features still functional
  • ✅ No breaking changes to command-line interface
  • ✅ JSON output structure extended (not changed)

📦 Migration Notes

From v2.0 to v2.1: No action required

  • Scripts can be replaced directly
  • No configuration changes needed
  • Existing scripts and workflows continue to work
  • New features activate automatically when resources available

🎯 Recommended Usage

For Accurate KPI Measurement:

  1. Use TALM CGU Succeeded timestamp as policy completion metric (when available)
  2. Use ClusterInstance creation as deployment start (when available)
  3. Fallback to traditional milestones for legacy deployments

Success Criteria:

  • Traditional: Cluster Available + ztp-done label
  • Recommended (v2.1): TALM CGU Succeeded (more accurate)

🔗 Related Documentation

  • Test Results: ~/troubleshooting/clusters/spree-02/ztp-timeline-comparison-original-vs-enhanced.md
  • Quick Summary: ~/troubleshooting/clusters/spree-02/QUICK_SUMMARY.md
  • Full Analysis: ~/troubleshooting/clusters/spree-02/ztp-timeline-test-results-2025-11-20.md

Version 2.0 (November 2025)

Initial Features

  • Complete ZTP deployment timeline tracking
  • Assisted Service API integration
  • ManagedCluster lifecycle tracking
  • Policy compliance monitoring
  • ManifestWork tracking
  • BareMetalHost provisioning events
  • Human-readable summary output
  • JSON output for automation
  • Cross-platform support (macOS/Linux)
  • Remote execution via SSH

Current Version: 2.1
Release Date: November 20, 2025
Compatibility: ACM 2.6+, OpenShift 4.12+
License: Apache License 2.0