Refactor/ipa telco kpis prow migration rds compare test#462
Open
ccardenosa wants to merge 12 commits into openshift-kni:main from
Conversation
Implement role-based container image mirroring system for internal registry
management, supporting both mirror and removal operations with authentication.
## Problem
Telco-KPIs testing requires mirroring container images to internal registries
for disconnected environments and test image management. The previous approach used
inline playbook tasks, which could not be reused across jobs.
## Solution
Created dedicated `container_image_mirror` Ansible role with playbooks for both
mirroring and removal operations:
**Role: playbooks/roles/container_image_mirror/**
- Supports 'mirror' and 'remove' operations via parameter
- Uses skopeo for image operations
- Handles authentication with pull secrets
- Continues operation even if some images fail
- Comprehensive success/failure reporting with summary
**Playbooks:**
- `playbooks/mirror-images.yml` - Mirror images to internal registry
- `playbooks/remove-images.yml` - Remove images from registry storage
## Features
**Authentication:**
- Pull secret support for private registries (via pull_secret_string or pull_secret_path)
- System default auth when no pull secret provided
- Configurable auth file location (/tmp for bastion compatibility)
- use_pull_secret flag to control authentication method
**Registry Configuration:**
- Configurable registry host/port/namespace
- TLS verification control
- Source and destination registry support
**Operations:**
- Idempotent with existence checks
- Detailed mirror/removal summary
- Error handling continues operation on failures
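The continue-on-failure mirror loop described above could be sketched as follows; this is an illustrative sketch, and the variable names (`container_image_mirror_auth_file` in particular) are assumptions rather than the role's actual API:

```yaml
# Hypothetical sketch of the skopeo mirror loop with continue-on-failure
# semantics; variable names are illustrative, not the role's exact API.
- name: Mirror images to the internal registry
  ansible.builtin.command: >
    skopeo copy
    --authfile {{ container_image_mirror_auth_file }}
    --dest-tls-verify={{ container_image_mirror_tls_verify | bool | lower }}
    docker://{{ item.source }}
    docker://{{ item.dest }}
  loop: "{{ container_image_mirror_images }}"
  register: mirror_results
  failed_when: false  # keep going even if a single image fails

- name: Report mirror summary
  ansible.builtin.debug:
    msg: >-
      Mirrored {{ mirror_results.results | selectattr('rc', 'equalto', 0) | list | length }}
      of {{ mirror_results.results | length }} images
```

Registering the loop results and deferring failure evaluation is what makes the summary reporting possible after a partial failure.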
## Usage
**Mirror images:**
```bash
ansible-playbook playbooks/mirror-images.yml \
-e images='[{"source": "quay.io/image:tag", "dest": "registry.local/namespace/image:tag"}]' \
-e registry_host=registry.local \
-e pull_secret_string='{"auths": {...}}'
```
**Remove images:**
```bash
ansible-playbook playbooks/remove-images.yml \
-e images='[{"dest": "registry.local/namespace/image:tag"}]' \
-e registry_host=registry.local
```
## Implementation Details
**Role Structure:**
- `defaults/main.yaml` - Default variables
- `tasks/main.yaml` - Entry point with operation dispatch
- `tasks/mirror.yaml` - Mirror images using skopeo
- `tasks/remove.yaml` - Remove images from registry storage
- `meta/main.yaml` - Role metadata
- `README.md` - Comprehensive documentation
**Key Variables:**
- `container_image_mirror_operation`: "mirror" or "remove"
- `container_image_mirror_images`: List of image objects
- `container_image_mirror_registry_host`: Target registry hostname
- `container_image_mirror_pull_secret_string`: JSON pull secret
- `container_image_mirror_use_pull_secret`: Enable/disable authentication
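For illustration, a `defaults/main.yaml` for a role shaped like this might look as follows; the default values here are assumptions, not taken from the PR:

```yaml
# Illustrative defaults/main.yaml sketch; values are assumptions.
container_image_mirror_operation: mirror          # "mirror" or "remove"
container_image_mirror_images: []                 # list of {source, dest} objects
container_image_mirror_registry_host: registry.local
container_image_mirror_registry_port: 5000
container_image_mirror_use_pull_secret: false
container_image_mirror_pull_secret_string: ""
container_image_mirror_auth_file: /tmp/auth.json  # bastion-compatible location
```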
## Benefits
- Reusable role for both mirror and removal operations
- Cleaner separation of concerns
- Easier to test and maintain
- Follows eco-ci-cd role patterns (like ocp_operator_mirror)
- Well-documented with examples
- Jenkins job compatible (uses same variable names)
## Jenkins Integration
Used by `telco-kpis-mirror-ran-test-images` Jenkins job for mirroring RAN test
images to internal registries.
Related: Telco-KPIs test infrastructure
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
Add parse-lockdown.yml playbook that extracts deployment parameters from
lockdown JSON files, enabling decoupled parameter management for reproducible
deployments.
## Problem
Telco-KPIs testing requires exact software versions (OCP releases, operator
channels, catalogs) for reproducible deployments. Parsing lockdown JSON inline
within deployment playbooks creates tight coupling and makes parameter reuse
across multiple jobs difficult.
## Solution
Implement standalone parser playbook that runs before deployment jobs:
1. Downloads and parses lockdown JSON from URI
2. Auto-detects lockdown type (hub vs spoke) from JSON structure
3. Extracts deployment parameters
4. Outputs in shell env and JSON formats for downstream consumption
## Changes
**New playbook: playbooks/telco-kpis/parse-lockdown.yml**
- Auto-detection logic: checks for 'hub' vs 'deployment' key in JSON
- Hub parsing: extracts OCP_RELEASE_IMAGE, ACM_CHANNEL, MCE_CHANNEL, catalogs
- Spoke parsing: extracts OCP_PULL_SPEC, ZTP_PULL_SPEC, operator configurations
- SSL certificate bypass for internal GitLab instances (validate_certs: false)
- Dynamic artifact naming using lockdown filename from URI
- Outputs three artifacts per run:
- `{lockdown-name}.json`: Original lockdown file
- `{lockdown-name}-params.env`: Shell environment variables
- `{lockdown-name}-params.json`: Structured JSON parameters
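The three artifacts could be produced along these lines (a sketch; `artifact_dir` and `lockdown_params` are assumed names for the output directory and the extracted parameter dict):

```yaml
# Sketch: emit all three artifacts using the lockdown-derived base name.
- name: Save original lockdown JSON
  ansible.builtin.copy:
    content: "{{ lockdown_data | to_nice_json }}"
    dest: "{{ artifact_dir }}/{{ lockdown_filename }}.json"

- name: Write shell environment variables
  ansible.builtin.copy:
    content: |
      {% for key, value in lockdown_params.items() %}
      {{ key }}={{ value }}
      {% endfor %}
    dest: "{{ artifact_dir }}/{{ lockdown_filename }}-params.env"

- name: Write structured JSON parameters
  ansible.builtin.copy:
    content: "{{ lockdown_params | to_nice_json }}"
    dest: "{{ artifact_dir }}/{{ lockdown_filename }}-params.json"
```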
**New role: playbooks/telco-kpis/roles/lockdown_hub_config/**
- tasks/main.yml: Download, validate, and parse hub lockdown JSON
- defaults/main.yml: Default configuration values
- README.md: Comprehensive role documentation
- Used by both parse-lockdown.yml and deploy-ocp-operators.yml
## Usage Workflow
**Step 1: Parse lockdown file**
```bash
ansible-playbook playbooks/telco-kpis/parse-lockdown.yml \
-e lockdown_uri=https://gitlab.cee.redhat.com/.../lockdown-hub-x86_64.json
```
**Step 2: Use extracted parameters**
```bash
# Source env file
source lockdown-hub-x86_64-params.env
# Use in deployment
ansible-playbook playbooks/deploy-ocp-sno.yml \
-e release="${OCP_RELEASE_IMAGE}"
```
## Benefits
**Decoupling:**
- Parsing separated from deployment logic
- Parameters extracted once, reused across multiple jobs
- Easier debugging with explicit parameter artifacts
**Flexibility:**
- Supports multiple lockdown formats (hub, spoke, baseline)
- Self-documenting artifacts with actual lockdown names
- Both shell and JSON output formats
**Prow-ready:**
- Clean separation aligns with Prow step registry architecture
- Parser step can run independently, output shared via SHARED_DIR
## Key Features
**Auto-detection:**
```yaml
lockdown_type: "{{ 'hub' if ('hub' in lockdown_data) else 'spoke' }}"
```
**Dynamic artifact naming:**
```yaml
lockdown_filename: "{{ lockdown_uri | regex_replace('.*/', '') | regex_replace('.json$', '') }}"
# Result: lockdown-hub-x86_64-params.env (not generic lockdown-params.env)
```
**Hub channel transformations:**
```yaml
hub_acm_channel: "release-{{ lockdown_data.hub.acm.version_override }}"
hub_mce_channel: "{{ lockdown_data.hub.acm.mce_override | regex_replace('^v', 'stable-') | regex_replace('\\.\\d+$', '') }}"
```
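A worked example of the two filter chains above, assuming hypothetical lockdown values:

```yaml
# Assuming lockdown_data.hub.acm.mce_override == "v2.8.1":
#   regex_replace('^v', 'stable-')  -> "stable-2.8.1"
#   regex_replace('\.\d+$', '')     -> "stable-2.8"
# And with version_override == "2.13":
#   hub_acm_channel                 -> "release-2.13"
```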
## Example Artifacts
**lockdown-hub-x86_64-params.env:**
```bash
LOCKDOWN_TYPE=hub
OCP_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.20.4-x86_64
OCP_VERSION=4.20
ACM_CHANNEL=release-2.13
MCE_CHANNEL=stable-2.8
TALM_CATALOG=quay.io/.../talm-index:v4.20
GITOPS_CATALOG=quay.io/.../gitops-index:v1.15
```
**lockdown-hub-x86_64-params.json:**
```json
{
"LOCKDOWN_TYPE": "hub",
"OCP_RELEASE_IMAGE": "quay.io/openshift-release-dev/ocp-release:4.20.4-x86_64",
"OCP_VERSION": "4.20",
"ACM_CHANNEL": "release-2.13",
"MCE_CHANNEL": "stable-2.8",
"TALM_CATALOG": "quay.io/.../talm-index:v4.20",
"GITOPS_CATALOG": "quay.io/.../gitops-index:v1.15"
}
```
## Verification
Tested with both lockdown types:
- Hub lockdown: Successfully extracted OCP 4.20.4 pull spec and operator channels
- Spoke lockdown: Successfully extracted spoke deployment parameters
Artifacts correctly named with lockdown filename and contain expected parameters.
Related: Telco-KPIs reproducible deployment system
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
Add comprehensive troubleshooting guides for common Telco-KPIs deployment and
testing issues.
## Added Documentation
**playbooks/telco-kpis/docs/troubleshooting/**
1. **prometheus-pod-stuck-reboot-test-blocker.md**
   - Issue: Prometheus pod stuck in Init:0/1, causing reboot tests to skip
   - Root cause: Corrupted alertmanager-main-generated ConfigMap (OCPBUGS-65953, OCPBUGS-70352)
   - Impact: CNF-gotests BeforeEach health check fails
   - Workaround: Fix ConfigMap data and restart the pod
   - Prevention: Automated fix option for post-deployment playbooks
2. **k8s-exec-ipv6-fallback-issue.md**
   - Issue: kubernetes.core.k8s_exec fails with "No route to host" in dual-stack environments
   - Root cause: The Python websocket-client library does not fall back to IPv4
   - Impact: Blocks pod exec operations (BIOS/microcode collection, hardware info)
   - Solution: Use `oc exec` via ansible.builtin.shell instead
   - Verification: Tested on spree-02 cluster (2026-04-30)
## Benefits
- Faster troubleshooting with documented solutions
- Reduces repeated investigation of known issues
- Provides context (bug IDs, verification dates) for future reference
- Includes both workarounds and permanent solutions
Related: Telco-KPIs test infrastructure reliability
Implement Gitea deployment and report publishing infrastructure for hosting
Telco-KPIs test reports with vault integration and retention policies.
## Problem
Telco-KPIs test reports need centralized hosting accessible to test engineers
and stakeholders. Reports generated on bastion hosts need automated publishing
to a Git-based repository system with retention management.
## Solution
Created `gitea` Ansible role for deploying Gitea server and publishing test
reports as Markdown files with compressed artifacts.
**Role: playbooks/telco-kpis/roles/gitea/**
## Features
**Deployment:**
- Podman-based Gitea server deployment on bastion
- SQLite database with automatic migration handling
- Firewall configuration (port 3000)
- Deployment state decided by an accessibility check rather than container existence
- Admin user creation with API token management
**Report Publishing:**
- Creates organization and repositories automatically
- Publishes Markdown reports via Gitea API
- Uploads compressed tarball as release artifact
- Updates repository README with latest report links
- Retention policy: keeps last 15 reports, removes older ones
**Vault Integration:**
- Gitea credentials stored in Ansible vault
- Secure API token management
- Credential validation before operations
## Implementation Details
**Role Structure:**
- `tasks/main.yml` - Entry point with operation dispatch
- `tasks/deploy.yml` - Gitea server deployment
- `tasks/initialize.yml` - Initial configuration and admin setup
- `tasks/validate-credentials.yml` - Vault credential validation
- `tasks/create-repository.yml` - Repository creation
- `tasks/publish-report.yml` - Report publishing workflow
- `defaults/main.yml` - Default variables
- `templates/README.md.j2` - Repository README template
**Task Operations:**
- `gitea_operation: deploy` - Deploy and initialize Gitea server
- `gitea_operation: publish` - Publish test report
- `gitea_operation: validate` - Validate vault credentials
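The dispatch in `tasks/main.yml` could be sketched like this, mapping each operation to the task files listed above (an illustrative sketch, not the actual file):

```yaml
# Illustrative operation dispatch for tasks/main.yml.
- name: Validate requested operation
  ansible.builtin.assert:
    that: gitea_operation in ['deploy', 'publish', 'validate']
    fail_msg: "Unknown gitea_operation: {{ gitea_operation }}"

- name: Dispatch to the requested operation
  ansible.builtin.include_tasks: "{{ gitea_task_map[gitea_operation] }}"
  vars:
    gitea_task_map:
      deploy: deploy.yml
      publish: publish-report.yml
      validate: validate-credentials.yml
```

An explicit map keeps the operation names stable even when the underlying task filenames differ from them.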
## Usage
**Deploy Gitea:**
```yaml
- name: Deploy Gitea server
ansible.builtin.include_role:
name: gitea
vars:
gitea_operation: deploy
gitea_vault_org: telco-kpis
```
**Publish report:**
```yaml
- name: Publish test report
ansible.builtin.include_role:
name: gitea
vars:
gitea_operation: publish
gitea_vault_org: telco-kpis
gitea_vault_repo: hlxcl7-reports
gitea_report_file: /path/to/report.md
gitea_artifact_file: /path/to/artifacts.tar.gz
```
## Key Features
**Firewall Management:**
- Detects firewalld vs. iptables
- Adds port 3000 rule if not present
- Handles both firewall backends
**Database Migration:**
- Waits for database initialization on first run
- Handles migration errors gracefully
- Retries admin user creation after migration
**Repository Retention:**
- Keeps last 15 reports per repository
- Automatically deletes older reports
- Prevents unbounded repository growth
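The retention policy could be implemented against the Gitea releases API roughly as follows; endpoint paths follow the public Gitea API, but the variable names are assumptions:

```yaml
# Sketch: keep the newest 15 releases, delete the rest via the Gitea API.
- name: List existing releases (newest first)
  ansible.builtin.uri:
    url: "{{ gitea_url }}/api/v1/repos/{{ gitea_vault_org }}/{{ gitea_vault_repo }}/releases?limit=50"
    headers:
      Authorization: "token {{ gitea_api_token }}"
  register: releases

- name: Delete releases beyond the newest 15
  ansible.builtin.uri:
    url: "{{ gitea_url }}/api/v1/repos/{{ gitea_vault_org }}/{{ gitea_vault_repo }}/releases/{{ item.id }}"
    method: DELETE
    headers:
      Authorization: "token {{ gitea_api_token }}"
    status_code: 204
  loop: "{{ releases.json[15:] }}"
```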
**Error Handling:**
- Comprehensive API error checking
- Retries for transient failures
- Detailed error messages
## Benefits
- Centralized test report hosting
- Automated report publishing workflow
- Secure credential management via vault
- Retention policy prevents storage bloat
- Accessible web UI for stakeholders
- Git-based versioning of reports
## Integration
Used by `playbooks/telco-kpis/generate-report.yml` to publish aggregated test
reports from all Telco-KPIs tests (node-info, BIOS validation, performance
tests, deployment timeline).
Related: Telco-KPIs test infrastructure, report generation system
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
Creates a scalable test execution framework supporting multiple test types across spoke clusters, replacing dedicated per-test playbooks with a task-based architecture for better maintainability.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
Add OSLAT (OS Latency) performance test for measuring operating-system-induced latency on Telco-KPIs spoke clusters. Implements the run-oslat-test.yml task with podman-based execution, artifact collection, and JUnit XML report generation for CI integration.
Related: Telco-KPIs performance testing
Add Cyclictest performance test for measuring real-time kernel latency on Telco-KPIs spoke clusters. Implements the run-cyclictest-test.yml task with podman-based execution, artifact collection to the /artifacts directory, and JUnit XML report generation.
Related: Telco-KPIs performance testing
Add PTP (Precision Time Protocol) test for validating time synchronization on Telco-KPIs spoke clusters. Implements the run-ptp-test.yml task with podman-based execution, artifact collection, and JUnit XML report generation for CI integration.
Related: Telco-KPIs performance testing
Add CPU utilization and reboot tests using the cnf-gotests framework for Telco-KPIs spoke cluster validation.
**CPU Utilization Test:**
- Supports baseline mode for establishing performance baselines
- Measures CPU usage patterns under load
- Generates JUnit XML reports
**Reboot Test:**
- Validates cluster stability across node reboots
- Uses the cnf-gotests framework
- Verifies cluster health after reboot cycles
Both tests use podman-based execution with artifact collection.
Related: Telco-KPIs performance testing
Add RFC2544 network performance test using the ran-integration framework for Telco-KPIs spoke cluster throughput and latency validation.
## Features
**Spirent Configuration:**
- Loads configuration from ran-integration inventory files
- Supports CHASSIS, STCWEB, PORT1, PORT2 parameters
- Cluster FQDN read from vault
- Fallback to hardcoded defaults
**DPDK TestPMD Pod Deployment:**
- Deploys dpdk-testpmd pod for packet processing
- Uses kubernetes.core modules for pod management
- Architecture-aware hugepages configuration (ARM64 vs x86_64)
- Pod deletion verification after test completion
**Test Execution:**
- Clones the ran-integration repository (configurable branch)
- Runs RFC2544 test scripts via podman
- Collects JUnit XML artifacts
## Implementation Details
- Spirent config parsing using shell commands (grep/cut)
- ARM detection: `is_arm` variable for architecture-specific settings
- Proper Jinja2 boolean handling (False → false in templates)
- Error handling with default() filters for missing variables
Related: Telco-KPIs performance testing, ran-integration
Add comprehensive BIOS settings validation and hardware information collection
for Telco-KPIs spoke clusters.
## Playbooks
**playbooks/telco-kpis/collect-node-info.yml:**
- Collects hardware metadata (CPU, BIOS, firmware, NIC details)
- Queries Redfish BMC for BIOS settings
- Outputs JSON artifact: node-info-{spoke}.json
**playbooks/telco-kpis/run-bios-validation.yml:**
- Validates BIOS settings against expected profile
- Supports x86_64 (TelcoOptimizedProfile) and ARM64 architectures
- Generates JUnit XML test results
## Implementation Details
**Tasks:**
- `collect-node-info.yml` - Hardware info collection task
- `run-bios-validation-test.yml` - BIOS validation task
**Key Features:**
- BMC credentials read from vault file
- Architecture normalization (amd64 → x86_64)
- Uses `oc exec` instead of kubernetes.core.k8s_exec (IPv6 fallback workaround)
- BIOS profile parsing from Redfish API
- Microcode version collection via dmidecode
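The `oc exec` workaround noted above can be sketched as follows; the pod, namespace, and command are illustrative examples, not the playbook's exact implementation:

```yaml
# Sketch: use `oc exec` instead of kubernetes.core.k8s_exec, which cannot
# fall back to IPv4 in dual-stack environments (websocket-client limitation).
- name: Collect microcode version via oc exec
  ansible.builtin.shell: >
    oc --kubeconfig {{ kubeconfig_path }}
    exec -n {{ debug_namespace }} {{ debug_pod }} --
    chroot /host dmidecode -t processor
  register: dmidecode_output
  changed_when: false
```

Because `oc` resolves the API endpoint through the system resolver, it tries both address families, which is what the Python websocket client fails to do.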
**Expected BIOS Settings (x86_64):**
- WorkloadProfile: TelcoOptimizedProfile
- ProcPwrPerf: SysDbpmTelco
- ProcC States: Enabled/Disabled per setting
- SecureBoot: Enabled with DeployedMode
Related: Telco-KPIs hardware validation
Add RDS (Reference Design Spec) comparison test for validating spoke cluster
configuration against reference deployments.
## Features
**Playbook:** playbooks/telco-kpis/run-rds-compare.yml
**Task:** tasks/run-rds-compare-test.yml
**Implementation:**
- Ensures test-runner container image exists before execution
- Uses kubectl-cluster_compare tool (added to test-runner Containerfile)
- Mounts spoke-specific artifact directory for report generation
- Artifacts mounted to both /workspace/reports/podman-runs and /workspace/reports/rds-compare
- Generates JUnit XML test results
**Container Image Management:**
- Checks for kubectl-cluster_compare binary in test-runner image
- Uses telco-kpis-test-runner:latest container
**Artifact Structure:**
- Reports written to /workspace/reports/podman-runs/{spoke}/
- Dual mount paths for compatibility with report generator
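The dual artifact mounts could look like this in the podman invocation (a sketch; the host paths and the `kubectl-cluster_compare` arguments are assumptions):

```yaml
# Sketch: run the cluster-compare test with both report mount points so the
# report generator finds artifacts under either expected path.
- name: Run RDS compare in the test-runner container
  ansible.builtin.command: >
    podman run --rm
    -v {{ artifact_dir }}/{{ spoke_name }}:/workspace/reports/podman-runs/{{ spoke_name }}:Z
    -v {{ artifact_dir }}/{{ spoke_name }}:/workspace/reports/rds-compare:Z
    -v {{ kubeconfig_path }}:/kubeconfig:ro,Z
    telco-kpis-test-runner:latest
    kubectl-cluster_compare -r /reference/metadata.yaml
  register: rds_compare_result
  failed_when: false
```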
Related: Telco-KPIs configuration validation
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED