feat(telco-kpis): Add Gitea role for test report publishing #451

Open
ccardenosa wants to merge 4 commits into openshift-kni:main from ccardenosa:refactor/ipa-telco-kpis-prow-migration-gitea-deployment

@ccardenosa
Collaborator

Implement Gitea deployment and report publishing infrastructure for hosting Telco-KPIs test reports with vault integration and retention policies.

Problem

Telco-KPIs test reports need centralized hosting accessible to test engineers and stakeholders. Reports generated on bastion hosts need automated publishing to a Git-based repository system with retention management.

Solution

Added a gitea Ansible role that deploys a Gitea server and publishes test reports as Markdown files with compressed artifacts.

Role: playbooks/telco-kpis/roles/gitea/

Features

Deployment:

  • Podman-based Gitea server deployment on bastion
  • SQLite database with automatic migration handling
  • Firewall configuration (port 3000)
  • Readiness determined by an HTTP accessibility check rather than by container existence
  • Admin user creation with API token management

Report Publishing:

  • Creates organization and repositories automatically
  • Publishes Markdown reports via Gitea API
  • Uploads compressed tarball as release artifact
  • Updates repository README with latest report links
  • Retention policy: keeps last 15 reports, removes older ones

Vault Integration:

  • Gitea credentials stored in Ansible vault
  • Secure API token management
  • Credential validation before operations
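
As a rough illustration of the credential validation step, the tasks below assert that the vault-provided values are present and then check that the API accepts the token. The variable names `gitea_admin_user` and `gitea_admin_token` are hypothetical and do not come from the role; `gitea_http_port` is the variable used in the reviewed task. This is a sketch under those assumptions, not the contents of `tasks/validate-credentials.yml`.

```yaml
# Illustrative sketch only; gitea_admin_user and gitea_admin_token are assumed names.
- name: Check that the Gitea credentials from the vault are present
  ansible.builtin.assert:
    that:
      - gitea_admin_user | default('') | length > 0
      - gitea_admin_token | default('') | length > 0
    fail_msg: "Gitea credentials are missing from the vault"
    quiet: true
  no_log: true   # keep secret values out of the task output

- name: Confirm that the Gitea API accepts the token
  ansible.builtin.uri:
    url: "http://localhost:{{ gitea_http_port }}/api/v1/user"
    method: GET
    headers:
      Authorization: "token {{ gitea_admin_token }}"
    status_code: 200
  no_log: true
```

Running the assert first means a missing secret fails with a readable message before any HTTP request is attempted.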

Implementation Details

Role Structure:

  • tasks/main.yml - Entry point with operation dispatch
  • tasks/deploy.yml - Gitea server deployment
  • tasks/initialize.yml - Initial configuration and admin setup
  • tasks/validate-credentials.yml - Vault credential validation
  • tasks/create-repository.yml - Repository creation
  • tasks/publish-report.yml - Report publishing workflow
  • defaults/main.yml - Default variables
  • templates/README.md.j2 - Repository README template

Task Operations:

  • gitea_operation: deploy - Deploy and initialize Gitea server
  • gitea_operation: publish - Publish test report
  • gitea_operation: validate - Validate vault credentials
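
One way the dispatch in `tasks/main.yml` could be arranged is sketched below. The mapping of operations to task files is an assumption based on the file list above, and `task_file` is a hypothetical loop variable; the actual role may order or group the includes differently.

```yaml
# tasks/main.yml (illustrative sketch, not the role's actual contents)
- name: Reject unknown operations early
  ansible.builtin.assert:
    that:
      - gitea_operation | default('') in ['deploy', 'publish', 'validate']
    fail_msg: "gitea_operation must be one of: deploy, publish, validate"

- name: Deploy and initialize the Gitea server
  ansible.builtin.include_tasks: "{{ task_file }}"
  loop:
    - deploy.yml
    - initialize.yml
  loop_control:
    loop_var: task_file
  when: gitea_operation == 'deploy'

- name: Validate vault credentials (assumed to run before publishing as well)
  ansible.builtin.include_tasks: validate-credentials.yml
  when: gitea_operation in ['publish', 'validate']

- name: Create the repository and publish the report
  ansible.builtin.include_tasks: "{{ task_file }}"
  loop:
    - create-repository.yml
    - publish-report.yml
  loop_control:
    loop_var: task_file
  when: gitea_operation == 'publish'
```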

Usage

Deploy Gitea:

- name: Deploy Gitea server
  ansible.builtin.include_role:
    name: gitea
  vars:
    gitea_operation: deploy
    gitea_vault_org: telco-kpis

Publish report:

- name: Publish test report
  ansible.builtin.include_role:
    name: gitea
  vars:
    gitea_operation: publish
    gitea_vault_org: telco-kpis
    gitea_vault_repo: hlxcl7-reports
    gitea_report_file: /path/to/report.md
    gitea_artifact_file: /path/to/artifacts.tar.gz

Key Features

Firewall Management:

  • Detects firewalld vs. iptables
  • Adds port 3000 rule if not present
  • Handles both firewall backends
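
The firewall handling described above might look roughly like the following. This is a sketch, not the role's code: `firewalld_running` is a hypothetical fact name, the firewalld task assumes the `ansible.posix` collection is installed, and the iptables rule added here is not persisted across reboots.

```yaml
# Illustrative sketch; firewalld_running is an assumed fact name.
- name: Collect service state to detect the firewall backend
  ansible.builtin.service_facts:

- name: Record whether firewalld is running
  ansible.builtin.set_fact:
    firewalld_running: "{{ ansible_facts.services.get('firewalld.service', {}).get('state') == 'running' }}"

- name: Open the Gitea port with firewalld
  ansible.posix.firewalld:
    port: "{{ gitea_http_port | default(3000) }}/tcp"
    permanent: true
    immediate: true
    state: enabled
  become: true
  when: firewalld_running | bool

- name: Open the Gitea port with iptables when firewalld is not running
  ansible.builtin.iptables:
    chain: INPUT
    protocol: tcp
    destination_port: "{{ gitea_http_port | default(3000) }}"
    jump: ACCEPT
    comment: Allow Gitea HTTP
  become: true
  when: not (firewalld_running | bool)
```

Both modules check for an existing rule before changing anything, which fits the "adds port 3000 rule if not present" behaviour listed above.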

Database Migration:

  • Waits for database initialization on first run
  • Handles migration errors gracefully
  • Retries admin user creation after migration

Repository Retention:

  • Keeps last 15 reports per repository
  • Automatically deletes older reports
  • Prevents unbounded repository growth
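
The keep-last-15 selection can be sketched as a sort followed by a slice. The play below is illustrative only: `report_files` and `gitea_report_retention` are hypothetical names, the file names are made up, and the sketch assumes report names begin with a sortable timestamp. The API calls that actually delete the selected reports are not shown.

```yaml
# Illustrative sketch; variable names and file names are assumptions.
- name: Select reports that fall outside the retention window
  hosts: localhost
  gather_facts: false
  vars:
    gitea_report_retention: 15          # retention count from the PR description
    report_files:                       # hypothetical, timestamp-prefixed names
      - 2026-05-04T09-00-report.md
      - 2026-05-06T09-00-report.md
      - 2026-05-05T09-00-report.md
  tasks:
    - name: Sort newest first and keep everything after the first N entries
      ansible.builtin.set_fact:
        reports_to_delete: "{{ (report_files | sort(reverse=true))[gitea_report_retention:] }}"

    - name: Show which reports would be removed
      ansible.builtin.debug:
        var: reports_to_delete
```

With fewer reports than the retention count, the slice is empty and nothing is selected for deletion.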

Error Handling:

  • Comprehensive API error checking
  • Retries for transient failures
  • Detailed error messages

Benefits

  • Centralized test report hosting
  • Automated report publishing workflow
  • Secure credential management via vault
  • Retention policy prevents storage bloat
  • Accessible web UI for stakeholders
  • Git-based versioning of reports

Integration

Used by playbooks/telco-kpis/generate-report.yml to publish aggregated test reports from all Telco-KPIs tests (node-info, BIOS validation, performance tests, deployment timeline).

Related: Telco-KPIs test infrastructure, report generation system

openshift-ci Bot requested review from cplacani and rdiscala May 6, 2026 14:04

openshift-ci Bot commented May 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign shaior for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +4 to +12
- name: Check if Gitea is accessible via localhost
  ansible.builtin.uri:
    url: "http://localhost:{{ gitea_http_port }}/"
    method: GET
    status_code: 200
    validate_certs: false
  register: gitea_accessible
  failed_when: false
  changed_when: false
Collaborator

I think it would be better to add a small retry here, in case of network latency or a transient issue.
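
For illustration, a retry on this check could be expressed with Ansible's `until`, `retries`, and `delay` keywords. The retry count and delay below are assumed values, not ones proposed in this thread. The sketch swaps `failed_when: false` for `ignore_errors: true` on the assumption that a task whose retries are exhausted is reported as failed.

```yaml
# Illustrative sketch of the reviewed task with a retry; retries and delay are assumed values.
- name: Check if Gitea is accessible via localhost
  ansible.builtin.uri:
    url: "http://localhost:{{ gitea_http_port }}/"
    method: GET
    status_code: 200
    validate_certs: false
  register: gitea_accessible
  until: gitea_accessible.status | default(0) == 200
  retries: 3
  delay: 5
  ignore_errors: true   # let later tasks inspect gitea_accessible when Gitea is not up yet
  changed_when: false
```

Later tasks can still branch on `gitea_accessible.status | default(0) == 200` to decide whether a deployment is needed.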

Comment on lines +109 to +110
podman run -d
--name {{ gitea_container_name }}
Collaborator

Suggested change
- podman run -d
- --name {{ gitea_container_name }}
+ podman run -d --rm
+ --name {{ gitea_container_name }}

ccardenosa force-pushed the refactor/ipa-telco-kpis-prow-migration-gitea-deployment branch from 60e08de to 1d48ee0 May 6, 2026 14:16
ccardenosa added 4 commits May 6, 2026 20:20
Implement role-based container image mirroring system for internal registry
management, supporting both mirror and removal operations with authentication.

## Problem

Telco-KPIs testing requires mirroring container images to internal registries
for disconnected environments and test image management. The previous approach
used inline playbook tasks that could not be reused.

## Solution

Created dedicated `container_image_mirror` Ansible role with playbooks for both
mirroring and removal operations:

**Role: playbooks/roles/container_image_mirror/**
- Supports 'mirror' and 'remove' operations via parameter
- Uses skopeo for image operations
- Handles authentication with pull secrets
- Continues operation even if some images fail
- Comprehensive success/failure reporting with summary

**Playbooks:**
- `playbooks/mirror-images.yml` - Mirror images to internal registry
- `playbooks/remove-images.yml` - Remove images from registry storage

## Features

**Authentication:**
- Pull secret support for private registries (via pull_secret_string or pull_secret_path)
- System default auth when no pull secret provided
- Configurable auth file location (/tmp for bastion compatibility)
- use_pull_secret flag to control authentication method

**Registry Configuration:**
- Configurable registry host/port/namespace
- TLS verification control
- Source and destination registry support

**Operations:**
- Idempotent with existence checks
- Detailed mirror/removal summary
- Error handling continues operation on failures
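
The idempotent mirror flow described above could be sketched as an existence check followed by a conditional copy. This is an assumption-laden illustration rather than the role's `tasks/mirror.yaml`: `auth_file`, `existing_images`, and `mirror_results` are hypothetical names, while `container_image_mirror_images` and the `source`/`dest` keys follow the usage shown in this commit message.

```yaml
# Illustrative sketch; auth_file and the registered variable names are assumptions.
- name: Check whether each destination image already exists
  ansible.builtin.command:
    argv:
      - skopeo
      - inspect
      - "--authfile"
      - "{{ auth_file }}"
      - "docker://{{ item.dest }}"
  loop: "{{ container_image_mirror_images }}"
  register: existing_images
  changed_when: false
  failed_when: false   # a non-zero rc only means the image is not there yet

- name: Mirror the images that are missing, continuing past individual failures
  ansible.builtin.command:
    argv:
      - skopeo
      - copy
      - "--authfile"
      - "{{ auth_file }}"
      - "docker://{{ item.item.source }}"
      - "docker://{{ item.item.dest }}"
  loop: "{{ existing_images.results }}"
  when: item.rc != 0
  register: mirror_results
  ignore_errors: true

- name: Summarize images that could not be mirrored
  ansible.builtin.debug:
    msg: "Failed to mirror: {{ mirror_results.results | selectattr('failed', 'defined') | selectattr('failed') | map(attribute='item.item.dest') | list }}"
```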

## Usage

**Mirror images:**
```bash
ansible-playbook playbooks/mirror-images.yml \
  -e images='[{"source": "quay.io/image:tag", "dest": "registry.local/namespace/image:tag"}]' \
  -e registry_host=registry.local \
  -e pull_secret_string='{"auths": {...}}'
```

**Remove images:**
```bash
ansible-playbook playbooks/remove-images.yml \
  -e images='[{"dest": "registry.local/namespace/image:tag"}]' \
  -e registry_host=registry.local
```

## Implementation Details

**Role Structure:**
- `defaults/main.yaml` - Default variables
- `tasks/main.yaml` - Entry point with operation dispatch
- `tasks/mirror.yaml` - Mirror images using skopeo
- `tasks/remove.yaml` - Remove images from registry storage
- `meta/main.yaml` - Role metadata
- `README.md` - Comprehensive documentation

**Key Variables:**
- `container_image_mirror_operation`: "mirror" or "remove"
- `container_image_mirror_images`: List of image objects
- `container_image_mirror_registry_host`: Target registry hostname
- `container_image_mirror_pull_secret_string`: JSON pull secret
- `container_image_mirror_use_pull_secret`: Enable/disable authentication

## Benefits

- Reusable role for both mirror and removal operations
- Cleaner separation of concerns
- Easier to test and maintain
- Follows eco-ci-cd role patterns (like ocp_operator_mirror)
- Well-documented with examples
- Jenkins job compatible (uses same variable names)

## Jenkins Integration

Used by `telco-kpis-mirror-ran-test-images` Jenkins job for mirroring RAN test
images to internal registries.

Related: Telco-KPIs test infrastructure
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
Add parse-lockdown.yml playbook that extracts deployment parameters from
lockdown JSON files, enabling decoupled parameter management for reproducible
deployments.

## Problem

Telco-KPIs testing requires exact software versions (OCP releases, operator
channels, catalogs) for reproducible deployments. Parsing lockdown JSON inline
within deployment playbooks creates tight coupling and makes parameter reuse
across multiple jobs difficult.

## Solution

Implement a standalone parser playbook that runs before deployment jobs:
1. Downloads and parses lockdown JSON from URI
2. Auto-detects lockdown type (hub vs spoke) from JSON structure
3. Extracts deployment parameters
4. Outputs in shell env and JSON formats for downstream consumption
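
The four steps above can be sketched as a handful of tasks. This is a minimal illustration, not the playbook itself: `lockdown_response`, `lockdown_name`, and `artifact_dir` are hypothetical names, only `LOCKDOWN_TYPE` is written out, and the file name is derived with the `basename` and `splitext` filters rather than the regex approach shown later in this message.

```yaml
# Illustrative sketch; lockdown_response, lockdown_name and artifact_dir are assumed names.
- name: Download the lockdown JSON
  ansible.builtin.uri:
    url: "{{ lockdown_uri }}"
    return_content: true
    validate_certs: false   # certificate bypass for internal GitLab instances
  register: lockdown_response

- name: Parse the JSON, detect the lockdown type, and derive the artifact name
  ansible.builtin.set_fact:
    lockdown_data: "{{ lockdown_response.content | from_json }}"
    lockdown_type: "{{ 'hub' if 'hub' in (lockdown_response.content | from_json) else 'spoke' }}"
    lockdown_name: "{{ lockdown_uri | basename | splitext | first }}"

- name: Write the parameters as a shell env file
  ansible.builtin.copy:
    dest: "{{ artifact_dir }}/{{ lockdown_name }}-params.env"
    content: |
      LOCKDOWN_TYPE={{ lockdown_type }}
    mode: "0644"

- name: Write the same parameters as JSON
  ansible.builtin.copy:
    dest: "{{ artifact_dir }}/{{ lockdown_name }}-params.json"
    content: "{{ {'LOCKDOWN_TYPE': lockdown_type} | to_nice_json }}"
    mode: "0644"
```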

## Changes

**New playbook: playbooks/telco-kpis/parse-lockdown.yml**
- Auto-detection logic: checks for 'hub' vs 'deployment' key in JSON
- Hub parsing: extracts OCP_RELEASE_IMAGE, ACM_CHANNEL, MCE_CHANNEL, catalogs
- Spoke parsing: extracts OCP_PULL_SPEC, ZTP_PULL_SPEC, operator configurations
- SSL certificate bypass for internal GitLab instances (validate_certs: false)
- Dynamic artifact naming using lockdown filename from URI
- Outputs three artifacts per run:
  - `{lockdown-name}.json`: Original lockdown file
  - `{lockdown-name}-params.env`: Shell environment variables
  - `{lockdown-name}-params.json`: Structured JSON parameters

**New role: playbooks/telco-kpis/roles/lockdown_hub_config/**
- tasks/main.yml: Download, validate, and parse hub lockdown JSON
- defaults/main.yml: Default configuration values
- README.md: Comprehensive role documentation
- Used by both parse-lockdown.yml and deploy-ocp-operators.yml

## Usage Workflow

**Step 1: Parse lockdown file**
```bash
ansible-playbook playbooks/telco-kpis/parse-lockdown.yml \
  -e lockdown_uri=https://gitlab.cee.redhat.com/.../lockdown-hub-x86_64.json
```

**Step 2: Use extracted parameters**
```bash
# Source env file
source lockdown-hub-x86_64-params.env

# Use in deployment
ansible-playbook playbooks/deploy-ocp-sno.yml \
  -e release="${OCP_RELEASE_IMAGE}"
```

## Benefits

**Decoupling:**
- Parsing separated from deployment logic
- Parameters extracted once, reused across multiple jobs
- Easier debugging with explicit parameter artifacts

**Flexibility:**
- Supports multiple lockdown formats (hub, spoke, baseline)
- Self-documenting artifacts with actual lockdown names
- Both shell and JSON output formats

**Prow-ready:**
- Clean separation aligns with Prow step registry architecture
- Parser step can run independently, output shared via SHARED_DIR

## Key Features

**Auto-detection:**
```yaml
lockdown_type: "{{ 'hub' if ('hub' in lockdown_data) else 'spoke' }}"
```

**Dynamic artifact naming:**
```yaml
lockdown_filename: "{{ lockdown_uri | regex_replace('.*/', '') | regex_replace('\\.json$', '') }}"
# Result: lockdown-hub-x86_64-params.env (not generic lockdown-params.env)
```

**Hub channel transformations:**
```yaml
hub_acm_channel: "release-{{ lockdown_data.hub.acm.version_override }}"
hub_mce_channel: "{{ lockdown_data.hub.acm.mce_override | regex_replace('^v', 'stable-') | regex_replace('\\.\\d+$', '') }}"
```
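
As a worked example of the MCE channel transformation, the task below applies the same two filters to a sample value. The input `v2.8.1` is a hypothetical override value chosen for illustration; the lockdown file's actual format is not shown in this message.

```yaml
# Illustrative sketch; sample_mce_override is a made-up input value.
- name: Show the MCE channel derived from a sample override
  ansible.builtin.debug:
    msg: "{{ sample_mce_override | regex_replace('^v', 'stable-') | regex_replace('\\.\\d+$', '') }}"
  vars:
    sample_mce_override: v2.8.1
```

The first filter turns the leading `v` into `stable-`, and the second drops the trailing patch component, so the sample becomes `stable-2.8`, the form used for `MCE_CHANNEL` in the example artifacts.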

## Example Artifacts

**lockdown-hub-x86_64-params.env:**
```bash
LOCKDOWN_TYPE=hub
OCP_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.20.4-x86_64
OCP_VERSION=4.20
ACM_CHANNEL=release-2.13
MCE_CHANNEL=stable-2.8
TALM_CATALOG=quay.io/.../talm-index:v4.20
GITOPS_CATALOG=quay.io/.../gitops-index:v1.15
```

**lockdown-hub-x86_64-params.json:**
```json
{
  "LOCKDOWN_TYPE": "hub",
  "OCP_RELEASE_IMAGE": "quay.io/openshift-release-dev/ocp-release:4.20.4-x86_64",
  "OCP_VERSION": "4.20",
  "ACM_CHANNEL": "release-2.13",
  "MCE_CHANNEL": "stable-2.8",
  "TALM_CATALOG": "quay.io/.../talm-index:v4.20",
  "GITOPS_CATALOG": "quay.io/.../gitops-index:v1.15"
}
```

## Verification

Tested with both lockdown types:
- Hub lockdown: Successfully extracted OCP 4.20.4 pull spec and operator channels
- Spoke lockdown: Successfully extracted spoke deployment parameters

Artifacts correctly named with lockdown filename and contain expected parameters.

Related: Telco-KPIs reproducible deployment system
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
Add comprehensive troubleshooting guides for common Telco-KPIs deployment and
testing issues.

## Added Documentation

**playbooks/telco-kpis/docs/troubleshooting/**

1. **prometheus-pod-stuck-reboot-test-blocker.md**
   - Issue: Prometheus pod stuck in Init:0/1 causing reboot tests to skip
   - Root cause: Corrupted alertmanager-main-generated ConfigMap (OCPBUGS-65953, OCPBUGS-70352)
   - Impact: CNF-gotests BeforeEach health check fails
   - Workaround: Fix ConfigMap data and restart pod
   - Prevention: Automated fix option for post-deployment playbooks

2. **k8s-exec-ipv6-fallback-issue.md**
   - Issue: kubernetes.core.k8s_exec fails with "No route to host" in dual-stack environments
   - Root cause: Python websocket-client library doesn't fall back to IPv4
   - Impact: Blocks pod exec operations (BIOS/microcode collection, hardware info)
   - Solution: Use `oc exec` via ansible.builtin.shell instead
   - Verification: Tested on spree-02 cluster (2026-04-30)
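
A rough sketch of the `oc exec` workaround follows. The guide refers to `ansible.builtin.shell`; this sketch uses `ansible.builtin.command` with `argv` instead, to avoid shell quoting. `target_namespace`, `target_pod`, `kubeconfig_path`, and the command run inside the pod are hypothetical placeholders.

```yaml
# Illustrative sketch; namespace, pod, kubeconfig and the in-pod command are assumptions.
- name: Read microcode information from a pod with oc exec
  ansible.builtin.command:
    argv:
      - oc
      - exec
      - "-n"
      - "{{ target_namespace }}"
      - "{{ target_pod }}"
      - "--"
      - grep
      - "-m1"
      - microcode
      - /proc/cpuinfo
  environment:
    KUBECONFIG: "{{ kubeconfig_path }}"
  register: microcode_info
  changed_when: false
```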

## Benefits

- Faster troubleshooting with documented solutions
- Reduces repeated investigation of known issues
- Provides context (bug IDs, verification dates) for future reference
- Includes both workarounds and permanent solutions

Related: Telco-KPIs test infrastructure reliability
Implement Gitea deployment and report publishing infrastructure for hosting
Telco-KPIs test reports with vault integration and retention policies.
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
ccardenosa force-pushed the refactor/ipa-telco-kpis-prow-migration-gitea-deployment branch from 1d48ee0 to 9a4b561 May 6, 2026 18:55