This file provides project context for Claude-based development environments (Claude Code, claude.ai, etc.).
For Cursor IDE users, see .cursor/rules/ for auto-loaded context rules.
For non-Cursor environments: See .cursor/prompts/ for task-specific guides:
- run-tests.md - Run pytest with various options
- troubleshoot-tests.md - Diagnose test failures
- connect-cluster.md - Set up OpenShift cluster access
- deploy-chart.md - Deploy the Helm chart
- check-logs.md - View component logs
- debug-e2e.md - Debug E2E test failures
- download-ci-artifacts.md - Download CI artifacts from Prow/GCS
This is a Helm chart for deploying Red Hat Cost Management on-premise (cost-onprem). It includes comprehensive pytest-based testing infrastructure.
- cost-onprem/ - Helm chart templates and values
- tests/ - Pytest test suite
- scripts/ - Deployment and testing scripts
- docs/ - Documentation
- Python 3.10+ (CI uses Python 3.11)
- OpenShift CLI (oc) with cluster access
- Helm 3.x
- gcloud CLI (for downloading CI artifacts)
- Python: Follow PEP 8, use type hints where helpful
- Bash: Use set -euo pipefail, quote variables
- YAML: 2-space indentation for Helm templates
- Tests: Use descriptive names, include docstrings
In OpenShift CI, tests are executed via the insights-onprem-cost-onprem-chart-e2e step:
CI Step Registry: insights-onprem/cost-onprem-chart/e2e/
├── insights-onprem-cost-onprem-chart-e2e-commands.sh # Main CI script
├── insights-onprem-cost-onprem-chart-e2e-ref.yaml # Step definition
CI Execution Sequence:
- Dependencies: Installs yq, kubectl, helm, oc
- S4 Setup: Reads config from the insights-onprem-s4-deploy step
- Cost Management Metrics Operator: Installs via OLM (stable channel)
- Helm Wrapper: Injects S4 storage config for cost-onprem chart
- Deploy & Test: Runs scripts/deploy-test-cost-onprem.sh --namespace cost-onprem --verbose
Default CI Test Run:
# What CI executes (via deploy-test-cost-onprem.sh):
NAMESPACE=cost-onprem ./scripts/run-pytest.sh
# Equivalent to:
pytest -m "not extended" --junit-xml=reports/junit.xml

CI runs ~88 tests in ~3 minutes (excludes extended tests that require ODF/S3).

# Run all tests (including UI)
NAMESPACE=cost-onprem ./scripts/run-pytest.sh
# Run all tests except UI
NAMESPACE=cost-onprem ./scripts/run-pytest.sh --no-ui
# Specific suites
./scripts/run-pytest.sh --helm
./scripts/run-pytest.sh --auth
./scripts/run-pytest.sh --e2e
./scripts/run-pytest.sh --infrastructure
./scripts/run-pytest.sh --ros
./scripts/run-pytest.sh --ui

- component - Single-component tests
- integration - Multi-component tests
- extended - Long-running tests (skipped by default in CI)
- smoke - Quick validation tests

E2E_CLEANUP_BEFORE=true # Clean before tests (default)
E2E_CLEANUP_AFTER=true # Clean after tests (default)
E2E_RESTART_SERVICES=false # Restart Valkey/listener (optional)

IMPORTANT: The chart uses app.kubernetes.io/component for pod selection, NOT app.kubernetes.io/name.

| Component | Label Selector |
|---|---|
| Database | app.kubernetes.io/component=database |
| Ingress | app.kubernetes.io/component=ingress |
| Kruize | app.kubernetes.io/component=ros-optimization |
| ROS API | app.kubernetes.io/component=ros-api |
| ROS Processor | app.kubernetes.io/component=ros-processor |
| Cache (Valkey) | app.kubernetes.io/component=cache |
| Koku Listener | app.kubernetes.io/component=listener |
| MASU | app.kubernetes.io/component=cost-processor |
| Celery Workers | app.kubernetes.io/component=cost-worker |
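
Because every selector follows the same pattern, the table can be captured in a small lookup. The sketch below is illustrative only: `COMPONENT_LABELS`, `label_selector`, and `kubectl_logs_args` are hypothetical names, not helpers that exist in this repository, and the default namespace is assumed to be cost-onprem.

```python
# Component names and label values copied from the table above.
COMPONENT_LABELS = {
    "Database": "database",
    "Ingress": "ingress",
    "Kruize": "ros-optimization",
    "ROS API": "ros-api",
    "ROS Processor": "ros-processor",
    "Cache (Valkey)": "cache",
    "Koku Listener": "listener",
    "MASU": "cost-processor",
    "Celery Workers": "cost-worker",
}


def label_selector(component: str) -> str:
    """Return the component label selector for a chart component (hypothetical helper)."""
    try:
        value = COMPONENT_LABELS[component]
    except KeyError:
        raise ValueError(f"unknown component: {component!r}") from None
    return f"app.kubernetes.io/component={value}"


def kubectl_logs_args(component: str, namespace: str = "cost-onprem", tail: int = 100) -> list[str]:
    """Build the argv for a 'kubectl logs' call; the command is only assembled, not executed."""
    return ["kubectl", "logs", "-n", namespace, "-l", label_selector(component), f"--tail={tail}"]
```

Building the argument list in one place avoids accidentally selecting on app.kubernetes.io/name, which the chart does not use for pod selection.
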
# Check pod status
kubectl get pods -n cost-onprem -l app.kubernetes.io/instance=cost-onprem
# View logs by component
kubectl logs -n cost-onprem -l app.kubernetes.io/component=listener --tail=100
kubectl logs -n cost-onprem -l app.kubernetes.io/component=ros-processor --tail=100

Tests use app.kubernetes.io/component=database label selector.
kubectl get pods -n cost-onprem -l app.kubernetes.io/component=database

Root Cause: boto3 defaults to virtual-hosted style URLs which don't work with NooBaa/Ceph RGW.
Fix: Chart configures boto3 for path-style S3 addressing via:
- cost-onprem-aws-config ConfigMap (sets addressing_style = path)
- AWS_CONFIG_FILE=/etc/aws/config environment variable
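
To make the difference between the two addressing styles concrete, here is a minimal sketch that builds both URL forms for the same object. The endpoint, bucket, and key used in the usage note are hypothetical, and `s3_url` is an illustrative helper, not part of the chart or of boto3.

```python
from urllib.parse import quote, urlsplit


def s3_url(endpoint: str, bucket: str, key: str, addressing_style: str = "path") -> str:
    """Build an object URL in path style or virtual-hosted style (illustrative only)."""
    parts = urlsplit(endpoint)
    encoded_key = quote(key, safe="/")  # keep '/' so the key hierarchy stays readable
    if addressing_style == "path":
        # Bucket is the first path segment; the hostname is unchanged.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{encoded_key}"
    if addressing_style == "virtual":
        # Bucket becomes a subdomain of the endpoint hostname.
        return f"{parts.scheme}://{bucket}.{parts.netloc}/{encoded_key}"
    raise ValueError(f"unsupported addressing style: {addressing_style!r}")
```

For a hypothetical endpoint https://s3.example.svc:443 and bucket koku-bucket, path style yields https://s3.example.svc:443/koku-bucket/reports/a.csv, while virtual-hosted style yields https://koku-bucket.s3.example.svc:443/reports/a.csv. The second form only works if DNS and the serving certificate cover the bucket subdomain, which is presumably why it fails against in-cluster NooBaa/Ceph RGW endpoints.
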
Symptom: "x509: certificate signed by unknown authority" or CSV parse errors.
Root Cause: Go's x509.SystemCertPool() doesn't include OpenShift service CA.
Fix: Chart uses initContainer.prepareCABundle to combine CAs.
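
Combining CAs amounts to concatenating PEM files into one bundle that the Go process then trusts. The sketch below shows that idea under stated assumptions: `combine_ca_bundles` is a hypothetical helper, not the chart's actual initContainer logic, and any file paths passed to it are up to the caller.

```python
from pathlib import Path


def combine_ca_bundles(sources: list[Path], output: Path) -> int:
    """Concatenate PEM files into one bundle and return the number of certificates written.

    Missing or empty source files are skipped rather than treated as errors.
    """
    chunks = []
    for src in sources:
        if not src.is_file():
            continue
        text = src.read_text().strip()
        if text:
            chunks.append(text)
    # Separate files with a newline and end the bundle with a trailing newline.
    bundle = "\n".join(chunks) + "\n" if chunks else ""
    output.write_text(bundle)
    return bundle.count("-----BEGIN CERTIFICATE-----")
```

A typical call would pass the system bundle and the OpenShift service CA file as sources. Commonly seen locations are /etc/pki/tls/certs/ca-bundle.crt and /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt, but whether the chart uses exactly these paths is an assumption.
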
Root Cause: manifest.json must include start and end date fields.
Fix: tests/utils.py create_upload_package() includes these fields.
Possible Causes:
- TLS certificate issues
- S3 URL encoding issues
- Wrong data format - NISE must use the --ros-ocp-info flag
Verification:
kubectl logs -n cost-onprem -l app.kubernetes.io/component=ros-processor --tail=50

Root Cause: Label changes require fresh install.
helm uninstall cost-onprem -n cost-onprem
helm install cost-onprem ./cost-onprem -n cost-onprem -f openshift-values.yaml --wait

The E2E tests use NISE (koku-nise) to generate proper OCP cost data.
nise report ocp --ros-ocp-info --static-report-file static.yml --write-monthly

- --ros-ocp-info - Generate container-level data for ROS processor (REQUIRED for ROS tests)
- --write-monthly - Organize output by month
- --static-report-file - Use predefined workload configuration

Upload tarball must have proper manifest.json:
{
"uuid": "...",
"cluster_id": "...",
"version": "...",
"date": "2026-01-22",
"start": "2026-01-20",
"end": "2026-01-22",
"files": ["pod_usage.csv"],
"resource_optimization_files": ["ros_usage.csv"]
}

- files array: Pod-level data for Koku processing
- resource_optimization_files array: Container-level data for ROS processor
- Both start and end dates are REQUIRED for summary table population
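
A quick pre-upload check can catch a manifest that is missing the required date fields. This is a minimal sketch, not the logic of create_upload_package(): `check_manifest_dates` is a hypothetical helper, it assumes the YYYY-MM-DD date format shown in the example above, and the start-before-end ordering check is an added assumption.

```python
from datetime import date


def check_manifest_dates(manifest: dict) -> list[str]:
    """Return a list of problems with the 'start' and 'end' fields (empty list means OK)."""
    problems = []
    parsed = {}
    for field in ("start", "end"):
        value = manifest.get(field)
        if not value:
            problems.append(f"missing required field: {field}")
            continue
        try:
            parsed[field] = date.fromisoformat(value)
        except (TypeError, ValueError):
            problems.append(f"invalid date in field: {field}")
    # Assumption: a report window should not end before it starts.
    if len(parsed) == 2 and parsed["start"] > parsed["end"]:
        problems.append("start is after end")
    return problems
```

Running this against the parsed manifest.json before building the tarball gives an early, readable error instead of an empty summary table later.
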
To connect to an OpenShift cluster for testing/troubleshooting, you need:
- Cluster API URL: e.g., api.ocp-edge94.qe.lab.redhat.com:6443
- Username: e.g., kubeadmin
- Password: The cluster admin password
oc login -s <CLUSTER_API_URL> -u <USERNAME> --password <PASSWORD>

oc whoami # Check logged in user
oc cluster-info # Check cluster info
kubectl get pods -n cost-onprem # Check deployment

export NAMESPACE="cost-onprem" # Target namespace
export KEYCLOAK_NAMESPACE="keycloak" # Keycloak namespace
export HELM_RELEASE_NAME="cost-onprem" # Helm release name

# CI mode (~88 tests, ~3 min)
NAMESPACE=cost-onprem ./scripts/run-pytest.sh
# Extended tests (~15 min, requires ODF)
NAMESPACE=cost-onprem ./scripts/run-pytest.sh --extended
# Specific suite
./scripts/run-pytest.sh --e2e

# Koku listener
kubectl logs -n cost-onprem -l app.kubernetes.io/component=listener --tail=100
# ROS processor
kubectl logs -n cost-onprem -l app.kubernetes.io/component=ros-processor --tail=100
# Kruize
kubectl logs -n cost-onprem -l app.kubernetes.io/component=ros-optimization --tail=100

# Full deployment + chart tests
./scripts/deploy-test-cost-onprem.sh --namespace cost-onprem --verbose
# Deploy only (skip chart tests)
./scripts/deploy-test-cost-onprem.sh --skip-chart-tests
# Chart tests only (skip deployment)
./scripts/deploy-test-cost-onprem.sh --skip-deploy
# Dry run to preview what would execute
./scripts/deploy-test-cost-onprem.sh --dry-run --verbose

After modifying flag parsing in deploy-test-cost-onprem.sh, validate all permutations:
./scripts/qe/test-gh-workflow-locally.sh .github/workflows/validate-deploy-test-script.yml

# All pods
kubectl get pods -n cost-onprem -l app.kubernetes.io/instance=cost-onprem
# Recent events
kubectl get events -n cost-onprem --sort-by='.lastTimestamp' | tail -20
# Search for errors
kubectl logs -n cost-onprem -l app.kubernetes.io/component=ros-processor | grep -i error

# Download from Prow URL
./scripts/download-ci-artifacts.sh --url "<PROW_URL>"
# Download by PR and build ID
./scripts/download-ci-artifacts.sh <PR_NUMBER> <BUILD_ID>

Note: Downloaded artifacts are saved to ci-artifacts-pr<PR>-<BUILD_ID>/ and should NOT be deleted unless explicitly requested by the user.

IQE (Insights QE) tests provide comprehensive integration testing for cost-management functionality.
- Red Hat Network: Must be on VPN for repository access
- Quay.io Access: For containerized tests, need access to quay.io/cloudservices/iqe-tests
  - Requires user file in app-interface repo: data/teams/insights/users/<username>.yml
- Local Repositories: For local tests, clone adjacent to this repo:
../iqe-core/ ../iqe-cost-management-plugin/
# IQE only — skip deploy + chart tests, boost listener CPU (recommended)
./scripts/deploy-test-cost-onprem.sh --iqe-only \
--listener-cpu max --iqe-profile smoke
# Containerized standalone (no CPU boost)
./scripts/run-iqe-tests.sh --profile smoke
# Local (requires VPN + local repos)
./scripts/run-iqe-tests-local.sh --setup # First time
./scripts/run-iqe-tests-local.sh --profile smoke # Run tests

| Profile | Tests | Duration | Use Case |
|---|---|---|---|
| smoke | ~43 | ~17 min | PR checks |
| extended | ~2100 | ~33 min | Daily CI |
| stable | ~2350 | ~40 min | Weekly CI |
| full | ~3324 | ~60 min | Release validation |

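
When time is the constraint, the profile choice can be reduced to picking the largest profile whose approximate duration fits the budget. The sketch below uses the approximate figures from the table; `IQE_PROFILES` and `largest_profile_within` are hypothetical names for illustration and do not exist in the repository scripts.

```python
# (profile, approximate test count, approximate duration in minutes), from the table above.
IQE_PROFILES = [
    ("smoke", 43, 17),
    ("extended", 2100, 33),
    ("stable", 2350, 40),
    ("full", 3324, 60),
]


def largest_profile_within(budget_minutes: int) -> "str | None":
    """Return the profile with the most tests that fits the time budget, or None if none fits."""
    fitting = [p for p in IQE_PROFILES if p[2] <= budget_minutes]
    if not fitting:
        return None
    return max(fitting, key=lambda p: p[1])[0]
```

The durations are rough, so treat the result as a starting point rather than a guarantee that the run finishes within the budget.
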
Tests are I/O-bound waiting for backend data processing. Use --listener-cpu max
to boost the listener deployment's CPU during the run (~40-50% faster ingestion).
Tests are organized into skip groups with SKIP_* env vars. Key blockers:
- COST-7179 — GPU/MIG schema mismatch blocks ~90 tests + cascading failures
- 90-day data — NISE generates ~60 days; 90-day range tests fail (~228 tests)
- FLPATH-3423 — Source CRUD update returns 500 (1 test)
See docs/development/skipped-iqe-tests.md for full details on all skip groups,
pytest markers, and test profiles.
See docs/development/test-impact-map.md for the automated test recommendation
system that maps component changes to IQE profiles (scripts/qe/test-impact-map.yaml).
See docs/development/iqe-testing-setup.md for full setup guide.