@fmount (Contributor) commented Jan 9, 2026

This PR adds comprehensive OMC compatibility to openstack-must-gather, enabling users to analyze must-gather data using the omc tool while maintaining full backward compatibility with existing workflows.

Problem Statement

The current openstack-must-gather creates a custom directory structure optimized for OpenStack troubleshooting, but this structure is incompatible with standard Kubernetes analysis tools such as omc, which expect the standard must-gather directory layout.
Until now, users had to choose between:

  • OpenStack optimized structure (good for manual analysis but incompatible with omc)
  • Standard Kubernetes tooling (omc compatibility, but suboptimal for OpenStack workflows)

Solution Architecture

This commit introduces an omc compatibility layer that can be enabled at gathering time.

Collection Modes

The must-gather execution mode is controlled through the new OMC environment variable.

  • Regular Mode (OMC=false or unset): Maintains existing OpenStack-optimized structure
  • OMC Mode (OMC=true): Creates standard Kubernetes directory structure based on oc adm inspect
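The mode switch above can be sketched as follows. This is a minimal, hypothetical sketch: `collection_mode`, `gather`, `gather_omc` and `gather_regular` are illustrative names, not the functions in openstack-must-gather, and the `oc adm inspect` call is only printed (dry run) so it runs without a cluster. The only piece taken from the patch is the `${OMC:-false}` default.

```bash
# Default the mode the same way the patch does: ${OMC:-false} yields
# "false" when OMC is unset *or* empty, so "OMC=" behaves like OMC=false.
export OMC="${OMC:-false}"

# Hypothetical helper: map the variable to a collection mode.
collection_mode() {
    case "${OMC}" in
        true) echo omc ;;
        *) echo regular ;;
    esac
}

# Hypothetical OMC path: the standard layout comes from "oc adm inspect".
# The command is echoed instead of executed to keep the sketch self-contained.
gather_omc() {
    dest="${1:-/must-gather}"
    echo "oc adm inspect namespace/openstack --dest-dir=${dest}"
}

# Hypothetical regular path: stands in for the existing "oc get" scripts.
gather_regular() {
    echo "regular: existing OpenStack-optimized collection"
}

# Dispatch on the selected mode.
gather() {
    if [ "$(collection_mode)" = "omc" ]; then
        gather_omc "$@"
    else
        gather_regular
    fi
}
```

With OMC=true, `gather /tmp/mg` prints the inspect command; leaving OMC empty or unset falls back to the regular path, which is what keeps existing workflows unchanged.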

Key Design Decisions

  1. Centralized Compatibility Layer (omc.sh):

    • All OMC logic isolated in a single file
    • Uses oc adm inspect for standard structure generation
    • Clean separation from existing collection scripts
  2. No Modification of Existing Scripts:

    • Regular collection scripts remain unchanged when OMC=false
    • Backward compatibility guaranteed
    • No performance impact on existing workflows
  3. Comprehensive Resource Coverage:

    • OpenStack resources (CRDs, custom resources, secrets)
    • Network resources (nncp, nnce, nns, net-attach-def, metallb)
    • Monitoring resources (grafana, observability)
    • OLM resources (subscriptions, CSVs, installplans)
  4. Unified CRD Discovery:

    • Global CRD_DOMAINS variable in common.sh
    • Consistent regex pattern across all collection functions
    • Easy to extend for new resource types
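A unified discovery pattern of this kind can be sketched with one shared regex and a single awk filter. The `CRD_DOMAINS` value below is an assumption (the actual regex in common.sh is not shown in this PR description), and `filter_crds` is a hypothetical helper that reads lines shaped like `oc get crd -o name` output.

```bash
# Assumed value for illustration only; the real CRD_DOMAINS in common.sh may
# list other API groups. [.] is used instead of \. so awk -v keeps it literal.
CRD_DOMAINS='(openstack[.]org|k8s[.]cni[.]cncf[.]io|nmstate[.]io|metallb[.]io|operators[.]coreos[.]com)'

# Hypothetical helper: read "customresourcedefinition.apiextensions.k8s.io/<name>"
# lines on stdin and print the CRD names whose API group matches CRD_DOMAINS.
filter_crds() {
    awk -v re="${CRD_DOMAINS}" '
        {
            name = $0
            sub(/^.*\//, "", name)      # drop the resource-type prefix
        }
        name ~ ("[.]" re "$") { print name }   # group must end the name, after a dot
    '
}
```

For example, `oc get crd -o name | filter_crds` lists the matching CRDs; adding a new resource family then only means extending the one regex.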

Implementation Highlights

  • Complete Feature Parity: Both modes collect identical resources with proper secret masking
  • Optimized Performance: Eliminated grep | awk patterns in favor of pure awk
  • Clean Architecture: Single responsibility principle applied throughout
  • Maintainable Code: Changes affect minimal surface area
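The grep | awk to pure awk change can be illustrated with a small before/after comparison. The pod listing is made-up sample data shaped like `oc get pods --no-headers` output, and both function names are hypothetical; the sketch only shows that a single awk process can do both the filtering and the field extraction.

```bash
# Illustrative sample shaped like "oc get pods --no-headers" output
# (NAME READY STATUS RESTARTS AGE).
sample_pods() {
    printf '%s\n' \
        'glance-default-external-api-0   3/3   Running     0   2h' \
        'keystone-db-sync-abcde          0/1   Completed   0   3h' \
        'nova-api-0                      2/2   Running     0   2h'
}

# Before: two processes; grep selects lines, awk extracts the first field.
running_pods_grep_awk() {
    sample_pods | grep Running | awk '{print $1}'
}

# After: one awk process filters on the STATUS column and prints NAME.
# In this sketch, matching the column (not the whole line) also avoids
# false hits when a pod name happens to contain "Running".
running_pods_awk() {
    sample_pods | awk '$3 == "Running" {print $1}'
}
```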

Usage example

  oc adm must-gather \
    --image=quay.io/openstack-k8s-operators/openstack-must-gather \
    --dest-dir=/home/stack/must-gather-omc \
    -- SOS= SOS_SERVICES= OMC=true OPENSTACK_DATABASES=ALL gather
    
Analyzing with OMC

# Navigate resources using omc
omc get nodes
omc -n openstack get glance
omc -n openstack get pods
omc get net-attach-def
omc get nncp
omc get ipaddresspools

# Explore namespace structure
omc describe namespace openstack
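Before browsing, it can help to confirm that the collected directory really has the `oc adm inspect` layout. This is a hypothetical helper, not part of openstack-must-gather. It assumes omc expects the usual `namespaces/` and `cluster-scoped-resources/` trees (the `namespaces/openstack/core/...` paths linked in this thread follow that layout), and that the path passed in is the image-named subdirectory created under `--dest-dir`.

```bash
# Hypothetical helper: succeed only if $1 looks like "oc adm inspect" output,
# i.e. it has both a namespaces/ tree and a cluster-scoped-resources/ tree.
looks_like_inspect_layout() {
    mg_dir="$1"
    [ -d "${mg_dir}/namespaces" ] && [ -d "${mg_dir}/cluster-scoped-resources" ]
}
```

For example, `looks_like_inspect_layout "$MG" && omc use "$MG"` selects the directory in omc only when the check passes; `MG` is a placeholder for that subdirectory.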

Regular collection remains unchanged

oc adm must-gather \
    --image=quay.io/openstack-k8s-operators/openstack-must-gather \
    --dest-dir=$PWD/must-gather-regular

openshift-ci bot requested review from dprince and juliakreger on January 9, 2026 11:54
openshift-ci bot commented Jan 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fmount for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-must-gather for 124,8ca20c99a3df5f72edd583475ff0df8786ed89fc

@fmount (Contributor Author) commented Jan 9, 2026

@fmount requested review from abays and stuggi on January 9, 2026 16:02
@fmount marked this pull request as ready for review on January 9, 2026 16:02
openshift-ci bot requested a review from olliewalsh on January 9, 2026 16:02
source "${DIR_NAME}/bg.sh"

# OMC compatibility mode
export OMC=${OMC:-false}
Contributor

Wondering if this mode should be the default? Otherwise, support would always have to request it.

Contributor Author

Yes, long term it would be ideal, especially because, as you mentioned, the request comes directly from the field. I wasn't sure about suddenly introducing a new default format that our team might not be aware of. We could switch it to true, make it the default behavior, and deprecate the old format at the same time, but such a switch is easier to coordinate with the CI team (cifmw, forge-ci and so forth) so that we announce it properly and don't get flooded with support requests about the new format.
I would proceed as follows:

  1. temporarily switch it to true in a new PS so we can see it executed in CI
  2. analyze any potential gap and see if we have all the usual information we need
  3. revert it to false and land this patch
  4. have a follow up patch where:
    a. we switch it to true
    b. we provide the required documentation to make life easier to the CI team
    c. we deprecate the "old" way of collecting info

After this work is done, we can plan a downstream doc update to reflect the new behavior.
Let me know your thoughts, in the meantime I'll move to step 1 of this list so we can early identify any potential gap.

Contributor Author

@stuggi step 1 is done. I'm quite happy with the results [1], and this seems the right direction to evolve the tool. Let me know your thoughts.
Do you think we should hold this patch and plan to switch the default behavior as part of the main plan?
(also cc @abays ^ for additional thoughts)

[1] https://logserver.rdoproject.org/edf/rdoproject.org/edf636d30a9e4bc9984e33c60dbaec96/controller/ci-framework-data/logs/openstack-must-gather/quay-rdoproject-org-openstack-k8s-operators-openstack-must-gather-sha256-9e5fec8f5beccbd7d0ea0b9b0a67bdda92f8c618105087c918866dfc75cd35a8/

Contributor

Assuming there's parity between the overall resources collected in OMC versus the old paradigm, I'd be quite happy to move to OMC as the default. I think we should give CI/QE a heads-up though in case they are somehow dependent on the current directory structure.

Contributor Author

@fmount Jan 12, 2026

+1 to defaulting it to true and giving the CI/QE team a heads-up about the new structure: this seems a good direction, and I'm OK with switching to the new format as well.

This commit introduces comprehensive OMC integration for openstack-must-gather,
enabling seamless compatibility with the omc tool while maintaining full
backward compatibility with existing workflows.

Key architectural changes and decisions:
- Centralized OMC compatibility layer in omc.sh, based on "oc adm inspect"
- Clean separation between regular and OMC collection modes via OMC environment
  variable
- Unified CRD discovery pattern across all scripts via global variable
- Complete resource coverage including network, monitoring, and OLM resources

The implementation uses a dual-mode approach where OMC=true triggers
collection using oc adm inspect for standard Kubernetes directory structure,
while regular mode continues using individual oc get commands for
OpenStack-optimized organization.

Enhanced features:
- Network resource support (nncp, nnce, nns, net-attach-def, metallb)
- Comprehensive CRD coverage via centralized regex pattern
- Proper secret decoding and masking in both modes
- Optimized command execution by replacing "grep | awk" pipelines with pure awk

Co-Authored-By: Claude <[email protected]>

Signed-off-by: Francesco Pantano <[email protected]>
@stuggi (Contributor) commented Jan 12, 2026

At first I thought we had lost the secret details in the default gathering since, e.g., [1]

00-default.conf: MTM2NCBieXRlcyBsb25n

only contains the length in bytes. But it seems we also collect them in [2].

But we do not collect the ConfigMaps that way [3]? I think we are missing [4] (from an openstack-operator PR), or have I just not found it yet?

[1] https://logserver.rdoproject.org/edf/rdoproject.org/edf636d30a9e4bc9984e33c60dbaec96/controller/ci-framework-data/logs/openstack-must-gather/quay-rdoproject-org-openstack-k8s-operators-openstack-must-gather-sha256-9e5fec8f5beccbd7d0ea0b9b0a67bdda92f8c618105087c918866dfc75cd35a8/namespaces/openstack/core/secrets.yaml
[2] https://logserver.rdoproject.org/edf/rdoproject.org/edf636d30a9e4bc9984e33c60dbaec96/controller/ci-framework-data/logs/openstack-must-gather/quay-rdoproject-org-openstack-k8s-operators-openstack-must-gather-sha256-9e5fec8f5beccbd7d0ea0b9b0a67bdda92f8c618105087c918866dfc75cd35a8/namespaces/openstack/secrets/
[3] https://logserver.rdoproject.org/edf/rdoproject.org/edf636d30a9e4bc9984e33c60dbaec96/controller/ci-framework-data/logs/openstack-must-gather/quay-rdoproject-org-openstack-k8s-operators-openstack-must-gather-sha256-9e5fec8f5beccbd7d0ea0b9b0a67bdda92f8c618105087c918866dfc75cd35a8/namespaces/openstack/core/configmaps.yaml
[4] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openstack-k8s-operators_openstack-operator/1765/pull-ci-openstack-k8s-operators-openstack-operator-main-openstack-operator-build-deploy-kuttl/2010398476197695488/artifacts/openstack-operator-build-deploy-kuttl/openstack-k8s-operators-gather/artifacts/must-gather/quay-io-openstack-k8s-operators-openstack-must-gather-sha256-2af7f286b6453522975b5de70b41aecb541c915047854f0e78afc578e250b844/namespaces/openstack/configmaps/

@fmount (Contributor Author) commented Jan 12, 2026

> At first I thought we had lost the secret details in the default gathering since, e.g., [1]
>
> 00-default.conf: MTM2NCBieXRlcyBsb25n
>
> only contains the length in bytes. But it seems we also collect them in [2].

Right: when must-gather inspects a secret, the default behavior is to omit the data [1] and report only the size of each field.
Because in OpenStack we need to inspect the config files stored in the secrets, we continue to run our own version of gather_secret, which takes care of unpacking and masking the resulting config files and their sensitive fields.
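Two small illustrations of the point above. First, decoding the quoted `00-default.conf` value shows that `oc adm inspect` stored a length placeholder rather than the file content. Second, `decode_and_mask` is a hypothetical helper that sketches the unpack-and-mask idea; it is not the actual gather_secret code, and its naive key pattern (password/secret/token) is an assumption.

```bash
# The 00-default.conf value from secrets.yaml decodes to a placeholder:
printf '%s' 'MTM2NCBieXRlcyBsb25n' | base64 -d; echo   # prints: 1364 bytes long

# Hypothetical sketch: decode one base64 secret value (an ini-style config
# file) and replace the value of any key whose name contains password,
# secret or token. Lines without "=" (e.g. [DEFAULT]) pass through as-is.
decode_and_mask() {
    printf '%s' "$1" | base64 -d | awk -F'=' '
        NF > 1 && tolower($1) ~ /(password|secret|token)/ { print $1 "= **********"; next }
        { print }
    '
}
```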

> But we do not collect the ConfigMaps that way [3]? I think we are missing [4] (from an openstack-operator PR), or have I just not found it yet?

As you mentioned, ConfigMaps are collected in [2] (the list of ConfigMaps) as part of the regular namespace inspection [3]. Compared with the list produced by the "old" method, the main difference is the structure: ConfigMaps used to be gathered per service. The old approach is clearly easier to read as a log, but since we now rely on the new inspection method, I thought collecting them twice would be redundant.
So you're right in the sense that we no longer collect ConfigMaps individually (or per service) as we did before. Do you feel this is a limitation? From a functional perspective there's no gap in the gathering itself, only in the way the data is presented. Because ConfigMaps can carry scripts and other config files, based on feedback we could consider also inspecting them via the old approach, though we would pay a performance cost since, due to [3], the collection would happen twice.


[1] https://github.com/openshift/oc/blob/main/pkg/cli/admin/inspect/secret.go#L71
[2] https://logserver.rdoproject.org/edf/rdoproject.org/edf636d30a9e4bc9984e33c60dbaec96/controller/ci-framework-data/logs/openstack-must-gather/quay-rdoproject-org-openstack-k8s-operators-openstack-must-gather-sha256-9e5fec8f5beccbd7d0ea0b9b0a67bdda92f8c618105087c918866dfc75cd35a8/namespaces/openstack/core/configmaps.yaml
[3] https://github.com/openshift/oc/blob/main/pkg/cli/admin/inspect/namespace.go#L23
