Skip to content
This repository was archived by the owner on Apr 17, 2023. It is now read-only.

Latest commit

 

History

History
114 lines (84 loc) · 5.1 KB

SOP-operator.adoc

File metadata and controls

114 lines (84 loc) · 5.1 KB

UnifiedPush Operator - Standard Operating Procedures

Overview

The following guide outlines the steps required to manage and solve issues in the UnifiedPush Operator.

Success Indicators

All alerts should appear as green in the Prometheus Alert Monitoring.

Prometheus Alerts Procedures

The following guide outlines the steps required to resolve issues with the Prometheus alerts configured to monitor the UnifiedPush Operator. The alerts are configured by enabling the Monitoring Service (Metrics) for this project.

Critical

UnifiedPushOperatorDown

  1. Switch to the UnifiedPush namespace by running oc project <namespace>. E.g oc project unifiedpush.

  2. Follow the Validate steps.

    ℹ️
    The operator is responsible to manage and create all objects required in order to have the UnifiedPush Service and its database provided in the cluster.
  3. Check its logs by running oc logs <operator-podname>

    ℹ️
    You can save the logs by running oc logs <operator-podname> > <filename>.log. The logs may provide you with useful information to lead you to the root cause, and they are also useful for providing to the project maintainers when you create an issue.

Validate

  1. Switch to the UnifiedPush namespace by running oc project <namespace>. E.g oc project unifiedpush.

  2. Check if the operator pod to is present by running oc get pods | grep operator. Following an example of the expected result.

    $ oc get pods | grep operator
    unifiedpush-operator-58c8877fd8-g6dvr          1/1       Running   3          9d

Monitoring

If the Monitoring Service (Metrics) is enabled for the installation, a Grafana Dashboard titled UnifiedPush Operator, and the Prometheus Monitoring instance are created.

AMQ Integration

There is a variable that is responsible for switching AMQ usage in the unifiedpushserver custom resource:

apiVersion: push.aerogear.org/v1alpha1
kind: UnifiedPushServer
metadata:
  name: unifiedpush
spec:
  useMessageBroker: false #set this value to true and update the cr

This will trigger the creation of some AMQ resources: address, addressspace and messagingusers.

Sometimes AMQ adresses might get stuck in a pending state:

$ oc get address -n mobile-unifiedpush
NAME                                      ADDRESS                               READY     PHASE     AGE
ups.allbatchesloadedqueue                 AllBatchesLoadedQueue                 true      Active    25m
ups.apnspushmessagequeue                  APNsPushMessageQueue                  false     Pending   25m
ups.apnstokenbatchqueue                   APNsTokenBatchQueue                   false     Pending   25m
ups.batchloadedqueue                      BatchLoadedQueue                      true      Active    25m
ups.freeserviceslotqueue                  FreeServiceSlotQueue                  true      Active    25m
ups.gcmpushmessagequeue                   GCMPushMessageQueue                   true      Active    25m
ups.gcmtokenbatchqueue                    GCMTokenBatchQueue                    true      Active    25m
ups.metricsqueue                          MetricsQueue                          true      Active    25m
ups.triggermetriccollectionqueue          TriggerMetricCollectionQueue          true      Active    25m
ups.triggervariantmetriccollectionqueue   TriggerVariantMetricCollectionQueue   true      Active    25m
ups.webpushmessagequeue                   WebPushMessageQueue                   true      Active    25m
ups.webtokenbatchqueue                    WebTokenBatchQueue                    true      Active    25m
ups.wnspushmessagequeue                   WNSPushMessageQueue                   true      Active    25m
ups.wnstokenbatchqueue                    WNSTokenBatchQueue                    true      Active    25m

UPS needs to be rolled out again once all adresses are ready:

oc rollout latest unifiedpush -n openshift-mobile-unifiedpush

The rollout status can be verified/watched by running the following command:

oc rollout status dc/unifiedpush -n openshift-mobile-unifiedpush

If an address gets stuck for some time, it should be deleted by running oc delete address/$name -n mobile-unifiedpush to trigger an adress recreation by the operator.

ℹ️
This process won’t migrate messages from the internal UPS system to AMQ and results in a tiny UPS downtime (if no unexpected issues show up such as image pulling issues).