You can retrieve metadata from your OpenShift cluster.
$ oc get nodes
NAME             STATUS   ROLES           AGE   VERSION
10.134.237.197   Ready    master,worker   52d   v1.19.0+e49167a
10.134.237.201   Ready    master,worker   52d   v1.19.0+e49167a
10.134.237.236   Ready    master,worker   52d   v1.19.0+e49167a

Every time you create resources, make sure that you
- target the right OpenShift cluster
- target the right OpenShift Project / Kubernetes namespace
https://cloudnative101.dev/lectures/kube-observability/
Hint: make sure you select Show more on the code snippets on the site above to see all of the content.
The first issue is caused by application instances entering an unhealthy state and responding to user requests with error messages. Unfortunately, this state does not cause the container to stop, so the Kubernetes cluster is not able to detect this state and restart the container. Luckily, the application has an internal endpoint that can be used to detect whether or not it is healthy. This endpoint is /healthz on port 8080.
Your first task is to create a liveness probe that checks this endpoint periodically. If the endpoint returns an error or fails to respond, the probe will detect this and the cluster will restart the container.
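As a sketch, a liveness probe for this case could be declared under the container spec roughly like this (the path and port come from the description above; the timing values are illustrative, not prescribed here):

```yaml
livenessProbe:
  httpGet:
    path: /healthz        # the application's internal health endpoint
    port: 8080
  initialDelaySeconds: 5  # illustrative: wait a few seconds after startup
  periodSeconds: 10       # illustrative: the Kubernetes default check interval
  failureThreshold: 3     # illustrative: restart after 3 consecutive failures
```

When the probe fails often enough, the kubelet kills the container and the pod's restart policy takes over.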
Another issue is caused by new pods while they are starting up. The application takes a few seconds after startup before it is ready to service requests. As a result, some users are getting error messages during this brief time.
To fix this, you will need to create a readiness probe. To detect whether the application is ready, the probe should simply make a request to the /ready endpoint on port 8080. If this request succeeds, then the application is ready. Also set an initial delay of 5 seconds for the probes.
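Sketched as YAML, the readiness probe could look like the following (path, port, and initial delay per the instructions above; the check interval is an assumption):

```yaml
readinessProbe:
  httpGet:
    path: /ready            # readiness endpoint of the application
    port: 8080
  initialDelaySeconds: 5    # wait 5 seconds after startup before the first check
  periodSeconds: 5          # assumed check interval
```

Unlike a failing liveness probe, a failing readiness probe does not restart the container; it only removes the pod from the service endpoints until the probe succeeds again.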
Here is the Pod yaml file to be used for both scenarios. Add the probes, then create the pod in the cluster to test it.
apiVersion: v1
kind: Pod
metadata:
  name: energy-shield-service
spec:
  containers:
  - name: energy-shield
    image: ibmcase/energy-shield:1

The folder /energy-shield in this Git repository contains the implementation used in the container. You won't need the Node.js code for the lab, but it is worth taking a look at it to get a deeper understanding of the component interactions.
$ oc describe pod/energy-shield-service | grep Readiness
Readiness: http-get http://:8080/ready delay=5s timeout=1s period=5s success=1 failure=3
$ oc describe pod/energy-shield-service | grep Liveness
Liveness: http-get http://:8080/healthz delay=5s timeout=1s period=10s success=1 failure=3
Create the following pod to experience what happens if a liveness probe fails. Delete it after your tests.
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

There are different ways to watch the liveness probe fail.
$ oc get events --watch
$ oc describe pods/liveness-exec

Don't forget to delete the liveness-exec pod after your tests.