Guided Exercise: Application Health Probes
Configure health probes in a deployment and verify that network clients are insulated from application failures.
Outcomes
Observe potential issues with an application that is not configured with health probes.
Configure startup, liveness, and readiness probes for the application.
As the student user on the workstation machine, use the lab command to prepare your system for this exercise.
This command ensures that the following conditions are true:
The reliability-probes project exists.
The resource files are available in the course directory.
The classroom registry has the long-load container image.
The registry.ocp4.example.com:8443/redhattraining/long-load:v1 container image contains an application with utility endpoints. These endpoints perform such tasks as crashing the process and toggling the server's health status.
[student@workstation ~]$ lab start reliability-probes
Procedure 6.2. Instructions
As the developer user, deploy the long-load application in the reliability-probes project.
Log in as the developer user with the developer password.
[student@workstation ~]$ oc login -u developer -p developer \
  https://api.ocp4.example.com:6443
Login successful.
...output omitted...
Select the reliability-probes project.
[student@workstation ~]$ oc project reliability-probes
Now using project "reliability-probes" on server "https://api.ocp4.example.com:6443".
Navigate to the DO180/labs/reliability-probes directory and then apply the long-load-deploy.yaml file to create the pods. Move to the next step within one minute.
[student@workstation ~]$ cd DO180/labs/reliability-probes
[student@workstation reliability-probes]$ oc apply -f long-load-deploy.yaml
deployment.apps/long-load created
service/long-load created
route.route.openshift.io/long-load created
Verify that the pods take several minutes to start by sending a request to a pod in the deployment.
[student@workstation reliability-probes]$ oc exec deploy/long-load -- \
  curl -s localhost:3000/health
app is still starting
Observe that the pods are listed as ready even though the application is not ready.
[student@workstation reliability-probes]$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
long-load-8564d998cc-579nx   1/1     Running   0          30s
long-load-8564d998cc-ttqpg   1/1     Running   0          30s
long-load-8564d998cc-wjtfw   1/1     Running   0          30s
Add a startup probe to the pods so that the cluster knows when the pods are ready.
Modify the long-load-deploy.yaml YAML file by defining a startup probe. The probe runs every three seconds and marks the pod as failed after 30 failed attempts. The file should match the following excerpt:
...output omitted...
spec:
  ...output omitted...
  template:
    ...output omitted...
    spec:
      containers:
      - image: registry.ocp4.example.com:8443/redhattraining/long-load:v1
        imagePullPolicy: Always
        name: long-load
        startupProbe:
          failureThreshold: 30
          periodSeconds: 3
          httpGet:
            path: /health
            port: 3000
        env:
...output omitted...
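The two probe settings together bound how long the application may take to start. The following Python sketch only illustrates that arithmetic and is not part of the exercise files. The function name startup_budget_seconds is hypothetical, and the calculation assumes that no initial delay is set and ignores the time that each probe request takes.

```python
def startup_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time a container has to pass its startup probe.

    Assumption: no initialDelaySeconds is set, and the duration of each
    individual probe request is ignored.
    """
    return failure_threshold * period_seconds


# Values taken from the startupProbe excerpt in this step.
budget = startup_budget_seconds(failure_threshold=30, period_seconds=3)
print(f"The application has roughly {budget} seconds to start.")
```

Under these assumptions, a container that has not passed the startup probe by the end of that window is treated as failed.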
Scale down the deployment to zero replicas.
[student@workstation reliability-probes]$ oc scale deploy/long-load --replicas 0
deployment.apps/long-load scaled
Apply the updated long-load-deploy.yaml file. Because the YAML file specifies the number of replicas, the deployment is scaled up. Move to the next step within one minute.
[student@workstation reliability-probes]$ oc apply -f long-load-deploy.yaml
deployment.apps/long-load configured
service/long-load unchanged
route.route.openshift.io/long-load configured
Observe that the pods do not show as ready until the application is ready and the startup probe succeeds.
[student@workstation reliability-probes]$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
long-load-785b5b4fc8-7x5ln   0/1     Running   0          27s
long-load-785b5b4fc8-f7pdk   0/1     Running   0          27s
long-load-785b5b4fc8-r2nqj   0/1     Running   0          27s
Add a liveness probe so that broken instances of the application are restarted.
Start the load test script. The test begins to print Ok as the pods become available.
[student@workstation reliability-probes]$ ./load-test.sh
app is still starting
app is still starting
app is still starting
...output omitted...
Ok
Ok
Ok
...output omitted...
Keep the script running in a visible window.
In a new terminal window, use the /togglesick endpoint to make one of the pods unhealthy.
[student@workstation reliability-probes]$ oc exec \
  deploy/long-load -- curl -s localhost:3000/togglesick
no output expected
The load test window begins to show app is unhealthy. Because only one pod is unhealthy, the remaining pods still respond with Ok.
Update the long-load-deploy.yaml file to add a liveness probe. The probe runs every three seconds and marks the pod as failed after three failed attempts. Modify the spec.template.spec.containers object in the file to match the following excerpt.
spec:
  ...output omitted...
  template:
    ...output omitted...
    spec:
      containers:
      - image: registry.ocp4.example.com:8443/redhattraining/long-load:v1
        ...output omitted...
        startupProbe:
          failureThreshold: 30
          periodSeconds: 3
          httpGet:
            path: /health
            port: 3000
        livenessProbe:
          failureThreshold: 3
          periodSeconds: 3
          httpGet:
            path: /health
            port: 3000
        env:
...output omitted...
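The failure threshold counts consecutive failed probes, so a single success resets the count. The following Python sketch models that counting only, to show when a restart would be triggered. The function name first_restart_probe and the probe history are hypothetical, and the sketch does not contact a cluster.

```python
from typing import Iterable, Optional


def first_restart_probe(results: Iterable[bool], failure_threshold: int) -> Optional[int]:
    """Return the 1-based number of the probe that triggers a restart, or None."""
    consecutive_failures = 0
    for probe_number, healthy in enumerate(results, start=1):
        if healthy:
            consecutive_failures = 0  # a successful probe resets the counter
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return probe_number
    return None


# Hypothetical history: two healthy probes, then the pod is toggled sick.
history = [True, True, False, False, False]
restart_at = first_restart_probe(history, failure_threshold=3)
print(restart_at)
```

With a three-second period, three consecutive failures take on the order of nine seconds, which is why a sick pod can keep answering requests briefly before it is restarted.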
Scale down the deployment to zero replicas.
[student@workstation reliability-probes]$ oc scale deploy/long-load --replicas 0
deployment.apps/long-load scaled
The load test script shows that the application is not available.
Apply the updated long-load-deploy.yaml file to update the deployment, which triggers the deployment to re-create its pods.
[student@workstation reliability-probes]$ oc apply -f long-load-deploy.yaml
deployment.apps/long-load configured
service/long-load unchanged
route.route.openshift.io/long-load configured
Wait for the load test window to show Ok for all responses, and then toggle one of the pods to be unhealthy.
[student@workstation reliability-probes]$ oc exec \
  deploy/long-load -- curl -s localhost:3000/togglesick
no output expected
The load test window might show app is unhealthy a number of times before the pod is restarted.
Observe that the unhealthy pod is restarted after the liveness probe fails. After the pod is restarted, the load test window shows only Ok.
[student@workstation reliability-probes]$ oc get pods
NAME                        READY   STATUS    RESTARTS      AGE
long-load-fbb7468d9-8xm8j   1/1     Running   0             9m42s
long-load-fbb7468d9-k66dm   1/1     Running   0             8m38s
long-load-fbb7468d9-ncxkh   0/1     Running   1 (11s ago)   10m
Add a readiness probe so that traffic goes only to pods that are ready and healthy.
Scale down the deployment to zero replicas.
[student@workstation reliability-probes]$ oc scale deploy/long-load --replicas 0
deployment.apps/long-load scaled
Use the oc set probe command to add the readiness probe.
[student@workstation reliability-probes]$ oc set probe deploy/long-load --readiness \
  --failure-threshold 1 --period-seconds 3 \
  --get-url http://:3000/health
deployment.apps/long-load probes updated
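The command-line options correspond to the same probe fields that the earlier steps wrote by hand in the YAML file. The following Python sketch shows one plausible reading of that mapping by parsing the URL locally. The helper name readiness_probe_from_flags is hypothetical, the field names follow the probe excerpts shown earlier, and the uppercase scheme field is an assumption about how the probe is stored. The sketch does not call oc or the cluster.

```python
from urllib.parse import urlsplit


def readiness_probe_from_flags(get_url: str, failure_threshold: int, period_seconds: int) -> dict:
    """Build a probe definition, as a dict, from the values passed on the command line."""
    parts = urlsplit(get_url)
    return {
        "httpGet": {
            "path": parts.path,              # "/health"
            "port": parts.port,              # 3000; the host part is left empty
            "scheme": parts.scheme.upper(),  # assumption: stored as "HTTP"
        },
        "failureThreshold": failure_threshold,
        "periodSeconds": period_seconds,
    }


probe = readiness_probe_from_flags(
    "http://:3000/health", failure_threshold=1, period_seconds=3
)
print(probe)
```

The host is omitted from the URL because the probe targets the pod itself. To see what was actually stored, inspect the deployment, for example with oc get deploy/long-load -o yaml.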
Scale up the deployment to three replicas.
[student@workstation reliability-probes]$ oc scale deploy/long-load --replicas 3
deployment.apps/long-load scaled
Observe the status of the pods by using a watch command.
[student@workstation reliability-probes]$ watch oc get pods
NAME                        READY   STATUS    RESTARTS   AGE
long-load-d5794d744-8hqlh   0/1     Running   0          48s
long-load-d5794d744-hphgb   0/1     Running   0          48s
long-load-d5794d744-lgkns   0/1     Running   0          48s
The command does not immediately finish, but continues to show updates to the pods' status. Leave this command running in a visible window.
Wait for the pods to show as ready. Then, in a new terminal window, make one of the pods unhealthy for five seconds by using the /hiccup endpoint.
[student@workstation reliability-probes]$ oc exec \
  deploy/long-load -- curl -s localhost:3000/hiccup?time=5
no output expected
The pod status window shows that one of the pods is no longer ready. After five seconds, the pod is healthy again and shows as ready.
The load test window might show app is unhealthy one time before the pod is set as not ready. After the cluster determines that the pod is no longer ready, it stops sending traffic to the pod until either the pod is fixed or the liveness probe fails. Because the pod is sick for only five seconds, the readiness probe has enough time to fail, but the liveness probe does not.
NOTE
Optionally, repeat this step and observe as the temporarily sick pod's status changes.
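The difference between the two probes during the five-second hiccup follows from their failure thresholds. The following Python sketch estimates how many probes can land inside the outage window. The function name probes_during_outage is hypothetical, and the model assumes evenly spaced probes and ignores probe timeouts and request latency.

```python
import math
from typing import Tuple


def probes_during_outage(outage_seconds: int, period_seconds: int) -> Tuple[int, int]:
    """Return the (fewest, most) probes that can fall inside an outage window.

    The exact count depends on where the outage starts relative to the
    probe schedule, so the result is a range rather than a single number.
    """
    fewest = outage_seconds // period_seconds
    most = math.ceil(outage_seconds / period_seconds)
    return fewest, most


fewest, most = probes_during_outage(outage_seconds=5, period_seconds=3)

# Readiness probe: failure threshold 1, so even the fewest failures suffice.
readiness_trips = fewest >= 1
# Liveness probe: failure threshold 3, so even the most failures fall short.
liveness_trips = most >= 3

print(f"failed probes: {fewest} to {most}")
print(f"readiness fails: {readiness_trips}, liveness fails: {liveness_trips}")
```

Under these assumptions, at least one probe fails during the hiccup, which is enough to mark the pod as not ready, but never three in a row, so the pod is not restarted.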
Stop the load test and status commands by pressing Ctrl+c in their respective windows. Return to the /home/student/ directory.
[student@workstation reliability-probes]$ cd /home/student/
[student@workstation ~]$
Finish
On the workstation machine, use the lab command to complete this exercise. This step is important to ensure that resources from previous exercises do not impact upcoming exercises.
[student@workstation ~]$ lab finish reliability-probes