- Build your own Docker images
- Become familiar with lightweight process supervision for Docker
- Understand core concepts for dynamic scaling of an application in production
- Put into practice decentralized management of web server instances
This lab builds on a previous lab on load balancing.
In this lab you will perform a number of tasks and document your progress in a lab report. Each task specifies one or more deliverables to be produced. Collect all the deliverables in your lab report. Give the lab report a structure that mimics the structure of this document.
We expect your repository (you will receive the instructions for it later) to contain a folder called `report` and a folder called `logs`. Ideally, your report should be written in Markdown directly in the repository.
The lab consists of 6 tasks and one initial task (the initial task should be quick if you already completed the lab on load balancing):
- Identify issues and install the tools
- Add a process supervisor to run several processes
- Add a tool to manage membership in the web server cluster
- React to membership changes
- Use a template engine to easily generate configuration files
- Generate a new load balancer configuration when membership changes
- Make the load balancer automatically reload the new configuration
Remarks:

- In your report, reference the task numbers and question numbers of this document.
- The version of HAProxy used in this lab is 2.2. When reading the documentation, make sure you are looking at this version. Here is the link: http://cbonte.github.io/haproxy-dconv/2.2/configuration.html
- In the report, give the URL of the repository that you forked from this lab.
- The images and the web application are a bit different from the lab on load balancing. The web app no longer requires a tag. An environment variable is defined in the Docker files to specify a role for each image. We will see later how to use that.
- We expect, at least, to see in your report:
  - An introduction describing the lab briefly
  - Seven chapters, one for each task (0 to 6)
  - A table of contents
  - A chapter named "Difficulties" where you describe the problems you encountered and the solutions you found
  - A conclusion
DISCLAIMER: In this lab, we will go through one possible approach to manage a scalable infrastructure where we can add and remove nodes without having to rebuild the HAProxy image. This is not the only way to achieve this goal. If you do some research you will find a lot of tools and services to achieve the same kind of behavior.
In the previous lab, we built a simple distributed system with a load balancer and two web applications. The architecture of our distributed web application is shown in the following diagram:
The two web app containers stand for two web servers. They run a NodeJS sample application that implements a simple REST API. Each container exposes TCP port 3000 to receive HTTP requests.
The HAProxy load balancer is listening on TCP port 80 to receive HTTP requests from users. These requests will be forwarded to and load-balanced between the web app containers. Additionally it exposes TCP ports 1936 and 9999 for the stats page and the command-line interface.
For more details about the web application, take a look at the previous lab.
Now suppose you are working for a big e-tailer like Galaxus or Zalando. Starting with Black Friday and throughout the holiday season you see traffic to your web servers increase several times as customers are looking for and buying presents. In January the traffic drops back again to normal. You want to be able to add new servers as the traffic from customers increases and you want to be able to remove servers as the traffic goes back to normal.
Suppose further that there is an obscure bug in the web application that the developers haven't been able to understand yet. It makes the web servers crash unpredictably several times per week. When you detect that a web server has crashed you kill its container and you launch a new container.
Suppose further that your web servers and your load balancer are currently deployed as in the previous lab. What are the issues with this architecture? Answer the following questions. The questions are numbered M1 to M6 so that you can refer to them later in the lab. Please give in your report the reference of the question you are answering.
- [M1] Do you think we can use the current solution for a production environment? What are the main problems when deploying it in a production environment?
- [M2] Describe what you need to do to add a new `webapp` container to the infrastructure. Give the exact steps of what you have to do without modifying the way things are done. Hint: You probably have to modify some configuration and script files in a Docker image.
- [M3] Based on your previous answers, you have detected some issues in the current solution. Now propose a better approach at a high level.
- [M4] You probably noticed that the list of web application nodes is hardcoded in the load balancer configuration. How can we manage the web app nodes in a more dynamic fashion?
- [M5] In the physical or virtual machines of a typical infrastructure, we tend to have not only one main process (like the web server or the load balancer) running, but also a few additional processes on the side to perform management tasks.

  For example, to monitor the distributed system as a whole, it is common to collect in one centralized place all the logs produced by the different machines. Therefore we need a process running on each machine that forwards the logs to the central place. (We could also imagine a central tool that reaches out to each machine to gather the logs. That's a push vs. pull problem.) It is quite common to see a push mechanism used for this kind of task.

  Do you think our current solution is able to run additional management processes beside the main web server / load balancer process in a container? If no, what is missing / required to reach the goal? If yes, how would you proceed to run, for example, a log forwarding process?
- [M6] In our current solution, although the load balancer configuration is changing dynamically, it does not dynamically follow the configuration of our distributed system when web servers are added or removed. If we take a closer look at the `run.sh` script, we see two calls to `sed` that replace two lines in the `haproxy.cfg` configuration file just before we start `haproxy`. The configuration file contains exactly two server lines, and the script replaces exactly these two lines. What happens if we add more web server nodes? Do you think this is really dynamic? It's far from being a dynamic configuration. Can you propose a solution to solve this?
In this part of the task you will set up Docker Compose with Docker containers as in the previous lab. The Docker images are a little bit different from the previous lab, and we will work with these images during this lab.

You should have installed Docker Compose already in the previous lab. If not, download and install it from:
Fork the following repository and then clone the fork to your machine: https://github.com/SoftEng-HEIGVD/Teaching-HEIGVD-AIT-2019-Labo-Docker
To fork the repo, just click on the Fork button in the GitHub interface.
Once you have installed everything, start the Docker compose from the project folder with the following command:
```shell
$ docker-compose up --build
```

This will create three Docker containers. One contains HAProxy; the other two each contain a sample web application.
The containers with the web application stand for two web servers that are load-balanced by HAProxy.
The provisioning of the VM and the containers will take several minutes. You should see output similar to the following:
```
Creating network "teaching-heigvd-ait-2019-labo-load-balancing_public_net" with driver "heig"
Building webapp1
Step 1/9 : FROM node:latest
---> d8c33ae35f44
Step 2/9 : MAINTAINER Laurent Prevost <laurent.prevost@heig-vd.ch>
---> Using cache
---> 0f0e5f2e0432
Step 3/9 : RUN apt-get update && apt-get -y install wget curl vim && apt-get clean && npm install -g bower
[...]
Creating s1 ... done
Creating s2 ... done
Creating ha ... done
```
You can verify that you have three running containers with the following command:

```shell
$ docker ps
```
You should see output similar to the following:
```
CONTAINER ID  IMAGE                                                 COMMAND                 CREATED        STATUS             PORTS                   NAMES
a37cd48f28f5  teaching-heigvd-ait-2019-labo-load-balancing_webapp2  "docker-entrypoint.s…"  2 minutes ago  Up About a minute  0.0.0.0:4001->3000/tcp  s2
8e3384aec724  teaching-heigvd-ait-2019-labo-load-balancing_haproxy  "/docker-entrypoint.…"  2 minutes ago  Up 2 minutes       0.0.0.0:80->80/tcp      ha
da329f9d1ab6  teaching-heigvd-ait-2019-labo-load-balancing_webapp1  "docker-entrypoint.s…"  2 minutes ago  Up 2 minutes       0.0.0.0:4000->3000/tcp  s1
```
You can verify that a network named `heig` connects the containers:

```shell
$ docker network ls
```
You can now navigate to the address of the load balancer http://192.168.42.42 (or http://localhost) in your favorite browser. The load balancer forwards your HTTP request to one of the web app containers.
Deliverables:

- Take a screenshot of the stats page of HAProxy at http://192.168.42.42:1936. You should see your backend nodes.
- Give the URL of your repository in the lab report.
In this task, we will learn to install a process supervisor that will help us solve the issue presented in question M5. Installing a process supervisor gives us the ability to run multiple processes at the same time in a Docker environment.
A central tenet of the Docker design is the following principle (which for some people is a big limitation):
One process per container
This means that the designers of Docker assumed that in the normal case there is only a single process running inside a container. They designed everything around this principle. Consequently, they decided that a container runs only as long as there is a foreground process running. When the foreground process stops, the container automatically stops as well.
When you normally run server software like Nginx or Apache, which are designed to run as daemons, you run a command to start them. The command is a foreground process. What usually happens is that this process then forks a background process (the daemon) and exits. Thus, when you run the command in a container, the process starts, stops right after, and your container stops, too.

To avoid this behavior, you need to start your foreground process with an option that prevents it from forking a daemon and keeps it running in the foreground. In fact, HAProxy starts in this "no daemon" mode by default.
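To make the fork-and-exit behavior concrete, here is a small illustrative bash sketch (not part of the lab files): the background subshell stands in for the forked daemon, and the function returning corresponds to the foreground command exiting.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: mimic a server command that daemonizes.
# The background subshell plays the role of the forked daemon; the
# function returning corresponds to the foreground command exiting.
start_daemon() {
  ( sleep 2 >/dev/null 2>&1 & )   # "daemon" keeps running in the background
  echo "foreground command exits now"
}

start_daemon
# In a container, this is the moment Docker sees its main process end
# and stops the container, even though the "daemon" is still alive.
```
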
So the question is now: how can we run multiple processes inside one container? The answer involves using an init system. An init system is usually the part of an operating system that manages daemons and coordinates the boot process. There are many different init systems, like init.d, systemd and Upstart. Sometimes they are also called process supervisors.
In this lab, we will use a small init system called S6 (http://skarnet.org/software/s6/). More specifically, we will use the s6-overlay scripts (https://github.com/just-containers/s6-overlay), which simplify the use of S6 in our containers. For more details about the features, see https://github.com/just-containers/s6-overlay#features.
Is this in line with the Docker philosophy? You can find a good explanation of the s6-overlay maintainers' viewpoint here: https://github.com/just-containers/s6-overlay#the-docker-way
The use of a process supervisor will give us the possibility to run one or more processes at a time in a Docker container. That's just what we need.
So, to add it to your images, you will find `TODO: [S6] Install` placeholders in the Docker files of HAProxy and the web application. Replace `TODO: [S6] Install` with the following Docker instruction:
```dockerfile
# Download and install S6 overlay
RUN curl -sSLo /tmp/s6.tar.gz https://github.com/just-containers/s6-overlay/releases/download/v2.1.0.2/s6-overlay-amd64.tar.gz \
    && tar xzf /tmp/s6.tar.gz -C / \
    && rm -f /tmp/s6.tar.gz
```
Take the opportunity to update the `LABEL` of the image with your name and email: replace `TODO: [GEN] Replace` in both Docker files with your name and email.
To build your images, run the following commands on the VM instance:
```shell
# Build the haproxy image
cd /ha
docker build -t <imageName> .

# Build the webapp image
cd /webapp
docker build -t <imageName> .
```

References:
Remarks:

- If you run your containers right now, you will notice no difference from the previous state of our images. That is normal, as we have not configured anything for `S6` yet and we do not start it in the container.
To start the containers, first you need to stop the current containers and remove them. You can do that with the following commands:
```shell
# Stop and force remove the containers
docker rm -f s1 s2 ha

# Start the containers
docker-compose up --build
```

You can check the state of your containers as we did in the previous task with `docker ps`, which should produce output similar to the following:
```
CONTAINER ID  IMAGE                 COMMAND     CREATED         STATUS         PORTS                                                                NAMES
2b277f0fe8da  softengheigvd/ha      "./run.sh"  21 seconds ago  Up 20 seconds  0.0.0.0:80->80/tcp, 0.0.0.0:1936->1936/tcp, 0.0.0.0:9999->9999/tcp  ha
0c7d8ff6562f  softengheigvd/webapp  "./run.sh"  22 seconds ago  Up 21 seconds  3000/tcp                                                             s2
d9a4aa8da49d  softengheigvd/webapp  "./run.sh"  22 seconds ago  Up 21 seconds  3000/tcp                                                             s1
```
Remarks:

- Later in this lab, the two scripts `start-containers.sh` and `build-images.sh` will be less relevant. During this lab, we will build and run the `haproxy` image extensively, so become familiar with the docker `build` and `run` commands.
References:
We need to configure S6 as our main process and replace the current one. For that, we will update our Docker images for HAProxy and the web application: replace `TODO: [S6] Replace the following instruction` with the following Docker instruction:

```dockerfile
# This will start S6 as our main process in our container
ENTRYPOINT ["/init"]
```
References:
You can build and run the updated images (use the commands already provided earlier). As you can observe if you try to go to http://192.168.42.42, there is nothing live. This is the expected behavior for now, as we just replaced the application process with the process supervisor. We have a superb process supervisor up and running, but no application anymore.
To remedy this situation, we will prepare the startup scripts for S6 and copy them to the right place. Once we do this, they will be picked up automatically and our applications will be available again.
Let's start by creating a folder called `services` in the `ha` and `webapp` folders. You can use the following command:

```shell
mkdir -p ha/services/ha webapp/services/node
```

You should have the following folder structure:
```
|-- Root directory
  |-- ha
    |-- config
    |-- scripts
    |-- services
      |-- ha
    |-- Dockerfile
  |-- webapp
    |-- app
    |-- services
      |-- node
    |-- .dockerignore
    |-- Dockerfile
    |-- run.sh
```
We need to copy the run.sh scripts as `run` files in the service directories. You can achieve that with the following commands:

```shell
cp ha/scripts/run.sh ha/services/ha/run && chmod +x ha/services/ha/run
cp webapp/scripts/run.sh webapp/services/node/run && chmod +x webapp/services/node/run
```

Once copied, replace the shebang instruction in both files. Replace the first line of each run script

```shell
#!/bin/sh
```

by:

```shell
#!/usr/bin/with-contenv bash
```

This instructs S6 to pass the environment variables from the container to the run script.
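For illustration only (this is not one of the lab's files), a hypothetical S6 run file using this shebang could read a container environment variable directly; `my-service` and its flag are made-up names:

```shell
#!/usr/bin/with-contenv bash

# $ROLE is defined by an ENV instruction in the Dockerfile; with-contenv
# makes it visible here when S6 launches this script.
echo "starting service with role: $ROLE"

# exec replaces this shell with the service so that S6 supervises the
# service process directly (my-service is a hypothetical binary).
exec my-service --role "$ROLE"
```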
The start scripts are ready, but now we must copy them to the right place in the Docker image. In both the ha and webapp Docker files, you need to add a COPY instruction to set up the service correctly.

In the ha Docker file, replace `TODO: [S6] Replace the two following instructions` by:

```dockerfile
# Copy the S6 service and make the run script executable
COPY services/ha /etc/services.d/ha
RUN chmod +x /etc/services.d/ha/run
```
Do the same in the `webapp` Docker file with the following replacement of `TODO: [S6] Replace the two following instructions`:

```dockerfile
# Copy the S6 service and make the run script executable
COPY services/node /etc/services.d/node
RUN chmod +x /etc/services.d/node/run
```
References:
Remarks:

- We can discuss whether it is really necessary to `RUN chmod +x ...` during image creation, as we already created the `run` files with `+x` rights. Doing so makes sure that we never run into issues caused by copy/pasting the file or transferring it between the Unix world and the Windows world.
Build your images again and run them. If everything is working fine, you should be able to open http://192.168.42.42 and see the same content as in the previous task.
Deliverables:

- Take a screenshot of the stats page of HAProxy at http://192.168.42.42:1936. You should see your backend nodes. It should be very similar to the screenshot of the previous task.
- Describe your difficulties for this task and your understanding of what is happening during this task. Explain in your own words why we are installing a process supervisor. Do not hesitate to do more research and find other articles on the topic to illustrate the problem.
Installing a cluster membership management tool will help us solve the problem we detected in M4. In fact, we will start to use what we put in place with the solution to issue M5: we will build two images whose process supervisor runs the cluster membership management tool Serf.
In this task, we will focus on how to make our infrastructure more flexible so that we can dynamically add and remove web servers. To achieve this goal, we will use a tool that allows each node to know which other nodes exist at any given time.
We will use Serf for this. You can read more about this tool at
https://www.serf.io/.
The idea is that each container will have a Serf agent running on it: the webapp containers and the load balancer container. The Serf agents talk to each other using a decentralized peer-to-peer protocol to exchange information; they form a cluster of nodes. The main information they exchange is the existence of nodes in the cluster and their IP addresses. When a node appears or disappears, the Serf agents tell each other about the event. When the information arrives at the load balancer, we will be able to react accordingly. A Serf agent can trigger the execution of local scripts when it receives an event.
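As a sketch of how such an event handler is wired up (assuming a `serf` binary on the PATH; this is a local experiment, not one of the lab's commands), an agent with a trivial logging handler could be started like this. `SERF_EVENT` is an environment variable Serf sets for its handlers:

```shell
# Hypothetical single-node experiment: log every cluster event to a file
serf agent --node=demo \
  --event-handler 'echo "got event: $SERF_EVENT" >> /tmp/serf-events.log'
```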
So in summary, in our infrastructure, we want the following:

- Start our load balancer (HAProxy) and let it stay alive forever (or at least with the longest uptime possible).
- Start one or more backend nodes at any time after the load balancer has been started.
- Make sure the load balancer knows about the nodes that appear and disappear. We want to be able to react when a new web server comes online or disappears, and reconfigure the load balancer based on the current state of our web server topology.
In theory this seems quite clear and easy, but to achieve everything there remain a few steps before we are ready. So in this task we will start by installing Serf and seeing how it works with simple events and triggers, without changing the load balancer configuration yet. Tasks 3 to 6 will deal with the latter part.
To install Serf, we have to add the following Docker instruction to the ha and webapp Docker files. Replace the line `TODO: [Serf] Install` in ha/Dockerfile and webapp/Dockerfile with the following instruction:
```dockerfile
# Install serf (for decentralized cluster membership: https://www.serf.io/)
RUN mkdir /opt/bin \
    && curl -sSLo /tmp/serf.gz https://releases.hashicorp.com/serf/0.8.2/serf_0.8.2_linux_amd64.zip \
    && gunzip -c /tmp/serf.gz > /opt/bin/serf \
    && chmod 755 /opt/bin/serf \
    && rm -f /tmp/serf.gz
```
You can build your images as we did in the previous task. As expected, nothing new happens when we run the updated images: Serf will not start before we add the proper service to S6. The next steps will give us the following containers:
```
HAProxy container
  S6 process
    -> HAProxy process
    -> Serf process

WebApp containers
  S6 process
    -> NodeJS process
    -> Serf process
```
Each container will run an S6 main process with at least two child processes: the application process and the Serf process.
To start Serf, we need to create the proper service for S6. Let's do that by creating the service folders in ha/services and webapp/services. Use the following command:

```shell
mkdir ./ha/services/serf ./webapp/services/serf
```

You should have the following folder structure:
```
|-- Root directory
  |-- ha
    |-- config
    |-- scripts
    |-- services
      |-- ha
      |-- serf
    |-- Dockerfile
  |-- webapp
    |-- app
    |-- services
      |-- node
      |-- serf
    |-- .dockerignore
    |-- Dockerfile
    |-- run.sh
```
In each directory, create an executable file called `run`. You can achieve that with the following commands:

```shell
touch ha/services/serf/run && chmod +x ha/services/serf/run
touch webapp/services/serf/run && chmod +x webapp/services/serf/run
```

In the ha/services/serf/run file, add the following script. This will start Serf and enable its capabilities on the load balancer. You can ignore the tricky part of the script about process management. You can look at the comments and ask us for more info if you are interested.

The principal part, between SERF START and SERF END, is the command we prepare to run the Serf agent.
```shell
#!/usr/bin/with-contenv bash

# ##############################################################################
# WARNING
# ##############################################################################
# S6 expects the processes it manages to stop when it sends them a SIGTERM signal.
# The Serf agent does not stop properly when receiving a SIGTERM signal.
#
# Therefore, we need to do some tricks to remedy the situation. We need to
# "simulate" the handling of SIGTERM in the script and send to Serf the signal
# that makes it quit (SIGINT).
#
# Basically we need to do the following:
# 1. Keep track of the process id (PID) of the Serf agent
# 2. Catch the SIGTERM from S6 and send a SIGINT to Serf
# 3. Make sure this shell script will not stop before S6 stops it, but when
#    SIGTERM is sent, we need to stop everything.

# Get the current process ID to avoid killing an unwanted process
pid=$$

# Define a function to kill the Serf process, as Serf does not accept SIGTERM.
# Instead, we send a SIGINT signal to the process to stop it correctly.
sigterm() {
  kill -INT $pid
}

# Trap the SIGTERM and instead run the function that will kill the process
trap sigterm SIGTERM

# ##############################################################################
# SERF START
# ##############################################################################
# We build the Serf command to run the agent
COMMAND="/opt/bin/serf agent"
COMMAND="$COMMAND --join ha"
COMMAND="$COMMAND --replay"
COMMAND="$COMMAND --event-handler member-join=/serf-handlers/member-join.sh"
COMMAND="$COMMAND --event-handler member-leave,member-failed=/serf-handlers/member-leave.sh"
COMMAND="$COMMAND --tag role=$ROLE"
# ##############################################################################
# SERF END
# ##############################################################################

# Log the command
echo "$COMMAND"

# Execute the command in the background
exec $COMMAND &

# Retrieve the process ID of the command run in the background. Doing that, we
# will be able to send the SIGINT signal through the sigterm function we defined
# to replace the SIGTERM.
pid=$!

# Wait forever to simulate a foreground process for S6. This will act as our
# blocking process that S6 is expecting.
wait
```

Let's take the time to analyze the Serf agent command. We launch the Serf agent with the command:
```shell
serf agent
```

Next, we append to the command the way to join a specific Serf cluster, where the address of the cluster is `ha`. In fact, our `ha` node will act as a sort of master node, but as we are in a decentralized architecture it could be any of the nodes running a Serf agent.

For example, if we start `ha` first, then `s2` and finally `s1`, we can imagine that `ha` will connect to itself as it is the first one. Then `s2` will reference `ha` to be in the same cluster, and finally `s1` can reference `s2`. Therefore, `s1` will join the same cluster as `s2` and `ha`, but through `s2`. For simplicity, all our nodes will register to the same cluster through the `ha` node.
```shell
--join ha
```

Remarks:

- Once the cluster is created by a `Serf` agent, the first node, which created the `Serf` cluster, can leave it. In fact, leaving the cluster will not stop the cluster as long as a `Serf` agent is still running.

  Anyway, in our current solution there is a kind of misconception in the way we create the `Serf` cluster. In the deliverables, describe which problem exists with the current solution, based on the previous explanations and remarks, and propose a solution to solve the issue.
To make sure the ha load balancer can leave and join the cluster again, we add the `--replay` option. This will make the Serf agent replay past events and then react to them. In fact, due to the problem you have to identify, this will probably not be really useful.

```shell
--replay
```

Then we append the event handlers to react to certain events.
```shell
--event-handler member-join=/serf-handlers/member-join.sh
--event-handler member-leave,member-failed=/serf-handlers/member-leave.sh
```

At the moment, the member-join and member-leave scripts are missing. We will add them in a moment. These two scripts will manage the load balancer configuration.
And finally, we set a tag `role=<rolename>` on our load balancer. `$ROLE` is the environment variable that we have in the Docker files. With the role, we will be able to differentiate the balancer from the backend nodes.
```shell
--tag role=$ROLE
```

In fact, each node that joins or leaves the Serf cluster will trigger a join, respectively a leave, event. This means that the handler scripts on the `ha` node will be called for all the nodes, including `ha` itself. We want to avoid reconfiguring HAProxy when the load balancer itself joins or leaves the Serf cluster.
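One possible way to implement that filtering inside a handler script is sketched below. The role name "balancer" is an assumption (use whatever value your `$ROLE` variable holds); the real handlers come later in the lab.

```shell
#!/usr/bin/env bash
# Hypothetical guard: ignore events coming from the load balancer itself.
# Serf writes one line per member on stdin: name, IP, role, tags.
filter_members() {
  while read -r HOSTNAME HOSTIP HOSTROLE HOSTTAGS; do
    if [ "$HOSTROLE" = "balancer" ]; then
      # "balancer" is an assumed tag value for the ha node
      echo "skipping $HOSTNAME (load balancer itself)"
      continue
    fi
    echo "would reconfigure HAProxy for $HOSTNAME ($HOSTIP)"
  done
}

# Demo with fake event payloads (names and IPs are made up):
printf 'ha 10.0.0.2 balancer x\ns1 10.0.0.3 backend x\n' | filter_members
```
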
References:
Let's prepare the same kind of configuration. Copy the run file you just created into `webapp/services/serf` and replace the content between SERF START and SERF END with the following:

```shell
# We build the Serf command to run the agent
COMMAND="/opt/bin/serf agent"
COMMAND="$COMMAND --join ha"
COMMAND="$COMMAND --tag role=$ROLE"
```

This time, we do not need event handlers for the backend nodes: they will just appear and disappear at some point in time, and nothing else. The `$ROLE` is again provided by the `-e "ROLE=backend"` option of the Docker run command.
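For reference, this is roughly how such a role could be injected when starting a backend container by hand (image and container names are illustrative; docker-compose normally passes this for you):

```shell
# Hypothetical manual start of a backend node with its role injected
docker run -d --name s1 -e "ROLE=backend" softengheigvd/webapp
```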
Again, we need to update our Docker images to add the Serf service to S6. In both Docker image files, in the ha and webapp folders, replace `TODO: [Serf] Add Serf S6 setup` with the instructions to copy the Serf agent run script and make it executable.
And finally, you can expose the Serf ports in your Docker image files. Replace `TODO: [Serf] Expose ports` with the following content:

```dockerfile
# Expose the ports for Serf
EXPOSE 7946 7373
```
References:
It's time to build the images and run the containers. You can use the provided scripts or run the commands manually. At this stage, you should have your application running as well as the Serf agents. To verify, you can access http://192.168.42.42 to see if your backends are responding, and you can check the Docker logs to see what is happening. Simply run:

```shell
docker logs <container name>
```

where container name is one of:

- ha
- s1
- s2
Remarks:

- When we reach this point, we have a problem. If we start HAProxy first, it will not start, as the two containers `s1` and `s2` are not started yet and we try to link them through the Docker `run` command. You can try it and get the logs: you will see error logs mentioning `s1` and `s2`.

  If we start the `s1` and `s2` nodes before `ha`, we will get an error from `Serf`: they try to connect to the `Serf` cluster via the `ha` container, which is not running.

  So the reverse proxy is not fully working, but what we can do, at least, is to start the containers beginning with `ha` and then the backend nodes. This makes the `Serf` part work, and that is what we are working on at the moment and in the next task.
References:
- docker network create
- Understand Docker networking
- Embedded DNS server in user-defined networks
- docker run
Cleanup:

- As we have changed the way we start our reverse proxy and web application, we can remove the original `run.sh` scripts. You can use the following commands to clean up these two files (and the folder, in the case of the web application):

  ```shell
  rm ha/scripts/run.sh
  rm -r webapp/scripts
  ```
Deliverables:

- Provide the docker log output for each of the containers: `ha`, `s1` and `s2`. You need to create a folder `logs` in your repository to store the files separately from the lab report. For each lab task, create a folder named after the task number. No need to create a folder when there are no logs.

  Example:

  ```
  |-- root folder
    |-- logs
      |-- task 1
      |-- task 3
      |-- ...
  ```
- Give the answer to the question about the problem that exists with the current solution.
- Give an explanation of how `Serf` works. Read the official website to get more details about the `GOSSIP` protocol used in `Serf`. Try to find other solutions that could be used in similar situations where we need some auto-discovery mechanism.
Serf is really simple to use as it lets the user write their own shell scripts to react to the cluster events. In this task we will write the first bits and pieces of the handler scripts we need to build our solution. We will start by just logging members that join the cluster and the members that leave the cluster. We are preparing to solve concretely the issue discovered in M4.
We reached a state where we have nearly all the pieces in place to make the infrastructure
really dynamic. At the moment, we are missing the scripts that will react to the events
reported by Serf, namely member leave or member join.
We will start by creating the scripts in ha/scripts. Create two files in this directory and make them executable. You can use these commands:

```shell
touch ha/scripts/member-join.sh && chmod +x ha/scripts/member-join.sh
touch ha/scripts/member-leave.sh && chmod +x ha/scripts/member-leave.sh
```

In the member-join.sh script, put the following content:
```shell
#!/usr/bin/env bash

echo "Member join script triggered" >> /var/log/serf.log

# We iterate over stdin
while read -a values; do
  # We extract the hostname, the ip, the role of each line and the tags
  HOSTNAME=${values[0]}
  HOSTIP=${values[1]}
  HOSTROLE=${values[2]}
  HOSTTAGS=${values[3]}

  echo "Member join event received from: $HOSTNAME with role $HOSTROLE" >> /var/log/serf.log
done
```

Do the same for member-leave.sh with the following content:
```shell
#!/usr/bin/env bash

echo "Member leave/join script triggered" >> /var/log/serf.log

# We iterate over stdin
while read -a values; do
  # We extract the hostname, the ip, the role of each line and the tags
  HOSTNAME=${values[0]}
  HOSTIP=${values[1]}
  HOSTROLE=${values[2]}
  HOSTTAGS=${values[3]}

  echo "Member $SERF_EVENT event received from: $HOSTNAME with role $HOSTROLE" >> /var/log/serf.log
done
```

We have to update the Docker file for the ha node. Replace `TODO: [Serf] Copy events handler scripts` with appropriate content so that:
- Make sure there is a directory `/serf-handlers`.
- The `member-join` and `member-leave` scripts are placed in this folder.
- Both scripts are executable.
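One possible replacement is sketched below; it is only a sketch, under the assumption that the handler scripts live in the `scripts` folder of the build context, so adapt the paths to your layout:

```dockerfile
# Create the handlers directory, copy the scripts and make them executable
RUN mkdir -p /serf-handlers
COPY scripts/member-join.sh scripts/member-leave.sh /serf-handlers/
RUN chmod +x /serf-handlers/member-join.sh /serf-handlers/member-leave.sh
```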
Stop all your containers to have a fresh state:

```shell
docker rm -f ha s1 s2
```

Now, build your ha image:
```shell
# Build the haproxy image
cd /ha
docker build -t <imageName> .
```

From now on, we will ask you to systematically keep the logs and copy them into your repository as a lab deliverable. Whenever you see the notice (keep logs) after a command, copy the logs into the repository.
Run the ha container first and capture the logs with `docker logs` (keep the logs).

```shell
docker-compose up -d haproxy
```

Now, run one of the two backend containers and capture the logs (keep the logs). Shortly after starting the container, capture also the logs of the ha node (keep the logs).

```shell
docker-compose up -d webapp1
docker-compose up -d webapp2
```

Once started, get the logs (keep the logs) of the backend container.
To check that something is happening on the ha node, you will need to connect
to the running container and read the custom log file that is created by the
handler scripts. For that, use the following command to connect to the ha
container in interactive mode.
docker exec -ti ha /bin/bash
Once done, you can simply run the following command. This command is run inside
the running ha container. (keep the logs)
cat /var/log/serf.log

Once you have finished, simply type exit to quit your shell session in the
container. The container itself will continue to run.
Deliverables:
-
Provide the docker log output for each of the containers:
`ha`, `s1` and `s2`. Put your logs in the `logs` directory you created in the previous task.
-
Provide the logs from the
`ha` container, gathered directly from the `/var/log/serf.log` file present in the container. Put the logs in the `logs` directory in your repo.
We have to generate a new configuration file for the load balancer each time a web server is added or removed. There are several ways to do this. Here we choose to go the way of templates. In this task we will put in place a template engine and use it with a basic example. You will not become an expert in template engines but it will give you a taste of how to apply this technique which is often used in other contexts (like web templates, mail templates, ...). We will be able to solve the issue raised in M6.
There are several ways to generate a configuration file from variables
in a dynamic fashion. In this lab we decided to use NodeJS and
Handlebars for the template engine.
According to Wikipedia:
A template engine is a software designed to combine one or more templates with a data model to produce one or more result documents
In our case, our template is the HAProxy configuration file in which
we put placeholders written in the template language. Our data model
is the data provided by the handler scripts of Serf. The resulting
document coming out of the template engine is a configuration file
that HAProxy can understand, where the placeholders have been
replaced with the data.
To be able to use Handlebars as a template engine in our ha
container, we need to install NodeJS and Handlebars.
To install NodeJS, just replace TODO: [HB] Install NodeJS by the
following content:
# Install NodeJS
RUN curl -sSLo /tmp/node.tar.xz https://nodejs.org/dist/v14.15.1/node-v14.15.1-linux-x64.tar.xz \
&& tar -C /usr/local --strip-components 1 -xf /tmp/node.tar.xz \
&& rm -f /tmp/node.tar.xz
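The `--strip-components 1` flag in the command above is what makes the archive content land directly under `/usr/local` instead of in a versioned subfolder. A throwaway demonstration (all paths below are invented demo paths, not the lab's):

```shell
# Illustrate what --strip-components 1 does: it drops the archive's
# top-level directory on extraction.
rm -rf /tmp/sc-demo && mkdir -p /tmp/sc-demo/node-v14.15.1-linux-x64/bin
echo fake-binary > /tmp/sc-demo/node-v14.15.1-linux-x64/bin/node
tar -cf /tmp/sc-demo/node.tar -C /tmp/sc-demo node-v14.15.1-linux-x64
mkdir -p /tmp/sc-demo/out
tar -C /tmp/sc-demo/out --strip-components 1 -xf /tmp/sc-demo/node.tar
# The file lands in out/bin/node, not out/node-v14.15.1-linux-x64/bin/node
ls /tmp/sc-demo/out/bin/node
```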
We also need to update the base tools installed in the image to be
able to extract the NodeJS archive. So we need to add `xz-utils` to
the `apt-get install` command above the line TODO: [HB] Update to install required tool to install NodeJS.
Remarks:
-
You probably noticed that we have the webapp image with a `NodeJS` application. So the image already contains `NodeJS`. We have based our backend image on an existing image that provides an installation of `NodeJS`. In our `ha` image, we take a shortcut and do a manual installation of `NodeJS`.

This manual install has at least one bad practice: in the original `NodeJS` image, they download the required files and then check the downloads against GPG signatures. We have skipped this part in our `ha` image, but in practice you should check every download to avoid issues like man-in-the-middle attacks.

You can take a look at the following links if you are interested in this topic:
The other reason why we have to manually install `NodeJS` is that we cannot inherit from two images at the same time. As our `ha` image already inherits `FROM` the `haproxy` official image, we cannot use the `NodeJS` image at the same time.

In fact, the `FROM` instruction of Docker works like the Java inheritance model: you can inherit only from one superclass at a time. For example, we have the following hierarchy for our HAProxy image.
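As a sketch (the `node:14` mention below is only to illustrate what is *not* possible), single inheritance in a Dockerfile looks like this:

```dockerfile
# A Dockerfile has exactly one base image per build stage.
FROM haproxy:2.2
# There is no second FROM to also inherit from a NodeJS image here,
# so NodeJS has to be installed manually with RUN instructions.
RUN apt-get update && apt-get install -y curl xz-utils
```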
Here is the reference to the Docker documentation of the `FROM` command:
It's time to install Handlebars and a small command line tool, `handlebars-cmd`, to make it work properly. For that, replace the TODO: [HB] Install Handlebars and cli with this Docker instruction:
# Install the handlebars-cmd node module and its dependencies
RUN npm install -g handlebars-cmd
Remarks:
- NPM is a package manager for `NodeJS`. Like other package managers, one of its tasks is to manage the dependencies of a package. That's the reason why we have to install only `handlebars-cmd`. This package has the `handlebars` package as one of its dependencies.
Now we will update the handler scripts to use Handlebars. For the moment, we
will just play with a simple template. So, first create a file in ha/config called
haproxy.cfg.hb with a simple template content. Use the following command for that:
echo "Container {{ name }} has joined the Serf cluster with the following IP address: {{ ip }}" >> ha/config/haproxy.cfg.hb

We need our template present in our ha image. We have to add the following
Docker instructions for that. Let's replace TODO: [HB] Copy the haproxy configuration template
in ha/Dockerfile with the required stuff to:
- Have a directory `/config`
- Have the `haproxy.cfg.hb` in it
Then, update the member-join.sh script in ha/scripts with the following content:
#!/usr/bin/env bash
echo "Member join script triggered" >> /var/log/serf.log
# We iterate over stdin
while read -a values; do
# We extract the hostname, the ip, the role of each line and the tags
HOSTNAME=${values[0]}
HOSTIP=${values[1]}
HOSTROLE=${values[2]}
HOSTTAGS=${values[3]}
echo "Member join event received from: $HOSTNAME with role $HOSTROLE" >> /var/log/serf.log
# Generate the output file based on the template with the parameters as input for placeholders
handlebars --name $HOSTNAME --ip $HOSTIP < /config/haproxy.cfg.hb > /tmp/haproxy.cfg
done
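If you want to preview what the template engine will produce without building the image, here is a rough stand-in using sed. To be clear, sed is not the template engine the lab uses; it only mimics the substitution for this simple template, and the `s1` / `192.168.42.11` values are made up:

```shell
# Approximate the handlebars substitution for the simple template with sed.
TEMPLATE='Container {{ name }} has joined the Serf cluster with the following IP address: {{ ip }}'
echo "$TEMPLATE" | sed -e 's/{{ name }}/s1/' -e 's/{{ ip }}/192.168.42.11/'
```

This prints `Container s1 has joined the Serf cluster with the following IP address: 192.168.42.11`, which is the shape of the `/tmp/haproxy.cfg` file you will retrieve below.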
Time to build our ha image and run it. We will also run s1 and s2. As usual, here
are the commands to build and run our image and containers:
# Remove running containers
docker rm -f ha s1 s2
# Build the haproxy image
cd ha
docker build -t <imageName> .
# Run the HAProxy container
docker run -d -p 80:80 -p 1936:1936 -p 9999:9999 --network heig --name ha <imageName>
# OR
docker-compose up --build

Remarks:
- Installing a new utility with `apt-get` means rebuilding the whole image, the way our Dockerfile is written. This will take a few minutes.
Take the time to retrieve the output file in the ha container. Connect to the container:
docker exec -ti ha /bin/bash

and get the content from the file (keep it for the deliverables; handle it as you do for the logs):

cat /tmp/haproxy.cfg

After you have inspected the generated file, quit the container with exit.
Now that we invoke the template engine from the handler script it is
time to do an end-to-end test. Start the s1 container, wait a bit,
then retrieve the haproxy.cfg file from the ha container to see
whether it saw s1 coming up. Then do the same for s2:
# 1) Run the S1 container
docker-compose up -d webapp1
# 2) Connect to the ha container (optional if you have another ssh session)
docker exec -ti ha /bin/bash
# 3) From the container, extract the content (keep it for deliverables)
cat /tmp/haproxy.cfg
# 4) Quit the ha container (optional if you have another ssh session)
exit
# 5) Run the S2 container
docker-compose up -d webapp2
# 6) Connect to the ha container (optional if you have another ssh session)
docker exec -ti ha /bin/bash
# 7) From the container, extract the content (keep it for deliverables)
cat /tmp/haproxy.cfg
# 8) Quit the ha container
exit

Deliverables:
- You probably noticed that when we added `xz-utils`, we had to rebuild the whole image, which took some time. What can we do to mitigate that? Take a look at the Docker documentation on image layers. Tell us about the pros and cons of merging as many commands as possible into a single one. In other words, compare:
RUN command 1
RUN command 2
RUN command 3
vs.
RUN command 1 && command 2 && command 3
There are also some articles about techniques to reduce the image
size. Try to find them. They are talking about squashing or
flattening images.
-
Propose a different approach to architecting our images that reuses as much as possible of what we have done. Your proposal should also avoid repetition between the images as much as possible.
-
Provide the
`/tmp/haproxy.cfg` file generated in the `ha` container after each step. Place the output into the `logs` folder like you already did for the Docker logs in the previous tasks. Three files are expected.

In addition, provide a log file containing the output of the
`docker ps` console and another file (per container) with `docker inspect <container>`. Four files are expected.
-
Based on the three output files you have collected, what can you say about the way we generate the configuration file? What is the problem, if any?
We now have S6 and Serf ready in our HAProxy image. We have member join/leave handler scripts and we have the handlebars template engine. So we have all the pieces ready to generate the HAProxy configuration dynamically. We will update our handler scripts to manage the list of nodes and to generate the HAProxy configuration each time the cluster has a member leave/join event. The work in this task will let us solve the problem mentioned in M4.
At this stage, we have:
-
Two images with the `S6` process supervisor that starts a Serf agent and an "application" (HAProxy or Node web app).
-
The `ha` image contains the required stuff to react to `Serf` events when a container joins or leaves the `Serf` cluster.
-
A template engine in the `ha` image is ready to be used to generate the HAProxy configuration file.
Now, we need to refine our join and leave scripts to generate a
proper HAProxy configuration file.
First, we will copy/paste the content of the ha/config/haproxy.cfg file into the template ha/config/haproxy.cfg.hb. You can simply run the following command:
cp ha/config/haproxy.cfg ha/config/haproxy.cfg.hb

Then we will replace the content between # HANDLEBARS START and
# HANDLEBARS STOP (see previous haproxy.cfg.hb) by the following content:
{{#each addresses}}
server {{ host }} {{ ip }}:3000 check
{{/each}}
Remarks:
-
`each` iterates over a collection of data
-
`{{` and `}}` are the delimiters that will be interpreted by `handlebars`
-
`host` and `ip` are the data contained in the JSON objects of the collection that handlebars will receive. We will see that right after in the `member-join.sh` script. The JSON format will be: `{ "host": "<hostname>", "ip": "<ip address>" }`.
Our configuration template is ready. Let's update the member-join.sh script to
generate the correct configuration.
The mechanism to manage the join and leave events is the following:
-
We check if the event comes from a backend node (the role is used).
-
We create a file with the hostname and IP address of each backend node that joins the cluster.
-
We build the `handlebars` command to generate the new configuration from the list of files that represent our backend nodes.
The same logic also applies when a node leaves the cluster. In this case, the second step will remove the file with the node data.
In the file ha/scripts/member-join.sh replace the whole content by the following one. Take the time to read the comments.
#!/usr/bin/env bash
echo "Member join script triggered" >> /var/log/serf.log
BACKEND_REGISTERED=false
# We iterate over stdin
while read -a values; do
# We extract the hostname, the ip, the role of each line and the tags
HOSTNAME=${values[0]}
HOSTIP=${values[1]}
HOSTROLE=${values[2]}
HOSTTAGS=${values[3]}
# We only register the backend nodes
if [[ "$HOSTROLE" == "backend" ]]; then
echo "Member join event received from: $HOSTNAME with role $HOSTROLE" >> /var/log/serf.log
# We simply register the backend IP and hostname in a file in /nodes
# with the hostname for the file name
echo "$HOSTNAME $HOSTIP" > /nodes/$HOSTNAME
# We have at least one new node registered
BACKEND_REGISTERED=true
fi
done
# We only update the HAProxy configuration if we have at least one new backend node
if [[ "$BACKEND_REGISTERED" = true ]]; then
# To build the collection of nodes
HOSTS=""
# We iterate over each backend node registered
for hostfile in $(ls /nodes); do
# We convert the content of the backend node file to a JSON format: { "host": "<hostname>", "ip": "<ip address>" }
CURRENT_HOST=`cat /nodes/$hostfile | awk '{ print "{\"host\":\"" $1 "\",\"ip\":\"" $2 "\"}" }'`
# We concatenate each host
HOSTS="$HOSTS$CURRENT_HOST,"
done
# We process the template with handlebars. The sed command will simply remove the
# trailing comma from the hosts list.
handlebars --addresses "[$(echo $HOSTS | sed s/,$//)]" < /config/haproxy.cfg.hb > /usr/local/etc/haproxy/haproxy.cfg
# TODO: [CFG] Add the command to restart HAProxy
fi

And here we go for the member-leave.sh script. The script differs only in the part where
we remove the backend nodes registered via the member-join.sh.
#!/usr/bin/env bash
echo "Member leave/join script triggered" >> /var/log/serf.log
BACKEND_UNREGISTERED=false
# We iterate over stdin
while read -a values; do
# We extract the hostname, the ip, the role of each line and the tags
HOSTNAME=${values[0]}
HOSTIP=${values[1]}
HOSTROLE=${values[2]}
HOSTTAGS=${values[3]}
# We only remove the backend nodes
if [[ "$HOSTROLE" == "backend" ]]; then
echo "Member $SERF_EVENT event received from: $HOSTNAME with role $HOSTROLE" >> /var/log/serf.log
# We simply remove the file that was used to track the registered node
rm /nodes/$HOSTNAME
# We have at least one node that left the cluster
BACKEND_UNREGISTERED=true
fi
done
# We only update the HAProxy configuration if we have at least one backend that
# left the cluster. The process to generate the HAProxy configuration is the
# same as for the member-join script.
if [[ "$BACKEND_UNREGISTERED" = true ]]; then
# To build the collection of nodes
HOSTS=""
# We iterate over each backend node registered
for hostfile in $(ls /nodes); do
# We convert the content of the backend node file to a JSON format: { "host": "<hostname>", "ip": "<ip address>" }
CURRENT_HOST=`cat /nodes/$hostfile | awk '{ print "{\"host\":\"" $1 "\",\"ip\":\"" $2 "\"}" }'`
# We concatenate each host
HOSTS="$HOSTS$CURRENT_HOST,"
done
# We process the template with handlebars. The sed command will simply remove the
# trailing comma from the hosts list.
handlebars --addresses "[$(echo $HOSTS | sed s/,$//)]" < /config/haproxy.cfg.hb > /usr/local/etc/haproxy/haproxy.cfg
# TODO: [CFG] Add the command to restart HAProxy
fi

Remarks:
- The way we keep track of the backend nodes is pretty simple and assumes there are no concurrency issues with `Serf`. That is a reasonable assumption for keeping the solution simple.
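The loop that both handler scripts use to turn the `/nodes` files into the JSON collection for handlebars can be exercised locally. The host names, IPs and the `/tmp/nodes-demo` folder below are invented for the demonstration:

```shell
# Build the JSON collection handlebars will receive from "<host> <ip>" files,
# the same way the member-join/member-leave scripts do, against a demo folder.
rm -rf /tmp/nodes-demo && mkdir -p /tmp/nodes-demo
printf 's1 192.168.42.11\n' > /tmp/nodes-demo/s1
printf 's2 192.168.42.12\n' > /tmp/nodes-demo/s2
HOSTS=""
for hostfile in $(ls /tmp/nodes-demo); do
  # Convert "<host> <ip>" to {"host":"<host>","ip":"<ip>"}
  CURRENT_HOST=$(cat /tmp/nodes-demo/$hostfile | awk '{ print "{\"host\":\"" $1 "\",\"ip\":\"" $2 "\"}" }')
  HOSTS="$HOSTS$CURRENT_HOST,"
done
# The sed removes the trailing comma before wrapping the list in brackets
ADDRESSES="[$(echo $HOSTS | sed s/,$//)]"
echo "$ADDRESSES"
```

This prints `[{"host":"s1","ip":"192.168.42.11"},{"host":"s2","ip":"192.168.42.12"}]`, the value the scripts pass to `handlebars --addresses`.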
Cleanup:
-
In the main configuration file that is used to bootstrap HAProxy the first time, when there are no backend nodes yet, we still have the list of servers that we used in the first task and the previous lab. We can remove that list. So find TODO: [CFG] Remove all the servers and remove the list of nodes.
-
In ha/services/ha/run, we can remove the two lines above TODO: [CFG] Remove the following two lines.
We need to make sure the image has the folder /nodes created. In the
Docker file, replace the TODO: [CFG] Create the nodes folder by the
correct instruction to create the /nodes folder.
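If you are unsure about the instruction, a minimal sketch could look like this (any equivalent instruction works):

```dockerfile
# Create the folder where the handler scripts track backend nodes
RUN mkdir -p /nodes
```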
We are ready to build and test our ha image. Let's proceed like in
the previous task. You should provide the same outputs for
the deliverables. Remember that we have moved the file
/tmp/haproxy.cfg to /usr/local/etc/haproxy/haproxy.cfg (keep
track of the config file like in previous step and also the output of
docker ps and docker inspect <container>).
You can also get the list of registered nodes from inside the ha
container. Simply list the files from the directory /nodes. (keep
track of the output of the command like the logs in previous tasks)
Now, use the Docker commands to stop s1.
You can connect again to the ha container and get the haproxy
configuration file and also the list of backend nodes. Use the
previous command to reach this goal. (keep track of the output of
the ls command and the configuration file like the logs in previous
tasks)
Deliverables:
-
Provide the file
`/usr/local/etc/haproxy/haproxy.cfg` generated in the `ha` container after each step. Three files are expected.

In addition, provide a log file containing the output of the
`docker ps` console and another file (per container) with `docker inspect <container>`. Four files are expected.
-
Provide the list of files from the
`/nodes` folder inside the `ha` container. One file expected with the command output.
-
Provide the configuration file after you stopped one container and the list of nodes present in the
`/nodes` folder. Two files are expected with the command outputs.

In addition, provide a log file containing the output of the
`docker ps` console. One file expected.
-
(Optional:) Propose a different approach to manage the list of backend nodes. You do not need to implement it. You can also propose your own tools or the ones you discovered online. In that case, do not forget to cite your references.
Finally, we have all the pieces in place to finish our solution. HAProxy will be reconfigured automatically when web app nodes are leaving/joining the cluster. We will solve the problems you have discussed in M1 - 3. Again, the solution built in this lab is only one example of tools and techniques we can use to solve this kind of situation. There are several other ways.
The only thing missing now is to make sure the configuration of HAProxy is up-to-date and taken into account by HAProxy.
We will try to make HAProxy reload its configuration with minimal
downtime. Replace the line TODO: [CFG] Replace this command in ha/services/ha/run by the
following script part. As usual, take the time to read the comments.
#!/usr/bin/with-contenv bash
# ##############################################################################
# WARNING
# ##############################################################################
# S6 expects the processes it manages to stop when it sends them a SIGTERM signal.
# The Serf agent does not stop properly when receiving a SIGTERM signal.
#
# Therefore, we need to do some tricks to remedy the situation. We need to
# "simulate" the handling of SIGTERM in the script and send to Serf the signal
# that makes it quit (SIGINT).
#
# Basically we need to do the following:
# 1. Keep track of the process id (PID) of Serf Agent
# 2. Catch the SIGTERM from S6 and send a SIGINT to Serf
# 3. Make sure this shell script will not stop before S6 stops it, but when
# SIGTERM is sent, we need to stop everything.
# Get the current process ID to avoid killing an unwanted process
pid=$$
# Define a function to kill the Serf process, as Serf does not accept SIGTERM.
# Instead, we will send a SIGINT signal to the process to stop it correctly.
sigterm() {
kill -USR1 $pid
}
# Trap SIGTERM and instead run the function that will kill the process
trap sigterm SIGTERM
# We need to keep track of the PID of HAProxy in a file for the restart process.
# We are forced to do that because the blocking process for S6 is this shell
# script. When we send to S6 a command to restart our process, we will lose
# the value of the variable pid. The pid variable will stay alive until any
# restart or stop from S6.
#
# In the case of a restart we need to keep the HAProxy PID to give it back to
# HAProxy. The comments on the HAProxy command will complete this explanation.
if [ -f /var/run/haproxy.pid ]; then
HANDOFFPID=`cat /var/run/haproxy.pid`
fi
# To kill an old HAProxy and start a new one with minimal outage
# HAProxy provides the -sf/-st command-line options. With these options
# one can give the PIDs of currently running HAProxy processes at startup.
# This will start new HAProxy processes and when startup is complete
# it sends FINISH or TERMINATE signals to the ones given in the argument.
#
# The HANDOFFPID keeps track of the PID of HAProxy. We retrieve it from
# the file we wrote the last time we (re)started HAProxy.
exec haproxy -f /usr/local/etc/haproxy/haproxy.cfg -sf $HANDOFFPID &
# Retrieve the process ID of the command run in background. Doing that, we will
# be able to send the SIGINT signal through the sigterm function we defined
# to replace the SIGTERM.
pid=$!
# And write it to a file to get it on next restart
echo $pid > /var/run/haproxy.pid
# Finally, we wait as S6 launches this shell script. This will simulate
# a foreground process for S6. All that tricky stuff is required because
# we use a process supervisor in a Docker environment. The applications need
# to be adapted for such environments.
wait

Remarks:
- In this lab, we do not achieve an HAProxy restart with zero downtime. You will find an article about that in the references.
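The trap-and-forward trick used in the script above can be tried in isolation. This sketch simplifies the signal choice (it forwards a plain kill instead of SIGINT/SIGUSR1) but shows the same mechanism: a trapped signal is translated into a kill of a child process.

```shell
#!/usr/bin/env bash
# Start a long-running child, trap SIGTERM in the parent, and terminate
# the child from the trap handler instead of dying immediately.
sleep 30 &
child=$!
got_term=no
on_term() {
  got_term=yes
  kill "$child" 2>/dev/null
}
trap on_term TERM
# Deliver SIGTERM to ourselves: the trap fires instead of killing the shell
kill -TERM $$
wait "$child" 2>/dev/null
echo "trap ran: $got_term"
```

Running it prints `trap ran: yes`: the shell survived the SIGTERM and the child was stopped from the handler, which is what the run script does for its supervised process.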
We need to update our member-join and member-leave scripts to make sure HAProxy
will be restarted when its configuration is modified. For that, in both files, replace
TODO: [CFG] Add the command to restart HAProxy by the following command.
# Send a SIGHUP to the process. It will restart HAProxy
s6-svc -h /var/run/s6/services/ha
It's time to build and run our images. At this stage, if you try to reach
http://192.168.42.42 or http://localhost, it will not work. No surprise as we do not start any
backend node. Let's start one container and try to reach the same URL.
You can start the web application nodes. If everything works well, you should be able to reach your backend application through the load balancer.
And now you can start and stop any number of nodes you want! You will see the dynamic reconfiguration occurring. Keep in mind that HAProxy will take a few seconds before the nodes become available. The reason is that HAProxy is not so quick to restart inside the container, and your web application also takes time to bootstrap. Finally, depending on the HAProxy health checks, your web app will not be available instantly.
Finally, we achieved our goal to build an architecture that is dynamic and reacts to nodes coming and going!
Deliverables:
-
Take screenshots of the HAProxy stats page showing more than 2 web applications running. Additional screenshots are welcome to show a sequence of experiments, such as shutting down a node and starting more nodes.
Also provide the output of
`docker ps` in a log file. At least one file is expected. You can provide one output per step of your experimentation, according to your screenshots.
-
Give your own assessment of the final solution. Propose improvements or different ways of doing things. If any, provide references to your readings for the improvements.
-
(Optional:) Present a live demo where you add and remove a backend container.
It appears that Windows users can encounter a CRLF vs. LF problem when the repository is cloned without taking care of the line endings. If the line endings are CRLF, Docker will produce an error message such as:
... no such file or directory

(Take a look at this Docker issue: moby/moby#9066; the last post shows the error message.)
The error message is not really relevant and is difficult to troubleshoot. The problem is that Linux does not interpret line endings correctly when they are CRLF instead of LF. This happens when the repository is cloned on Windows with a setup that does not keep LF in the files.

Fortunately, there is a procedure to convert CRLF to LF and make sure Docker will recognize the *.sh files.
First, you need to add a .gitattributes file with the following content:
* text eol=lf

This tells the repository to force the line endings to LF for every text file.
Then, you need to reset your repository. Be sure you do not have modified files.
# Remove all the files from the Git index (the working copies stay on disk)
git rm --cached -r .
# Restore the files and apply the correct line endings (LF)
git reset --hard

Then, you are ready to go.
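To check whether a script still carries CRLF endings before handing it to Docker, you can grep for carriage returns (the demo file path below is invented):

```shell
# Create a file with CRLF endings and detect the stray carriage returns.
printf '#!/usr/bin/env bash\r\necho hello\r\n' > /tmp/crlf-demo.sh
if grep -q $'\r' /tmp/crlf-demo.sh; then
  echo "CRLF detected"
fi
```

If the grep matches, the file still needs the conversion procedure above.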
Here is a deeper explanation of line endings and the related procedure, written by GitHub: https://help.github.com/articles/dealing-with-line-endings/

