There are many different tutorials about Docker and a huge amount of detailed documentation, but you might need to go through tons of them to understand why docker is needed and how to use it, and even then you might not get the whole picture. In this part, I aggregate all the key information required to understand what docker is, why you need it, which role it plays in development and deployment, and how it can simplify a lot of things for you.
What is docker?
What if you need to install an operating system and a list of software on multiple identical personal computers and pre-configure them? The easiest way would be:
- Install an operating system to one personal computer;
- Install all the required software and pre-configure it on this computer;
- Create an image of the hard drive of this computer, and copy it to the hard drives of all other personal computers. Done… You can save the image for the next set of computers which need the same OS and Software.
What if your next set of computers need the same operating system but different software? You can save the image of the hard drive with the pure operating system (after step 1 above) and reuse it to save some time.
The same principle is used in the world of virtual servers. You can install an operating system using x86 hypervisor software (VirtualBox, VMware, etc), save the files with the virtual machine, and run a copy of it many times. With cloud solutions, like Amazon Web Services, Google Cloud, or Azure, you run a copy of an operating system from a list of images which are already installed and pre-configured for you to run on virtual machines. When you launch such a virtual server for the first time, from a terminal or a web interface, the selected image is copied for you and launched using hypervisor software.
Let’s get back to our main question, “what is docker?”. Docker is a containerization platform designed for launching and running a single application or service. It works on the same principles as I just described for physical and virtual servers, but with the goal of running a single application, not a complete operating system. To create a docker image of any application, including your own, you follow similar steps as above for a physical machine:
- Take an image of the existing operating system (existing docker image);
- Install the application & pre-configure it in this operating system;
- The image with the application is ready. Now you can run the resulting image (that is, your application) on the machine where you built it, or upload the image to a private or public image registry and run it anywhere you want.
To build an image with steps 1–2, you need to create a Dockerfile with a special syntax and run a command with the docker command line interface (CLI) or a docker SDK. You can check what a Dockerfile looks like for nginx or mysql, or find other examples in the public docker registry.
To upload an image (step 3), use the docker CLI or docker API. A docker registry, to which you upload images, is similar to GitHub, but its purpose is to store docker images. There is an official public docker registry, where you can find a lot of existing images ready to use, and where you can store your own public and private images. Each cloud service provider has its own registry where you can store your images, like AWS ECR or Google Container Registry, or you can deploy your own registry. When you run an image that doesn’t exist on your host yet, docker downloads it from the registry automatically and caches it locally. You can run an image not only on your host but also on worker nodes, using an orchestration layer (we’ll get to it later).
What is the difference between using docker to containerize an application and using a typical x86 hypervisor? Docker doesn’t run a real operating system from step 1; the containerized app shares the host kernel and consumes the libraries, etc from the operating system image used to build it. As a result, the app runs nearly as fast as it would directly on the host without docker, and it consumes about the same amount of memory. The overhead is minimal.
What are the benefits of using a docker to containerize an application?
- No need to install and configure an application over and over again; just build an image for your application once and run it instead; do it for your own applications and for public ones (mysql, mongo, swagger, etc);
- Run any version of the containerized application, on a regular basis or just to experiment;
- An isolated environment for the running application: it doesn’t know about the host system and takes everything it needs from the image;
- Easy to run and clean up, easy to deploy, easy to revert;
- Limit/specify CPU and memory resources for each container;
- Scale vertically or horizontally, even on the same host (the base principles are below, with an example);
- Easy to combine multiple applications (a web server, API servers, DBs, caches, etc) into a single system (below, with an example);
- A basis for Kubernetes and other orchestration layers.
Running a docker container
To run a docker container from a docker image, you might need to pass a configuration to it, including:
- the IP port to expose to the public; by default, if an application listens on a port, it is not accessible from outside; you need to expose it, and you can map it to a different public port (for example, if a container listens on 80, you can expose it to the public as 8080);
- volumes or just configuration files, for example, html content and/or public/private keys for nginx, or the database folder for mysql or mongo, to persist the data on the host hard drive;
- environment variables required by the running application.
To pass the arguments (ports, volumes, and environment variables) when you run an image, you can use the docker CLI, a docker SDK, a docker-compose yaml configuration file, or configs in the format of the orchestration layer you use. For example, in one of the previous parts, I showed how to run Swagger Editor and Swagger UI (Viewer) from a terminal. This is how you can run mysql:
$ docker run --rm -p 3306:3306 --name some-mysql -v /my/own/datadir:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:tag
Pay attention to the :tag at the end. Each docker image you push to the registry has one or more tags (versions). Later, we’ll review all the arguments, and I’ll show how to run multiple docker images in a docker swarm with docker-compose and a config in yaml format.
When you build a docker image for your application or make another one on top of an existing image (like nginx), to run it later, you can use 3 different strategies:
- Include everything it needs to run successfully into the image when you build it. For example, you want to run nginx, and it requires the static content files to host, private/public certificates, and an nginx configuration file. You build a new image on top of the existing public nginx image and include all the required files in it. Such images are self-sufficient and easy to run, on Amazon Fargate for example.
- Pass the required content as parameters when you run the image, sharing it from your host filesystem with the running container. We reviewed this already above.
- Combine 1 and 2 based on the needs of your deployment strategy.
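As a sketch of the first strategy, a self-sufficient nginx image could be built from a short Dockerfile like this (the file names here are assumptions for illustration, not from the project):

```dockerfile
# build on top of the existing public nginx image
FROM nginx:latest

# bake the configuration, certificates, and static content into the image
COPY nginx.conf /etc/nginx/nginx.conf
COPY certs/ /etc/nginx/certs/
COPY html/ /usr/share/nginx/html/
```

Such an image needs no volumes or extra arguments at run time, which is what makes it easy to deploy to managed services.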
In this part, we’ll run nginx as in strategy 2; in the next part, as in strategy 1.
Running multiple docker containers together as a cluster
Running a single docker image can help you during development, to run swagger, a database, etc. But what about an actual deployment?
Check the deployment schema on the left. We deploy a Node.js app which works with MongoDB and Redis, and it is hidden behind an nginx reverse proxy. The outside world can talk to our system using TCP/IP ports 80 and 443, and all the requests to those ports get to nginx first. nginx, as a reverse proxy, can do many things: for example, SSL termination, compression and decompression, dispatching requests to multiple services (micro-services) or web services based on the URL path, monitoring, logging, rate limiting, load balancing, and many others.
There are a few important statements which hold for a cluster of any type, independently of where and with which tools you deploy. As long as you understand them, you can dig into any documentation and understand how to implement them:
- All applications (docker images running as containers) in the cluster work in an isolated network and can see each other; for most orchestration layers, you specify rules to make them see each other (combine them into sub-networks or link them to each other);
- The outside world can talk to the cluster only via exposed IP port(s), which are bound to a specific service (a running container, nginx in the example above);
- Any container can be scaled vertically (given more CPU and memory) or horizontally (running multiple instances of the same container); the scaling implementation varies across cluster orchestration layers and can be automated (more incoming requests lead to more running containers);
- The cluster of containers can be executed on a single physical machine, on multiple machines, or without managing physical infrastructure at all, as with AWS Fargate.
Now, it is time to practice. By the end of this part, you will learn how to build and run a cluster on your personal computer like this:
In the tutorial, we developed a backend REST API service with Node.js. Let’s use it as a base to build a docker image and run it in a docker swarm cluster. Once you finish this part, you’ll have a complete picture, and you will be able to understand and use any other container orchestration layer.
If you didn’t follow the tutorial, you can get the sources and use them as a starting point:
$ git clone https://github.com/losikov/api-example.git
$ cd api-example
$ git checkout tags/v8.0.0
Download and install docker, and register an account (optional). The free account plan allows an unlimited number of public repositories and one free private repository. Sign in to Docker Desktop after installation (optional).
Build a Docker Image of a Node.js App
Create a Dockerfile file in the root of the project with the following content:
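The original gist with the Dockerfile content is not embedded here; a minimal sketch for this kind of Node.js app, assuming a yarn-based build and an app listening on port 3000 (adjust the base image version and scripts to your project), might look like:

```dockerfile
# base image with Node.js installed; pick the version your project targets
FROM node:12

WORKDIR /app

# install dependencies first, to take advantage of docker layer caching
COPY package.json yarn.lock ./
RUN yarn install

# copy the rest of the sources and build the app
COPY . .
RUN yarn build

EXPOSE 3000
CMD ["yarn", "start"]
```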
If you need more details about the Dockerfile syntax, you can check the official Dockerfile reference.
Create .dockerignore in the root of the project with the list of files and folders which should be completely excluded from a docker image:
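The original list is not embedded here; a typical .dockerignore for a Node.js project (adjust to your project) excludes at least:

```
node_modules
dist
.git
*.log
```

Excluding node_modules matters most: dependencies are installed inside the image by the Dockerfile, and copying the host copy in would bloat the build context and can break native modules.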
You can build an image now:
$ docker build -t api-example .
If you created an account, you can build as:
$ docker build -t <your account>/api-example . # losikov/api-example
Hopefully, you see “Successfully tagged api-example:latest” at the end.
Some useful docker commands to manage your local registry:
$ docker image ls
$ docker image rm <repository name or image id>
$ docker image prune --all --force # remove all unused images
Pushing an image to a Registry (optional, if you created an account)
The main purpose of pushing an image to a registry is the ability to run it on any node where docker is installed. The image is downloaded automatically when you try to run it for the first time.
To push an image, you need to create a repository first, public or private, and then:
$ docker push <your account>/api-example # losikov/api-example
You can push an image to Amazon ECR, or other registries, including personal, with the same command.
Run a Docker Image of a Node.js App
Above, there was an example of a run command for mysql. Let’s run it now and review the arguments:
$ mkdir db
$ docker run --rm -p 3306:3306 --name some-mysql -v db:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:latest
# --rm - remove the container automatically when it exits/is killed
# -p 3306:3306 - bind <host port>:<container port>; you can specify ranges and tcp (default)/udp, and set a host port different from the port on which the service in the container is listening
# --name some-mysql - a name to use in CLI commands instead of the ID
# -v db:/var/lib/mysql - mount a volume named db as /var/lib/mysql (use an absolute path, e.g. $(pwd)/db, to mount the local db folder instead)
# -e MYSQL_ROOT_PASSWORD=my-secret-pw - an environment variable
# -d - run detached, in the background - try without it
# mysql:latest - image name in the registry and tag
For further commands, use either the container ID or name (orange on the screenshot). Pay attention that the container listens on 2 ports, 3306 and 33060 (blue on the screenshot above), but only 3306 is exposed; 33060 is not available outside. Run netstat -na | grep LISTEN to check.
Try other commands:
$ docker ps # list running containers
$ docker stats # info about running containers (unix top)
$ docker logs -f some-mysql # show logs (-f == tail -f)
$ docker exec -it some-mysql /bin/bash # join container's bash
$ docker exec some-mysql /usr/bin/mysqldump --password=my-secret-pw user # execute a command in a container
$ docker cp test.file some-mysql:/tmp/test.file # copy a file from your host file system to a container's file system
To kill the running container:
$ docker kill some-mysql # or by container id
Now, it is time to try the Node.js image we built above. First, run redis and mongodb with the script we implemented in the previous part:
$ ./scripts/run_dev_dbs.sh -r
The script creates the required folders, runs the databases as docker containers, and exposes the default ports on your host system, the same way as we did for mysql.
If we run the api-example container now without any extra arguments, the Node.js example app will try to connect to the redis and mongo urls specified in config/.env.prod, i.e. redis://localhost:6379 and mongodb://localhost/exmpl. But the application is running in the container, on its own virtual host, so localhost points to the container itself. To let the Node.js app running in a container connect to redis and mongo exposed on your host system, you need to specify the IP address of your host system. Run the container and pass REDIS_URL and MONGO_URL as environment variables with the IP address of your host system instead of localhost:
$ docker run --rm -p 3000:3000 --name api-example -e REDIS_URL=redis://192.168.1.4:6379 -e MONGO_URL=mongodb://192.168.1.4/exmpl api-example
So far, we have run all the containers individually. Now, let’s try to run them together using Docker Compose and Docker Swarm.
Run Local Cluster with Docker Compose
Check the latest diagram above with the cluster structure. The cluster has nginx working as a reverse proxy, giving a lot of the benefits described in the Theory section, and serving all incoming HTTP(S) requests, proxying them to the api-example and wp services. Let’s define a config for it. Create a config/swarm/nginx-reverse folder and an nginx.conf file in it with the following content:
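The original nginx.conf gist is not embedded here; a minimal sketch of such a reverse-proxy config, with the service names and hostname taken from the text (the SSL and compression parts discussed below are omitted), might look like:

```nginx
events {}

http {
  server {
    listen 80;
    server_name api-example.local;

    # dispatch API requests to the api-example service by url path
    location /api/ {
      proxy_pass http://api-example:3000;
      proxy_set_header Host $host;
    }

    # everything else goes to WordPress
    location / {
      proxy_pass http://wp;
      proxy_set_header Host $host;
    }
  }
}
```

The upstream names (api-example, wp) resolve because the orchestration layer puts the services on a shared network, as described below.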
In the comments, I left a working configuration for hosting on port 443 in case you deploy to a real server and have certificates for it (lines 1–21). I showed how to do compression (both ways) for json (lines 23–31); comment those lines out if your client doesn’t support it. If you don’t need WordPress, specify deny all instead of proxy_pass http://wp (line 42). Hopefully, this config gives you an idea of how to add other stuff, plugins, or services if you need them.
api-example.local is defined as the hostname (line 17). To make nginx work, you need a real hostname. Add it to your /etc/hosts:
$ echo "127.0.0.1 api-example.local" | sudo tee -a /etc/hosts
Finally, let’s define a cluster config. Create config/swarm/docker-compose.yml file with the following content:
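The original docker-compose.yml gist is not embedded here, and the line numbers cited below refer to that full file; an abbreviated sketch of its structure, with paths and credentials as assumptions, might look like:

```yaml
version: "3.7"

networks:
  nginx-reverse:
  api-db:
  wp-db:

services:
  nginx-reverse:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx-reverse/nginx.conf:/etc/nginx/nginx.conf:ro
    networks: [nginx-reverse]

  api-example:
    image: losikov/api-example:latest   # your full image name
    build: ../..                        # project root, relative to this file
    environment:
      - REDIS_URL=redis://redis:6379
      - MONGO_URL=mongodb://mongo/exmpl
    networks: [nginx-reverse, api-db]

  redis:
    image: redis:latest
    networks: [api-db]

  mongo:
    image: mongo:latest
    volumes:
      - ../../../docker/mongodb:/data/db
    networks: [api-db]

  wp:
    image: wordpress:latest
    ports:
      - "8080:80"                       # temporary, for the hostname fix below
    environment:
      - WORDPRESS_DB_HOST=mysql
      - WORDPRESS_DB_USER=root
      - WORDPRESS_DB_PASSWORD=my-secret-pw
    networks: [nginx-reverse, wp-db]

  mysql:
    image: mysql:latest
    environment:
      - MYSQL_ROOT_PASSWORD=my-secret-pw
    volumes:
      - ../../../docker/mysql:/var/lib/mysql
    networks: [wp-db]
```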
In the config, we define 3 networks inside the cluster (lines 5–7): nginx-reverse, api-db and wp-db, so that our 6 services (nginx-reverse, api-example, wp, mongo, redis and mysql) can talk to each other inside the cluster using a service name, for example http://wp or mongodb://mongo/exmpl:
The config is self-explanatory (spend time to review it), except a few moments explained below.
Volumes. The config uses the filesystem of the host OS (lines 23, 24, 75 and 118), passing files and directories from the host OS to the services (nginx-reverse, mongo and mysql) as volumes to mount. Create the directories (adjust the location if needed, both on the filesystem and in the config). Go to the root of the project and run:
$ mkdir -p ../docker/mongodb # (relative to root of the project)
$ mkdir -p ../docker/mysql
Update line 35 to set the full api-example image name (explained later).
Now, you are all set to run the services with docker-compose:
# for interactive mode, to see logs, to debug:
$ docker-compose -f config/swarm/docker-compose.yml up
# CONTROL-C to interrupt/kill it.

# in background mode:
$ docker-compose -f config/swarm/docker-compose.yml up -d

# to kill/remove the services running in background mode:
$ docker-compose -f config/swarm/docker-compose.yml down
All services should be up and running:
If you make a request to /api/v1/hello, you will get a response, and it will be logged by both the api-example service and nginx-reverse:
WordPress hostname issue. WordPress has an issue with its initial hostname if you run it behind the nginx-reverse proxy. I provide one of the ways to fix it (expose 8080 first, do the initial WordPress setup, change the hostname, close 8080). Skip the next step if you don’t want to play with WordPress.
To connect to WordPress, you need to use http://api-example.local:8080 at this point:
The initial configuration exposes 2 ports, 80 for nginx and 8080 for WordPress. 8080 is exposed temporarily, to fix the WordPress hostname issue. Open http://api-example.local:8080 in a browser, create an account, log in, and fix WordPress Address and Site Address by removing the port:
Once you do it, you can comment out lines 90–91 in the config, which expose the 8080 port. You don’t need to restart the services running in the background to apply a new configuration; just run again:
$ docker-compose -f config/swarm/docker-compose.yml up -d
and it will update the existing services and configuration. Now you don’t need to specify :8080 in the url, just open http://api-example.local in your browser:
Horizontal and Vertical Scaling. Uncomment lines 40–45 and 83–88, which set up multiple replicas of the api-example and wp services (horizontal scaling). Run the “up” command again to apply the new configuration, and more containers will come up automatically. You can scale up and down, and change the CPU and memory available to the services (vertical scaling), on the fly.
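A deploy section of that kind, in compose file format v3, might look like this fragment under a service definition (the numbers are illustrative):

```yaml
    deploy:
      replicas: 3              # horizontal: run 3 instances of the service
      resources:
        limits:
          cpus: "0.50"         # vertical: cap CPU per container
          memory: 256M         # vertical: cap memory per container
```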
Try how it works: simulate a load on the cluster with a simple command:
$ while true ; do curl --silent -H "Host: api-example.local" http://127.0.0.1/api/v1/hello > /dev/null ; done
The load will be distributed evenly across the api-example instances:
How to update already running services in the cluster?
After you make any code changes, you can rebuild the image as before. But use another command now:
$ docker-compose -f config/swarm/docker-compose.yml build
This command has another benefit: it rebuilds all the images which the config specifies to build, in this case only api-example (lines 36–38). You can have multiple images inside the configuration file, and it will build all of them with one command.
Push the image(s) (optional):
$ docker-compose -f config/swarm/docker-compose.yml push
and finally, update as before:
$ docker-compose -f config/swarm/docker-compose.yml up -d
If you have services scaled horizontally into multiple instances, they will be updated one by one, with no downtime.
Docker Swarm Cluster (optional to do, but a must-read)
Docker Swarm is a cluster: a group of multiple machines which run the services. The services are distributed across the machines in the swarm. You can set up docker on multiple machines, for example, install Linux in VirtualBox and then install docker on each of them. Then, on the master machine, run:
$ docker swarm init
Here’s an example output:
I had issues setting up my macOS as a master node (due to existing docker/kubernetes configurations and multiple network interfaces): netstat didn’t show the listening port after I initialized the swarm on it. Google it if you run into an issue. There were no issues on a clean Linux in a VM.
On the other machines, the worker nodes which you want to join to the swarm, run the “docker swarm join …” command from the “docker swarm init” output (see the screenshot above). The result:
To deploy the services or update them in the swarm cluster, use this command on the master node:
$ docker stack deploy -c config/swarm/docker-compose.yml example
To remove the running stack:
$ docker stack rm example
Replicas of the services run on random nodes (virtual machines).
How about mounted volumes if a service is running on a random node in a cluster?
We can either mount a volume on all running nodes (like nfs), or make a service which mounts a volume run only on the specific node where that volume exists. Let’s check how to do the latter.
Check lines 15–18, 68–71 and 105–108 of the config: deploy.placement.constraints = ‘node.role == manager’. We specify that the service should run only on a master node. As a result, the nginx-reverse, mysql and mongo services will always run on a master node, and the main reason for that is to mount the volumes which are on the master node’s filesystem:
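In compose file format v3, such a constraint is a small fragment under the service definition:

```yaml
    deploy:
      placement:
        constraints:
          - node.role == manager
```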
There can be multiple master nodes, though; I give this example to show how placement should work. The placement technique differs between the specific tools and cloud services (ECS, EKS) you use for deployment, but the principle is the same.
Forcing mysql and mongo to launch on a specific node makes sense, but it doesn’t make sense for nginx-reverse. You may scale nginx-reverse across multiple machines, so it is better to avoid a dependency on mounted volumes and build your own nginx image which includes the required files. We’ll do it in the next part.
WordPress also requires persistent storage, to persist plugins, themes, and uploaded media files.
To Sum Up
Let’s list the key things we went through:
- how to build an image, and the strategies for building it (passing everything as arguments vs creating a self-sufficient image; an example is in the next part);
- how to push an image to a repository;
- how to run a container and pass arguments to it (exposed ports, volumes, environment variables);
- how to manage running containers;
- how to manage local repository;
- how to run a cluster with connected services in it using docker-compose.yml;
- how to build and push multiple images using docker-compose.yml;
- how to mount volumes;
- how to pass environment variables;
- how to scale vertically and horizontally;
- how to create a swarm and add nodes to it;
- how to assign services to run on specific nodes in a cluster.
If you now have a clear understanding of the key role of docker in the deployment process, and of the deployment principles described in this part, I’m pretty sure you’ll be able to deploy to any type of cloud yourself. They are all similar and use the same concepts.