Docker allows users to build, distribute, and run software in isolated containers. But why do we need it? We had been running software just fine for decades without containers. In this article, you're going to learn the basics.

A container is a lightweight, standalone, executable software package that includes everything needed to run a piece of software: the code, runtime, system libraries, environment variables, and configuration files.

If you google it, you'll find the classic infrastructure evolution diagram:

  • A single or multiple physical servers
  • Multiple virtual machines on a single physical server
  • Multiple containers on a single VM

So, instead of provisioning VMs and installing our software on each of them by hand, we have a way to avoid that mess.

Isn’t that great? The simpler things are, the better, right?

Technically yes! In practice, it’s yet another tool to learn. There’s no doubt that it simplifies life drastically. However, many overpromise what Docker can offer and forget about the new cons:

  1. Heavy reliance on public registries;
  2. Not every application is meant to be dockerized, even if it technically can be.

Let’s unpack these.

Reliance on public registries

By default, every image you put on the FROM line comes from docker.io (Docker Hub) unless you explicitly state otherwise. Docker Hub, as a public registry, has progressed a lot, especially since COVID. Still, you're essentially trusting that someone else keeps delivering what you need, without ever having told them what you need. You'll usually be fine if you stick with official base images. Plus, Docker Hub now runs a default security scanner, and its results are public information for you as a consumer: you can see exactly which vulnerabilities your base images carry. This makes the Internet a more secure place overall.
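
To make that dependency explicit, you can spell out the registry in the image reference. A minimal sketch, where registry.example.com is a hypothetical private registry:

# These two lines reference the same image: Docker Hub is the implicit default
FROM node:20-alpine
FROM docker.io/library/node:20-alpine

# Pulling from a private registry instead (hypothetical hostname)
FROM registry.example.com/base-images/node:20-alpine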

Not everything was meant to be deployed with Docker

Every new piece of tech is a Docker-native citizen. Older stuff, not so much. And I'm not talking about 2015-old, but earlier things like legacy Java, .NET, and old Linux software in general. You can benefit from containerizing what you can. However, it won't automatically solve all your problems. If you containerize an old system, you'll still have to do substantial refactoring to fit the new world and the 12-factor app methodology.

If you don't refactor, you end up with a Docker application that reads config files from disk instead of environment variables, or with multiple applications crammed into a single container. Something doesn't stop properly, and now you have a zombie container that has to be stopped manually.

Docker is a tool. It’s supposed to be used in a specific way. Remember that.

Docker Way

Let's imagine you have a web application. Many love Node.js, so let it be a Node.js web application.

If you were a typical Node.js developer, here's how you'd start working on a project:

  • Clone the code
  • Fill in variables in your local .env file
  • Run npm i in your terminal
  • Run npm run start or something similar to start the application in development mode
  • Your application might have a database or some system libraries as dependencies, so you'd install those, hit issues along the way, and eventually figure it out
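
In terminal commands, that onboarding ritual looks roughly like this (the repository URL and the .env.example file are hypothetical):

git clone https://github.com/example/my-node-app.git   # hypothetical repo
cd my-node-app
cp .env.example .env   # then fill in the values by hand
npm i
npm run start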

This barely scales beyond one developer.

If this project were to have Docker, this developer’s process would look quite a bit different:

  • Clone the code
  • Run docker compose up

This command reads docker-compose.yaml and uses the Dockerfile from your repository to build and run your application along with its dependencies. And the key point is that the very same Dockerfile is used to build the image for production and for every other developer.
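
As a rough illustration, a minimal docker-compose.yaml for such a project might look like this (the service names and the database choice are assumptions; a fuller example follows later in the article):

services:
  app:
    build: .           # uses the Dockerfile in the repository root
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
    depends_on:
      - db
  db:
    image: mongo:latest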

And this scales well, doesn’t it?

Dockerfiles

A Dockerfile is a text file that contains instructions for building a Docker image. These instructions include which base image to use, which ports to expose, and which command to run when a container is started from the image.

Here’s an example of a Dockerfile:

# Use an official Node.js runtime as the base image
FROM node:20-alpine

# Set an environment variable
ARG NODE_ENV=production
ENV NODE_ENV=$NODE_ENV

# Create and set the working directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Copy the package.json and package-lock.json files
COPY package*.json ./

# Install the dependencies
RUN npm ci --only=production

# Copy the application source code
COPY . .

# Expose the port the application will run on
EXPOSE 3000

# Start the application
ENTRYPOINT [ "npm", "start" ]
CMD [ "--host=0.0.0.0" ]

Docker images are built from layers created by the instructions in a Dockerfile. Most instructions add a new layer to the image (some, such as ENV and EXPOSE, only record metadata). The final image is composed of many layers stacked on top of one another. Each layer is immutable, meaning it cannot be modified after creation.
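
You can inspect those layers yourself with docker history. Assuming you've built the example above and tagged it myapp (a hypothetical tag), you'd see one row per layer:

docker build -t myapp .
docker history myapp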

Let’s go through each statement:

  • The FROM statement tells Docker which image to use as the base for your own. You can reference public images, local images, and any private images you have access to.
  • The ARG statement defines a variable you can pass during the build process. It allows you to parameterize specific parts of the build.
  • WORKDIR sets the default working directory inside the container's file system.
  • RUN executes a shell command. Typically, you use it to install software or download packages.
  • COPY & ADD are instructions used to copy files and directories from the host machine into the image, but they have some key differences.

The COPY instruction is used to copy files and directories from the host machine to the image. It has the following basic syntax: COPY <src> <dest>. The <src> argument is the file or directory on the host machine that you want to copy, and the <dest> argument is the location in the image where you want to copy the files. COPY instruction only supports local files.

The ADD instruction is similar to COPY but has some additional functionality. In addition to local files and directories, ADD can also copy files from URLs. The ADD instruction has the following basic syntax: ADD <src> <dest>. The <src> and <dest> arguments have the same meaning as in COPY.

The most significant difference between COPY and ADD is that the ADD instruction can unpack local .tar and .tar.gz archives and can copy files from a URL. This can be useful for downloading or extracting an archive as part of the build process.

If the files you want to copy are local and you don't need to unpack an archive, prefer COPY: it's simpler and more predictable, since it never tries to decompress anything.
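
A quick side-by-side sketch (the archive name and URL are hypothetical); note that ADD only auto-extracts local archives, not files it downloads:

# COPY: local files and directories only
COPY package*.json ./

# ADD: also extracts local archives...
ADD vendor-libs.tar.gz /opt/vendor/

# ...and can fetch remote files (downloaded as-is, not extracted)
ADD https://example.com/tool.tar.gz /tmp/tool.tar.gz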

Side Note: In practice, I've rarely seen ADD used to unpack .tar.gz files. You'd do this only when installing dependencies that aren't available via the package manager, and most projects don't need that. I'd move such steps into a base image and keep them to a minimum.

  • ENTRYPOINT and CMD are the instructions used to specify the command that will be executed when a container is started from the image, but they serve different purposes.

The ENTRYPOINT instruction is used to configure a container as an executable. It sets the command that will be executed when the container starts, and it is not overridden by arguments passed to docker run (only by the --entrypoint flag). ENTRYPOINT lets you configure a container to run as an executable rather than just running a one-off command when it starts up.

The CMD instruction, on the other hand, is used to provide default arguments for the ENTRYPOINT instruction. It specifies the command and its arguments that will be executed if no command is provided when the container is started. The CMD instruction can be overridden when the container is started by giving a command as an argument to the docker run command.

An example of how these two instructions might be used in a Dockerfile:

FROM alpine

ENTRYPOINT ["echo"]
CMD ["Hello World"]

This Dockerfile sets the ENTRYPOINT to the echo command and the CMD to Hello World. When the container starts, it executes echo "Hello World". If you run docker run <image-name> "How are you?", it will run echo "How are you?" instead, since you have overridden the default CMD.
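
You can see the difference right away. Assuming the image is built and tagged hello-echo (a hypothetical tag):

docker build -t hello-echo .
docker run hello-echo                  # prints: Hello World
docker run hello-echo "How are you?"   # prints: How are you?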

Performance and Dockerfiles

It takes time to build things. That's why your aim should be to do as little as possible in a Dockerfile. Another goal is to minimize the number of layers: group commands where possible, copy only what you need, and install only the software required for your application to run.
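
Grouping typically means chaining shell commands into a single RUN instruction. A small sketch on an Alpine base (the packages are just examples):

# One layer instead of three
RUN apk add --no-cache curl git

# Instead of:
RUN apk update
RUN apk add curl
RUN apk add git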

This matters because the structure of a Dockerfile influences how long the image takes to build and what ends up inside it. If the build fails, you must start again, and the faster you build, the quicker you ship.

Docker images are built using the docker build command, which reads the instructions from a Dockerfile and creates the layers that make up the image. Once the image has been created, it can be used to launch one or more containers. Each container runs in its own isolated environment and has its own file system, network interface, and process space.
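
For the Node Dockerfile above, a build might look like this; the myapp tag is a hypothetical name, and --build-arg feeds the ARG we declared:

# Build with the default NODE_ENV=production
docker build -t myapp .

# Override the build argument for a development image
docker build --build-arg NODE_ENV=development -t myapp:dev .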

The docker run command is used to launch containers from images. When a container starts, the command specified in the Dockerfile is executed, and the container begins to run. A container can be stopped and restarted, but once it's removed and recreated, its internal state is gone. To keep data around, you can use volumes, which store data outside the container's file system.
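
Continuing with the hypothetical myapp image, a run command with a published port and a named volume might look like this (the volume name and mount path are assumptions):

# Run in the background, publish port 3000, and persist /usr/src/app/data in a named volume
docker run -d --name myapp -p 3000:3000 -v myapp_data:/usr/src/app/data myapp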

In summary, Docker enables the packaging of software in a format that can run consistently in any environment, and this is achieved by using the following:

  • A Dockerfile is a blueprint for creating an image.
  • Docker layers: each instruction in a Dockerfile creates a new layer in the image.
  • The COPY instruction copies local files and directories from the host machine into the image's file system.
  • The ADD instruction can do everything COPY can, and it can also unpack local .tar and .tar.gz files and copy files from a URL.
  • RUN executes a shell command.
  • ENTRYPOINT sets the executable to run when the container starts.
  • CMD provides default arguments for the command specified in ENTRYPOINT, which can be overridden at runtime.
  • Docker containers are created by running an image; they are lightweight, isolated environments that run your processes.

Dockerfile optimizations

Optimizing a Dockerfile can help to reduce the image size, improve the build time, and make the image more secure.

One way to optimize a Dockerfile is to minimize the number of layers in the image. Almost every instruction in a Dockerfile creates a layer, so reducing the number of instructions reduces the number of layers and helps decrease the image size.

Another way to minimize the number of layers is to use multi-stage builds. A multi-stage build allows you to use multiple FROM instructions in a single Dockerfile, each starting a new stage with its own instructions. You can use one stage to build your application and another stage to package it into a small runtime image. This helps reduce the final image size because only the runtime dependencies end up in the final image, not the build-time dependencies.

Another good practice is using alpine-based images as base images. Alpine Linux is a lightweight distribution of Linux often used as the base image for Docker images. These images are smaller in size compared to other distributions and are optimized for containers.

Additionally, you can optimize build time through caching. Docker keeps a cache of intermediate image layers; if an instruction in a Dockerfile (and everything before it) has not changed, the cached layer is reused, saving time during the build. By ordering the instructions so that the ones most likely to change sit at the bottom of the file, you improve cache utilization and reduce build time. If you ever need to bypass the cache entirely, pass --no-cache to docker build.

A good practice is to use the .dockerignore file, which allows you to specify files and directories that should not be included in the build context. This can help reduce the build context’s size and improve build times, especially when building images from large code bases.

Minimizing the number of layers

FROM alpine

# Use one instruction to copy multiple files
COPY file1.txt file2.txt file3.txt /app/

# Instead of:
COPY file1.txt /app/
COPY file2.txt /app/
COPY file3.txt /app/

Using multi-stage builds to reduce final image size

# Build stage (using the official Go image on Alpine)
FROM golang:alpine AS build
RUN apk add --no-cache gcc musl-dev
WORKDIR /app
COPY . .
RUN go build -o myapp

# Final stage
FROM alpine:latest
COPY --from=build /app/myapp /usr/local/bin/
CMD ["myapp"]



This example uses a multi-stage build to compile the Go application in the first stage and then copies the binary into a smaller Alpine image in the final stage. This way, only the runtime dependencies are included in the final image, not the build-time dependencies.

Using alpine-based images as base images

# Use alpine based image
FROM alpine:latest

# Instead of
FROM ubuntu:latest

Caching intermediate image layers

FROM node:alpine

# Set the working directory so npm ci runs where package.json lives
WORKDIR /app

# Move the instructions that change less frequently to the top
COPY package*.json /app/
RUN npm ci
COPY . /app

# Instead of
COPY . /app
COPY package*.json /app/
RUN npm ci

By ordering the instructions this way, changing your source code only invalidates the final COPY . /app layer: the cache is reused for the COPY package*.json /app/ instruction, and the RUN npm ci command is only re-executed when the package files change.

Using a .dockerignore file to exclude unnecessary files and directories from the build context

# in .dockerignore
node_modules/

# this will ignore node_modules directory when building the image.

By following these best practices, you can create efficient and small Docker images that have fast build times and are secure.

Docker Compose

Docker Compose is a tool for defining and running multi-container Docker applications. It allows you to define a set of containers, their configurations, and how they interact with each other in a single file called docker-compose.yml. With Compose, you can use a single command to create and start all the services defined in the file. It's an excellent tool for local development and quick prototyping. However, it lacks the orchestration capabilities required when you run Docker in production.

A docker-compose.yml file consists of multiple sections, including:

  • services: This is the main section where you define the services that make up your application. You can specify the image to use, the ports to expose, the environment variables, and the volumes to mount.
  • networks: This section allows you to define networks for your services to connect to.
  • volumes: This section allows you to create and manage volumes for your services.

Here’s an example of a docker-compose.yml file for a web app with a frontend, backend, and a database.

version: '3'
services:
  frontend:
    build: frontend/
    ports:
      - "80:80"
    environment:
      - NODE_ENV=production
    networks:
      - app_network
    depends_on:
      - backend
  backend:
    build: backend/
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=mongodb://database:27017/mydb
    networks:
      - app_network
    depends_on:
      - database
  database:
    image: mongo:latest
    ports:
      - "27017:27017"
    networks:
      - app_network
    volumes:
      - mongo_data:/data/db
networks:
  app_network:
volumes:
  mongo_data:

This example defines three services: frontend, backend, and database.

The frontend and backend services are built from their respective folders. The frontend service runs on port 80, the backend service runs on port 3000, and the frontend depends on the backend.

The database service uses the Mongo image and runs on port 27017.

All the services are connected to the app_network network. The database service also uses a volume named mongo_data to store its data.

To run the application, you would use the docker compose up command. This command starts all the services defined in the Compose file and automatically creates the network and volumes.

With this setup, all the services are connected and can interact with each other. You can reach the frontend at http://localhost, the backend at http://localhost:3000, and MongoDB at mongodb://localhost:27017.

It’s worth noting that this example is very basic and can be extended to a more complex application with more services and configurations. Still, this example gives you an idea of how Docker Compose can manage multi-container applications.
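
A few other Compose commands you'll reach for constantly; the backend service name refers to the example above:

docker compose up -d             # start everything in the background
docker compose logs -f backend   # follow the logs of a single service
docker compose ps                # list running services
docker compose down              # stop and remove containers and networks
docker compose down -v           # also remove named volumes (e.g. mongo_data)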

Conclusion

Go build Dockerfiles. Theory is all well and good, but it will only do you good with practice.