Last time, we talked about Docker in general with a few out-of-the-box examples. Now, let’s get real. It’s time for practice.

Node.js is the poison of choice for many developers. It’s a powerful technology that lets you write JavaScript for server-side applications; typically, JavaScript runs in a browser on the client side.

Developers run specific versions of Node.js on their local machines and often use nvm, the Node Version Manager (github.com/nvm-sh/nvm), to manage multiple active Node.js versions.
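For example, typical usage looks like this (a quick sketch; it assumes nvm is already installed):

# Install and activate the latest LTS release
nvm install --lts
nvm use --lts

# List the versions installed on this machine
nvm ls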

They also use npm (Node Package Manager) to manage software dependencies.

As for our practice exercise, we’re going to use the Google Microservices Demo; you can read more about it on the Test Repositories page.

Dockerizing Payment Service

The payment service is responsible for payments! Who knew?!

It’s under src/paymentservice. It’s independent enough that we can tackle its Dockerization fairly quickly.

It’s not hard to dockerize a Node.js app. Let’s navigate to src/paymentservice of the Microservices Demo. You can look at the existing Dockerfile to find out how it’s done, but let’s remove it and do this ourselves.

Find the Node.js Version!

At the time of writing, the latest LTS is 20.x. You can go to Docker Hub and find the specific LTS version you should use. I suggest using the lts-alpine tag. We rely on the creators of these images to handle minor system and Node.js upgrades, so we don’t have to do this ourselves (who wants to do maintenance when it can be avoided, right?).
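If you want to double-check which Node.js version a tag actually resolves to, you can ask the image directly (a quick sanity check, assuming Docker is installed):

# Print the Node.js version bundled in the lts-alpine image
docker run --rm node:lts-alpine node --version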

We get our base image from Docker Hub. Some organizations don’t allow that; they have their own base images in private registries instead (which is OK). Use either your organization’s base images or official images, since the organizations that control official images are verified by Docker Hub.

Why does a company have a private registry?

There could be quite a few reasons, but the ones I’ve seen so far are:

  • Availability: they don’t want their infrastructure to rely on a public registry they don’t operate
  • Security: they can control image integrity and the vulnerability-detection process, as well as the whole Docker image pipeline (for example, a new image can’t be released until all HIGH-severity vulnerabilities are resolved)

Continue with a Dockerfile

Our application runs on a specific port. Try searching for port in this folder. You won’t find a default value, so we have to set it ourselves. The application reads it from the environment:

HipsterShopServer.PORT = process.env.PORT;

Since it doesn’t have a default value, we can set it to whatever we want within the Linux port range (1–65535). Let’s go with 5555.

We need to expose our service to the internal network so other services can send requests to it, and so we can call it ourselves and see what’s going on.

We also need a folder we’ll be working from; let’s name it /app.

With most programming languages, you’d want to download dependencies first, then perform the build step if applicable. Node.js doesn’t require a build step, so we should be fine with just downloading dependencies and running the service. It’s typically done via npm ci.

In the end, you should have a Dockerfile like this:

FROM node:lts-alpine

ENV PORT 5555

EXPOSE ${PORT}

WORKDIR /app

COPY package* .

RUN npm ci --only=production

COPY . .

ENTRYPOINT ["node", "index.js"]

Let’s recap and see what each instruction means:

  • We use the LTS Node.js version since we need one >= 16
  • We added FROM node:lts-alpine to ensure we use the latest Node.js possible, and because it’s Alpine, we minimize the size of the image
  • ENV PORT 5555 sets the environment variable PORT to 5555; the application uses it to receive incoming requests.
  • EXPOSE ${PORT} takes the PORT value from the environment variable set earlier and exposes this port to the internal Docker network, so other containers in the same network can send requests
  • WORKDIR /app makes /app the base folder inside the Docker image; subsequent instructions run relative to it.
  • COPY package* . copies package.json and package-lock.json. You don’t change these files often, so it’s worth putting them in a separate layer to speed up the build. If you skip this and just use COPY . . instead, you won’t be using the Docker cache in subsequent builds. That’s not a big deal for small projects with few dependencies, but download time grows as the number of dependencies does.
  • RUN npm ci --only=production. The ci command is specifically designed to install exactly what’s in package-lock.json, and we don’t want to install anything non-production related (like a test framework or other development dependencies). You may also see RUN npm i or RUN npm install; these will try to update package-lock.json if there are newer versions of dependencies, and you don’t want that. You want the exact versions pinned in Git to ensure stability, because that’s what’s running on developers’ machines.
  • COPY . . copies everything from the build context into the Docker image
  • ENTRYPOINT ["node", "index.js"]: every Node.js project has an entry file you need to run in order to start the whole app. In some cases it’s node server.js; in ours it’s node index.js. You can also look at package.json: under scripts there are typically commands to run certain things, such as serve or start-production. If you find such a command in your package.json, you’d run it like npm run serve.

This is a pretty good general attempt at a Dockerfile for a Node.js app. Now navigate there in your terminal (cd src/paymentservice) and run docker build -t paymentservice . to build your image.

Nothing Works on the First Try: Debugging

If you thought this would be easy, you’re wrong :) Now it’s time for issues.

This build attempt produced an error: ERROR: failed to solve: process "/bin/sh -c npm ci --only=production" did not complete successfully: exit code: 1

Don’t panic. Examining the logs, you can see gyp ERR! find Python Python is not set from the command line or npm configuration.

From the description, we see that Python is not found. It’s safe to assume we need Python to install some dependencies. That’s not uncommon, since many dependencies are compiled as part of installation, so they need build tools of their own. Typically, these gotchas can be found in a README.md, or you can ask the devs about it. They might not realize they have such dependencies simply because the tools are already installed on their machines.
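By the way, if the collapsed BuildKit output hides the interesting lines, you can ask for the full logs (a handy flag on recent Docker versions):

# Show complete build output instead of the collapsed progress view
docker build --progress=plain -t paymentservice .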

To fix it, let’s add RUN apk add --no-cache python3 before the first COPY package* .

FROM node:lts-alpine

ENV PORT 5555

EXPOSE ${PORT}

WORKDIR /app

RUN apk add --no-cache python3

COPY package* .

RUN npm ci --only=production

COPY . .

ENTRYPOINT ["node", "index.js"]

To reiterate: we put the RUN that installs system dependencies before copying the package* files because we want to cache that Python installation. Once we figure this out, there’s not much need to install Python again; the build system will cache the Docker layers, significantly speeding up future builds. And the larger the organization, the more impact fast build times have.

And let’s run docker build -t paymentservice . again.

So, now you see another error. Examine the logs, and you’ll find the message stack Error: not found: make. This means we need to install something else; this time it’s make. Let’s add make after python3 so the line looks like RUN apk add --no-cache python3 make, and run the build again.

Make is a build automation tool. It reads instructions from a Makefile and executes them in the specified order. It’s useful for large projects where a complex build process requires many actions, dependencies, etc., and Make helps organize and manage that. However, it’s not required to run Node.js :)
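For flavor, here’s a minimal illustrative Makefile (the payment service doesn’t ship one; this just shows the kind of instructions Make consumes):

# `make install` runs the shell recipe under the `install` target
# (note: recipe lines must be indented with a tab)
install:
	npm ci --only=production

# `make test` depends on `install`, so Make runs that target first
test: install
	npm test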

Let’s run docker build -t paymentservice . again.

So, now you see another error. Let’s examine the logs again, and you should see the message make: g++: No such file or directory. We need to add g++ (which is a C++ compiler). From the logs, you can see we need it to build the node-gyp package, which means we need these tools only to build it, not to run it. Let’s add g++ to our command so it looks like RUN apk add --no-cache python3 make g++

node-gyp helps build native add-ons for Node.js. It’s a command-line tool often used implicitly by developers. You typically don’t install python, g++, and make yourself, because many Linux distributions and macOS include them by default.

And yet again, docker build -t paymentservice .

Cool!!! Now the docker build works. Let’s look at the image size by running docker inspect paymentservice and searching for Size. It’s a number of bytes; the image is roughly 620 MB.
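You can also pull out just the size instead of scrolling through the whole inspect output:

# Print only the size field, in bytes
docker inspect --format '{{.Size}}' paymentservice

# Or get a human-readable summary
docker images paymentservice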

And that’s a lot! Let’s see what we can improve. Since we need python3, make, and g++ only for the build, let’s move them to another stage.

Let’s move the apk add and npm ci instructions to a separate stage, and then copy node_modules into the resulting image. If you build it, you’ll see it’s around 281 MB, which is more than two times smaller.

Here’s the resulting Dockerfile:

FROM node:lts-alpine as base

WORKDIR /app

RUN apk add --no-cache python3 make g++

COPY package* .

RUN npm ci --only=production

FROM node:lts-alpine as result

ENV PORT 5555

EXPOSE ${PORT}

WORKDIR /app

COPY --from=base /app/node_modules ./node_modules

COPY . .

ENTRYPOINT ["node", "index.js"]

Small images

When you get a small image, it’s faster to pull and deploy.

It’s faster to scan smaller images.

Also, it’s cheaper to store at scale (you’ll see the difference at multiple-TB scale).

What’s next

Now that we have the image, let’s run it. The truth is: if something built successfully, there’s zero guarantee it will actually work.

docker run --rm -p 5555:5555 paymentservice

Tip: the --rm flag will remove the container when it’s stopped (good to use locally when you don’t want to produce a TON of stopped containers during development).

-p maps a host port to the container port we specified in the Dockerfile via the ENV variable (the port the application listens on). By default, Docker lives in its own isolated network, and to call a service running in a container from our local machine, we need to forward a port.
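Once the container is up, you can confirm the mapping from another terminal (your container ID will differ):

# List running containers with their port mappings
docker ps

# Show the mappings of one specific container (substitute your container ID)
docker port <container-id>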

It starts correctly but begins throwing errors after a couple of seconds. This is OK. Let’s dig into them:

/app/node_modules/@google-cloud/profiler/build/src/index.js:120
        throw new Error('Project ID must be specified in the configuration');
              ^

Error: Project ID must be specified in the configuration
    at initConfigMetadata (/app/node_modules/@google-cloud/profiler/build/src/index.js:120:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async createProfiler (/app/node_modules/@google-cloud/profiler/build/src/index.js:158:26)
    at async Object.start (/app/node_modules/@google-cloud/profiler/build/src/index.js:182:22)

It asks for a Project ID. But we don’t have one, right? Let’s see if we can disable that. In some cases you can’t: it could be a runtime dependency the application can’t start without (a good example is a database connection).

Typically, you’d ask a developer about it :) Let’s explore things ourselves.

We start with index.js, and right away you’ll see:

if(process.env.DISABLE_PROFILER) {
  console.log("Profiler disabled.")
}
else {
  console.log("Profiler enabled.")
  require('@google-cloud/profiler').start({
    serviceContext: {
      service: 'paymentservice',
      version: '1.0.0'
    }
  });
}

Right away: if DISABLE_PROFILER is set, the profiler is never started. That’s a good sign, meaning we can disable it (it’s not always the case, so it’s worth investigating and confirming with the team).

Why do we want to disable it? It’s a fair question. I’d answer it this way: we’re trying to run the service without reliance on a specific environment (in this case, GCP), so we want to disable by default anything that stops us from running in a generic environment. Depending on the target environment, we’ll be able to override the variables when we start the container.

Let’s add ENV DISABLE_PROFILER 1. Why 1? 1 conventionally means TRUE (and any non-empty string passes the truthiness check in the code above). Thus, we’re saying we want to disable the profiler.
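And since it’s just an environment variable, you can flip it back at runtime without rebuilding. For example (a sketch; the profiler would still need valid GCP credentials and a project ID to actually start):

# An empty value is falsy in the index.js check, so the profiler gets enabled
docker run --rm -p 5555:5555 -e DISABLE_PROFILER= paymentservice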

This is how our Dockerfile should look now:

FROM node:lts-alpine as base

WORKDIR /app

RUN apk add --no-cache python3 make g++

COPY package* .

RUN npm ci --only=production

FROM node:lts-alpine as result

ENV PORT 5555
ENV DISABLE_PROFILER 1

EXPOSE ${PORT}

WORKDIR /app

COPY --from=base /app/node_modules ./node_modules

COPY . .

ENTRYPOINT ["node", "index.js"]

Let’s run it! Now the service starts correctly and we see no issues, meaning it’s ready to serve requests.

What can we improve here?

There are a few things:

  • In the FROM statement, we can use a specific version and its corresponding hash (digest). This strengthens security by referring to an image not by a tag but by its hash: tags can be reassigned, but hashes cannot (see the sketch after this list).
  • apk add --no-cache python3 make g++ relies on the latest available versions of those packages. In an ideal world, you’d pin specific versions.
  • You’d also want to add a .dockerignore file to ensure your local node_modules isn’t copied into the resulting Docker image through COPY . . (also sketched below).
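To make those concrete, here’s a sketch. The digest below is a placeholder, not a real node image digest; you can resolve the actual one with docker images --digests node or look it up on Docker Hub:

# Pin the base image by digest instead of a movable tag (placeholder digest)
FROM node:20-alpine@sha256:<digest> as base

And a minimal .dockerignore to keep local artifacts out of the build context:

node_modules
npm-debug.log
.git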

Why didn’t we make these improvements?

While developing, you have to take shortcuts; otherwise, you’d be stuck in a giant queue of upgrades, tickets, etc. When the product is more mature and there’s no active development going on, it’s a really good idea to harden things and implement these practices. During active development, or in the very early stages, there’s a high chance they’ll slow you down significantly.

I’ll see you in the next one, where we’ll go through Python.