Docker Basics: NodeJS
Last time, we talked about Docker in general with a few examples out of the box. Now, let’s get real. It’s time for practice.
NodeJS is a poison of choice for many developers. It’s a powerful technology that lets you write JavaScript code for server-side applications, whereas typically JavaScript runs in a browser on the client side.
Developers run specific versions of NodeJS on their local machine, often using nvm (Node Version Manager, github.com/nvm-sh/nvm) to manage multiple active NodeJS versions. They also use NPM (Node Package Manager) to manage software dependencies.
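For example, a minimal local workflow with these tools looks like this (a sketch, assuming nvm is already installed):

nvm install --lts   # install the current LTS release of NodeJS
nvm use --lts       # switch the current shell to it
npm ci              # install the exact dependency versions from package-lock.json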
As for our practice exercise, we’re going to use the Google Microservices Demo; you can read more about it on the Test Repositories page.
Dockerizing Payment Service
Payment service is responsible for payments! Who knew?!
It’s under src/paymentservice. It’s independent enough, allowing us to address Dockerization fairly quickly.
It’s not hard to dockerize a NodeJS app. Let’s navigate to src/paymentservice in the Microservices demo. You can look at the existing Dockerfile to find out how it’s done, but let’s remove it and do this ourselves.
Find the NodeJS Version!
At the time of writing, the latest LTS is 20.x. You can go to Dockerhub and find the specific LTS version you should use; I suggest using the lts-alpine tag. We rely on the creators of these images to handle minor system and node upgrades, so we don’t have to do this ourselves (who wants to do maintenance when it can be avoided, right?).
We get our base image from Dockerhub. Some organizations don’t allow that and have their own base images in private registries instead (which is OK). Use either your organization’s base images or official images, since the organizations that control official images are verified by Dockerhub.
Why does a company have a private registry?
There could be quite a few reasons, but the ones I’ve seen so far are (a sketch follows the list):
- Availability: they don’t want their infrastructure to rely on a public registry they don’t operate
- Security: they can control image integrity and the vulnerability-detection process, as well as the whole Docker image pipeline (for example, a new image can’t be released until all HIGH-level vulnerabilities are resolved)
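In such an organization, the first line of a Dockerfile might point at an internal mirror instead of Dockerhub (registry.example.com is a hypothetical registry name):

# hypothetical internal mirror of the official node image
FROM registry.example.com/base/node:lts-alpine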
Continue with a Dockerfile
Our application runs on a specific port. Try searching for port in this folder. You won’t find a hardcoded value, so we have to set it:
HipsterShopServer.PORT = process.env.PORT;
Since it doesn’t have a default value, we can set it to whatever we want within the Linux limits. Let’s do 5555.
We need to expose our service to the internal network so other services can send requests to it, plus we can call it ourselves and see what’s going on. We also need a folder we’ll be working from; let’s name it /app.
With programming languages, you’d want to download dependencies first, then perform the build if applicable. NodeJS doesn’t require a build step, so we should be fine with just downloading dependencies and running the service. It’s typically done via npm ci.
In the end, you should have a Dockerfile like this:
FROM node:lts-alpine
ENV PORT 5555
EXPOSE ${PORT}
WORKDIR /app
COPY package* .
RUN npm ci --only=production
COPY . .
ENTRYPOINT ["node", "index.js"]
Let’s recap and see what each instruction means:
- FROM node:lts-alpine gets us the LTS node version (we need one >= 16) and ensures we use the latest node possible; because it’s Alpine, we minimize the size of the image.
- ENV PORT 5555 sets the environment variable PORT to a value of 5555, which the application uses to receive incoming requests.
- EXPOSE ${PORT} takes the PORT value from the environment variable set earlier and exposes this port to the internal Docker network, so other containers in the same network can send requests.
- WORKDIR /app says that we use /app inside the Docker image as our base folder.
- COPY package* . copies package.json and package-lock.json. You don’t change these files often, so it’s worth adding them to a separate layer to speed up the build. If you don’t do this and just add COPY . . instead, you won’t be using the Docker cache in subsequent runs. It’s not a big deal for small projects with few dependencies, but installation takes more time as the number of dependencies grows.
- RUN npm ci --only=production: the ci command is specifically designed to take what’s in package-lock.json and install it, and we don’t want to install anything non-production related (like a test framework or development dependencies). You may also see RUN npm i or RUN npm install; these will try to update package-lock.json if there are newer versions of dependencies, and you don’t want that, because you want the exact versions committed to Git to ensure stability, since that’s what runs on the developers’ machines.
- COPY . . copies everything from the build context into the Docker image.
- ENTRYPOINT ["node", "index.js"]: every NodeJS project has an entry file you run to start the whole app. In some cases it’s node server.js; in ours it’s node index.js. You can also look at package.json: under scripts there are typically commands to run certain things, such as serve or start-production. If you find such a command in your package.json, you’d run it like npm run serve.
This is a pretty good general attempt at dockerizing a NodeJS app. Now navigate in your terminal with cd src/paymentservice and run docker build -t paymentservice . to build your image.
Nothing Works From the First Try: Debugging
If you thought this would be easy, you’re wrong :) Now it’s time for issues.
This build attempt produced an error: ERROR: failed to solve: process "/bin/sh -c npm ci --only=production" did not complete successfully: exit code: 1
Don’t panic. While examining the logs, you can see gyp ERR! find Python Python is not set from the command line or npm configuration.
From the description, we see that Python is not found. It’s safe to assume we need Python to install some dependencies. That’s not uncommon, since many dependencies are built as part of installation, so they have build-time dependencies of their own. Typically, these gotchas can be found in a README.md. Or you can ask the devs about it; they might not realize they have such dependencies simply because they already have these tools installed on their machines. You can also hunt the culprits down yourself, as sketched below.
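If you have the dependencies installed locally, one way to spot packages that compile native code is to look for node-gyp’s build manifests (a rough sketch, assuming node_modules exists on your machine):

# every native add-on carries a binding.gyp that node-gyp builds with python/make/g++
find node_modules -name binding.gyp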
To fix it, let’s add RUN apk add --no-cache python3 before the first COPY package* .
FROM node:lts-alpine
ENV PORT 5555
EXPOSE ${PORT}
WORKDIR /app
RUN apk add --no-cache python3
COPY package* .
RUN npm ci --only=production
COPY . .
ENTRYPOINT ["node", "index.js"]
To reiterate: we put the RUN that installs software dependencies before copying the package* files because we want to cache that Python installation. This significantly reduces the time of future builds: once we figure this out, there’s no need to install Python again, and our build system will cache the Docker layers, speeding up the build process. And the larger the organization, the more impact fast build times have.
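Once the build is green (we’re getting there), you can verify the caching behavior yourself; a minimal sketch:

docker build -t paymentservice .   # warm build
touch index.js                     # simulate a source-code change
docker build -t paymentservice .   # the apk add and npm ci layers show as CACHED; only COPY . . onwards reruns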
And let’s run docker build -t paymentservice . again.
So, now you see another error. Examine the logs, and you can find the message stack Error: not found: make. This means we need to install something else; now it’s make. Let’s add make after python3 so it looks like RUN apk add --no-cache python3 make, and run the build again.
Make is a build automation tool. It uses a Makefile to get instructions and executes them in the specified order. The tool is useful for large projects where a complex build process requires many actions, dependencies, etc.; Make helps to organize and manage that. However, it’s not required to run NodeJS :)
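For a feel of it, here’s a toy Makefile (hypothetical targets, not something the payment service ships):

# 'build' depends on 'deps', so make runs deps first
deps:
	npm ci
build: deps
	node --check index.js   # example action: syntax-check the entry file

You’d run it with make build.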
Let’s run docker build -t paymentservice . again.
So, now you see another error. Let’s examine the logs again, and you should see the message make: g++: No such file or directory. We need to add g++ (which is a compiler). From the logs, you can see we’re doing all this to build the node-gyp package, which means we need these tools only to build it, not to run it. Let’s add g++ to our command so it looks like RUN apk add --no-cache python3 make g++
node-gyp helps to build native add-ons for NodeJS. It’s a command-line tool often used implicitly by developers. You don’t typically install python, g++, and make yourself, because many Linux distributions and macOS ship them by default.
And yet again, docker build -t paymentservice .
Cool!!! Now the docker build works. Let’s look at the image size by running docker inspect paymentservice and searching for Size; it’s a number of bytes. The image is roughly 620 MB.
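To get the size without scrolling through the whole inspect output, a couple of handy commands:

docker inspect -f '{{.Size}}' paymentservice   # prints the size in bytes
docker images paymentservice                   # shows a human-readable SIZE column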
And it’s a lot! Let’s see what we can improve. Since we need python3, make, and g++ only for the build, let’s move them to another stage: the apk add and npm ci instructions go into a separate build stage, and then we copy node_modules into the resulting image. If you build it, you’ll see it’s around 281 MB, which is more than 2 times smaller.
Here’s the resulting Dockerfile:
FROM node:lts-alpine as base
WORKDIR /app
RUN apk add --no-cache python3 make g++
COPY package* .
RUN npm ci --only=production
FROM node:lts-alpine as result
ENV PORT 5555
EXPOSE ${PORT}
WORKDIR /app
COPY --from=base /app/node_modules ./node_modules
COPY . .
ENTRYPOINT ["node", "index.js"]
Small images
When you get a small image, it’s faster to pull and deploy.
It’s faster to scan smaller images.
Also, it’s cheaper to store at scale (you’ll see the difference at the multiple-TB scale).
What’s next
Now that we have the image, let’s run it. The truth is: even if you built something successfully, there’s zero guarantee it will work.
docker run --rm -p 5555:5555 paymentservice
Tip: the --rm flag will remove the container when it’s stopped (good to use locally when you don’t want to produce a TON of stopped containers during development). -p maps a host port to the container port we specified in the Dockerfile via the ENV variable (the port the application listens on). By default, Docker is kind of in its own isolated world, and to be able to call a service running in a container from our local machine, we need to forward a port.
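Once the service runs cleanly (we’ll get there shortly), you can double-check the mapping like this (a quick sketch):

docker run -d --rm --name payment -p 5555:5555 paymentservice
docker port payment   # prints the mapping, e.g. 5555/tcp -> 0.0.0.0:5555
docker stop payment   # --rm then removes the container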
It starts correctly but starts throwing errors in a couple of seconds. This is OK. Let’s dig into them:
/app/node_modules/@google-cloud/profiler/build/src/index.js:120
throw new Error('Project ID must be specified in the configuration');
^
Error: Project ID must be specified in the configuration
at initConfigMetadata (/app/node_modules/@google-cloud/profiler/build/src/index.js:120:15)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async createProfiler (/app/node_modules/@google-cloud/profiler/build/src/index.js:158:26)
at async Object.start (/app/node_modules/@google-cloud/profiler/build/src/index.js:182:22)
It asks for a Project ID. But we don’t have one, right? Let’s see if we can disable that. In some cases, you can’t: it could be a runtime dependency without which the application won’t start (a good example is a database connection). Typically, you’d ask a developer about it :) Let’s explore things ourselves.
We start with index.js, and right away you’ll see:
if(process.env.DISABLE_PROFILER) {
  console.log("Profiler disabled.")
}
else {
  console.log("Profiler enabled.")
  require('@google-cloud/profiler').start({
    serviceContext: {
      service: 'paymentservice',
      version: '1.0.0'
    }
  });
}
Right away: if DISABLE_PROFILER is set, then do nothing. It’s a good sign, meaning we can disable it (it’s not always the case, so it’s worth investigating and confirming with the team).
Why do we want to disable it? It’s a fair question. I’d answer it this way: we’re trying to run the service without reliance on a specific environment (in this case, GCP), and we want to disable by default anything that stops us from running in a generic environment. Depending on the target environment, we’ll be able to override the variables when we start the container.
Let’s add ENV DISABLE_PROFILER 1. Why 1? 1 means TRUE; thus, we’re saying we want to disable the profiler.
This is how our Dockerfile should look:
FROM node:lts-alpine as base
WORKDIR /app
RUN apk add --no-cache python3 make g++
COPY package* .
RUN npm ci --only=production
FROM node:lts-alpine as result
ENV PORT 5555
ENV DISABLE_PROFILER 1
EXPOSE ${PORT}
WORKDIR /app
COPY --from=base /app/node_modules ./node_modules
COPY . .
ENTRYPOINT ["node", "index.js"]
Let’s run it! Now the service starts correctly and we see no issues, meaning it’s ready to serve requests.
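And because DISABLE_PROFILER is just an environment variable, you can override it per environment at container start. A sketch (note that re-enabling the profiler still requires a GCP Project ID, so outside GCP it will crash again):

docker run --rm -p 5555:5555 paymentservice                        # default: profiler disabled
docker run --rm -p 5555:5555 -e DISABLE_PROFILER= paymentservice   # an empty value is falsy in JS, so the profiler turns back on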
What can we improve here?
There are a few things:
- In the FROM statement, we can use a specific version and its corresponding hash. This strengthens security by referring to an image not by a tag but by its hash: tags can be re-assigned, but hashes cannot (see the sketch after this list).
- apk add --no-cache python3 make g++ relies on the latest available versions of these packages. In an ideal world, you’d pin specific versions of these packages too.
- You’d want to add a .dockerignore file to ensure the local node_modules isn’t copied into the resulting Docker image through COPY . .
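Here’s what the first and third items could look like (the digest below is a placeholder, not a real one; you can look up the actual digest of your tag with docker images --digests):

# Dockerfile: pin the base image by digest instead of a movable tag
FROM node:20-alpine@sha256:<digest-you-verified>

# .dockerignore: keep local artifacts out of the build context
node_modules
.git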
Why didn’t we make these improvements?
While developing, you have to take shortcuts; otherwise, you’d be stuck in a giant queue of upgrades, tickets, etc. When the product is more mature and there’s no active development going on, it’s a really good idea to harden things and implement these practices. During active development, or in the very early stages, there’s a high chance they will slow you down significantly.
I’ll see you in the next one, where we’ll go through Python.