We’ve been dockerizing many applications recently and even ran them locally with Docker Compose.

Now, it’s time to take a step further and build a production-ready version of this setup.

The first step is not to build production artifacts locally.

Quite a few years ago, we were introduced to the concept of continuous integration, in which developers frequently merge their changes and test them automatically. Back then, it was a shared machine! Since then, the concept has evolved, and we now have many tools, such as Jenkins, GitHub Actions, Drone, etc., that play the role of that shared machine. The reasons why we need that “shared machine” are pretty simple: you aren’t working alone, and your local machine isn’t “production.” The “shared machine” is unbiased and often catches issues that would appear in production but can’t be easily reproduced in your local environment because of your local settings.

A tool of choice for today is GitHub Actions. It’s a simple yet powerful implementation of a CI tool.

GitHub Actions has several moving parts:

  • Workflows (aka pipelines)
  • Cache
  • Releases
  • Artifacts

We’ll be using them all today.

Ideal Docker Build Workflow

You want your workflows to be as simple and as predictable as possible. The reason why is very simple: fewer things to support and change down the road.

This also leads to another tool in our pocket: templates. We’ll rely on them often to avoid copy-pasting when it doesn’t make sense. Certainly, CI becomes more complicated when you copy-paste a lot.

We also need to differentiate between pull requests and branch-type workflows. During the Pull Request Review process, we want to get much more information than when we do the actual build for deployment.

Here’s the pull request (PR) diagram (for each service):

PR Diagram

As you see, there are 2 primary things we want to test:

  • run tests on a service if they exist
  • build a docker image and do security checks

Not every project has tests, but when one does, we want to ensure they pass; otherwise, we fail the build. Passing tests don’t guarantee that your service will work correctly, but if they fail, something definitely went wrong!

We want to ensure our deployable artifacts (Docker images) can be built and don’t have critical or high-severity vulnerabilities. If there’s a critical vulnerability and a fix exists, you have to apply it.

We also don’t want to overload our PRs with checks. You can have multiple types of tests, and you have to decide which ones are worth running on every PR. Pipelines take time to run, and the faster they are, the better.

So, the worst-case scenario in our project is 21 pipelines if you change everything in a single PR! Our goal is to speed up the common scenarios as much as possible.

We’re using Google Microservices as an example, so I forked it into my repo. I also removed everything related to CI/CD, Infrastructure, etc., from it.

git clone YOUR_FORKED_VERSION microservices-demo
cd microservices-demo
rm -rf .deploystack .github helm-chart istio-manifests kubernetes-manifests kustomize release terraform cloudbuild.yaml skaffold.yaml src/adservice/Dockerfile src/cartservice/src/Dockerfile src/checkoutservice/Dockerfile src/currencyservice/Dockerfile src/emailservice/Dockerfile src/frontend/Dockerfile src/paymentservice/Dockerfile src/productcatalogservice/Dockerfile src/recommendationservice/Dockerfile src/shippingservice/Dockerfile
git add .
git commit -m "clean project"
git push -u origin main

If you were to do the same, please don’t submit your PRs to the central Google repository.

CI Architecture

We start with the Docker build. Since every project has a Dockerfile, we can create a typical workflow.

Every workflow in GitHub Actions has its own file. Read about syntax and different options in the official docs - GitHub Actions Docs

All workflows live in the .github/workflows folder.

Let’s start by creating _docker-pr.yml that will represent a typical docker build workflow for a PR.

We begin by setting up when this workflow can be executed.

on:
  workflow_call:
    inputs:
      project:
        required: true
        type: string

This means that we expect this workflow to be triggered from another workflow. GitHub Actions and many other CI/CD tools allow that. This allows you to avoid doing a bunch of copy-pasting.

inputs:
  project:
    required: true
    type: string

This is a variable we’re expected to provide when calling this workflow.

The second thing for us is to define the jobs we want to execute. We can have multiple jobs. Right now, our steps are the following:

  • Checkout the code
  • Get Docker Build environment
  • Build Docker image
jobs:
  docker-ci:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build Docker
        uses: docker/build-push-action@v4
        with:
          context: ./src/${{ inputs.project }}
          load: true
          cache-from: type=gha
          cache-to: type=gha,mode=max
          tags: ${{ inputs.project }}:${{ github.sha }}
  • runs-on specifies the platform our CI job runs on. The list of platforms is available in the documentation
  • steps is the list of actions we need to perform (these can be bash scripts or what GitHub calls “actions”: packaged scripts, so you don’t have to write them yourself)
  • We’re checking out the code via actions/checkout@v4, which is available via this link
  • We’re setting up Docker Buildx via docker/setup-buildx-action@v3, an environment that allows us to build Docker images for various platforms and to use the Docker cache efficiently (available via this link)
  • We’re performing the actual build via the docker/build-push-action@v4 action (available via this link) with a few parameters we pass through

Parameters are:

  • context: ./src/${{ inputs.project }} is the path to the Docker build context; here we’re using the “project” input we defined earlier
  • load: true loads the built image into the runner’s local Docker daemon instead of pushing it, since we have no Docker registry to push it to yet
  • cache-from: type=gha sets the source of the Docker layer cache, which is the GitHub Actions cache (gha)
  • cache-to: type=gha,mode=max writes the cache back to gha, and mode=max says, “cache everything you can.”
  • tags: ${{ inputs.project }}:${{ github.sha }} is how we construct the Docker image name and tag. The tag is a “version”. You could use a numbered version, but I prefer the commit SHA since it’s very descriptive: you know exactly which commit to look for.

Now, do you remember our C# Service? Here’s how we dockerized it

One important detail: its Docker context is ./src/cartservice/src, whereas all other projects use ./src/PROJECT_NAME. We have to account for that in our Docker image tag, so let’s split the name by / to get just cartservice.

We’re adding a step before checkout:

- name: Split Name
  id: split
  env:
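    PATH_CANDIDATE: ${{ inputs.project }}
  # Keep only the first path segment (cartservice/src -> cartservice)
  run: echo "imagename=${PATH_CANDIDATE%%/*}" >> "$GITHUB_OUTPUT"

As you can see, we don’t use any actions and run this natively on our ubuntu-latest runner.

The algorithm is straightforward:

  • We put the input into an environment variable, PATH_CANDIDATE
  • We then split it by / and assign the first element to the output variable imagename (the %%/* expansion strips everything from the first / onward)
  • Writing name=value to the file referenced by $GITHUB_OUTPUT is the GitHub Actions-specific way to set a step output (it replaces the deprecated ::set-output command). Other CI systems have their own equivalent.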

So, for adservice it will be just adservice but for cartservice/src it’ll be cartservice
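
As a quick sanity check, the split can be tried in any bash shell outside of GitHub Actions. The sketch below is illustrative only (the function names first_segment and last_segment are made up for this example). It keeps everything before the first /, which is what we want for the image name. Note that the similar-looking ${VAR##*/} expansion does the opposite and keeps the last segment.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): keep the first path segment.
first_segment() {
  local path_candidate="$1"
  # %%/* strips the longest suffix that starts with "/".
  printf '%s\n' "${path_candidate%%/*}"
}

# Contrast: ##*/ strips the longest prefix ending in "/", keeping the last segment.
last_segment() {
  local path_candidate="$1"
  printf '%s\n' "${path_candidate##*/}"
}

first_segment "adservice"        # adservice
first_segment "cartservice/src"  # cartservice
last_segment "cartservice/src"   # src
```

For an input without a slash, both expansions return the input unchanged, which is why the difference only shows up for cartservice/src.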

OK, we then need to modify our tag to use that imagename

tags: ${{ steps.split.outputs.imagename }}:${{ github.sha }}

The reference format is steps.STEP_ID.outputs.OUTPUT_NAME

Now, it’s time to add the Docker image scanning step. There are multiple tools for this, such as Snyk, Trivy, or the AWS/GCP/Azure native Docker image security scanning services. Since we haven’t reached the cloud step yet, we’ll be using Trivy. It provides a free action you can use in your pipeline (here it is), which we can add as a step in our job:

- name: Scan for Vulnerabilities
  uses: aquasecurity/trivy-action@master
  id: scan
  with:
    image-ref: '${{ steps.split.outputs.imagename }}:${{ github.sha }}'
    exit-code: 1
    output: 'vulnerabilities.table'
    ignore-unfixed: true
    severity: 'CRITICAL,HIGH'

Here’s an explanation of the parameters:

  • image-ref - the same as tags
  • output: 'vulnerabilities.table' puts the results of vulnerability scans into a file so we can use it later on
  • exit-code - By default, it is 0, but we want to fail a pipeline if there are security issues. You might not always want to do that, but if you don’t think about security, no one will
  • ignore-unfixed: true it doesn’t make sense to stop the build if there are vulnerabilities which don’t have a fix yet
  • severity: 'CRITICAL,HIGH' we want to check only CRITICAL and HIGH-level issues
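
If you want to reproduce the scan on your own machine before pushing, the same options map onto Trivy CLI flags. This is a sketch that assumes Trivy is installed locally and that an image tagged adservice:local has already been built (both are assumptions, not part of the setup above; the run and scan_image helpers are made up for this example). By default the script only prints the command; set DRY_RUN=0 to actually execute it.

```shell
#!/usr/bin/env bash
# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical local image tag; replace it with the image you built.
IMAGE_REF="${IMAGE_REF:-adservice:local}"

# Mirrors the action inputs: exit-code, ignore-unfixed, severity, output.
scan_image() {
  run trivy image \
    --exit-code 1 \
    --ignore-unfixed \
    --severity CRITICAL,HIGH \
    --format table \
    --output vulnerabilities.table \
    "$1"
}

scan_image "$IMAGE_REF"
```

With DRY_RUN=0, a non-zero exit status from the script corresponds to the failed step you would see in the pipeline.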

If we have vulnerabilities in our Docker images, this step will fail, and we need to process the output. For this, we’ll be using the GitHub Script action managed by the GitHub team

It allows us to use JavaScript and GitHub API to do stuff. Our use case will be simple

  • if a scan fails
  • get the results from vulnerabilities.table file and push its content as a comment

You don’t have to push it as a comment. It’s just my preferred method. I like to have all the information in comments so that the devs and I don’t have to navigate between different tabs when we don’t really have to. Plus, it provides an excellent audit trail.

Another way to do this is to post the results on GitHub Security. I prefer comments because they are more descriptive.

Here’s what our step looks like

- uses: actions/github-script@v7
  if: ${{ failure() && steps.scan.outcome == 'failure' }}
  with:
    script: |
      const { readFileSync } = require('fs')

      const text = readFileSync('./vulnerabilities.table');

      github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: "👋 We found the following vulnerabilities: \n\n\n```\n" + text + "\n```"
      })

Explanation of the properties:

  • if: ${{ failure() && steps.scan.outcome == 'failure' }} means this step will be triggered only if the scan step fails
  • script is the JavaScript to run:
const { readFileSync } = require('fs')

const text = readFileSync('./vulnerabilities.table');

github.rest.issues.createComment({
  issue_number: context.issue.number,
  owner: context.repo.owner,
  repo: context.repo.repo,
  body: "👋 We found the following vulnerabilities: \n\n\n```\n" + text + "\n```"
 })

First, we read our file (it’s available because all steps of a job share the same runner).

  • Then, we use the GitHub API to post a comment. A PR is an issue in GitHub terminology.
  • \n is a new line, and the three backticks wrap the scan output in a Markdown code block:
```\n" + text + "\n```

For the comments to work, let’s navigate to the repository’s Settings -> Actions -> General. Scroll to the bottom and select the settings shown in the image.

Settings for GitHub Actions to work and Post Comments to PRs

Let’s add another job that would lint a Dockerfile.

I love Hadolint, a very simple yet powerful Dockerfile linter. It catches common mistakes and helps you write better Dockerfiles.

docker-lint:
  runs-on: ubuntu-latest
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - uses: hadolint/hadolint-action@v3.1.0
      id: scan
      with:
        dockerfile: ./src/${{ inputs.project }}/Dockerfile
        failure-threshold: error
        output-file: dockerfile.table
    - name: Comment on PR
      uses: actions/github-script@v7
      if: ${{ failure() && steps.scan.outcome == 'failure' }}
      with:
        script: |
          const { readFileSync } = require('fs')

          const text = readFileSync('./dockerfile.table');

          github.rest.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: "👋 We found the following Dockerfile errors: \n\n\n```\n" + text + "\n```"
          })
  • Here, we need to do another checkout, because every job starts in a clean environment.
  • Then, we use the hadolint action and pass it the Dockerfile path.
  • failure-threshold: error says that we’ll fail the pipeline ONLY if there are errors. So, warnings and other lower-severity findings will be ignored.
  • We don’t have to do anything to run the jobs in parallel because that’s the default behavior when you specify multiple jobs in a single workflow.
  • Then, we take the output file, as we did for the vulnerability scan, and post the results.

Complete _docker-pr.yml

on:
  workflow_call:
    inputs:
      project:
        required: true
        type: string

jobs:
  docker-lint:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - uses: hadolint/hadolint-action@v3.1.0
        id: scan
        with:
          dockerfile: ./src/${{ inputs.project }}/Dockerfile
          failure-threshold: error
          output-file: dockerfile.table
      - name: Comment on PR
        uses: actions/github-script@v7
        if: ${{ failure() && steps.scan.outcome == 'failure' }}
        with:
          script: |
            const { readFileSync } = require('fs')

            const text = readFileSync('./dockerfile.table');

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: "👋 We found the following Dockerfile errors: \n\n\n```\n" + text + "\n```"
            })
  docker-ci:
    runs-on: ubuntu-latest
    steps:
      - name: Split Name
        id: split
        env:
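          PATH_CANDIDATE: ${{ inputs.project }}
        # First path segment only; $GITHUB_OUTPUT replaces the deprecated ::set-output command
        run: echo "imagename=${PATH_CANDIDATE%%/*}" >> "$GITHUB_OUTPUT"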
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build Docker
        uses: docker/build-push-action@v4
        with:
          context: ./src/${{ inputs.project }}
          load: true
          cache-from: type=gha
          cache-to: type=gha,mode=max
          tags: ${{ steps.split.outputs.imagename }}:${{ github.sha }}
      - name: Scan for Vulnerabilities
        uses: aquasecurity/trivy-action@master
        id: scan
        with:
          image-ref: '${{ steps.split.outputs.imagename }}:${{ github.sha }}'
          exit-code: 1
          output: 'vulnerabilities.table'
          ignore-unfixed: true
          severity: 'CRITICAL,HIGH'
      - name: Comment on PR
        uses: actions/github-script@v7
        if: ${{ failure() && steps.scan.outcome == 'failure' }}
        with:
          script: |
            const { readFileSync } = require('fs')

            const text = readFileSync('./vulnerabilities.table');

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: "👋 We found the following vulnerabilities: \n\n\n```\n" + text + "\n```"
            })

Let’s build the services

Let’s create .github/workflows/adservice-pr.yml

name: "PR Ad Service"

on:
  pull_request:
    paths:
      - 'src/adservice/**'
      - '.github/workflows/**'
    branches:
      - main

jobs:
  docker-workflow:
    uses: ./.github/workflows/_docker-pr.yml
    with:
      project: adservice

We name the workflow, then we say it runs only on pull requests, and only if the changed files are in the ./src/adservice or .github/workflows folders.

And we run them only when we raise a PR against our main branch. It’s important because the flow we stick to is:

  • Branch out of main
  • Raise PR against main
  • Release from main

It’s a straightforward branching strategy that allows us to avoid overcomplicated releases and do stuff quickly.

Our job uses the shared workflow we defined above and passes our project name there.

You can create similar files for every other project we have dockerized.
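
To get a feel for how the paths filters fan out, here is a rough shell approximation of the matching for two of the services. It is only a sketch: GitHub evaluates these glob patterns itself, the function name workflows_for is made up for this example, and the case patterns merely imitate 'src/SERVICE/**' and '.github/workflows/**'. It illustrates why touching anything under .github/workflows triggers every PR workflow, while touching a single service triggers only its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the PR workflows whose paths filter would match a changed file.
workflows_for() {
  local changed_file="$1"
  local service
  for service in adservice cartservice; do
    case "$changed_file" in
      # Imitates the filters 'src/<service>/**' and '.github/workflows/**'.
      "src/$service/"*|".github/workflows/"*)
        printf '%s-pr.yml\n' "$service"
        ;;
    esac
  done
}

workflows_for "src/adservice/build.gradle"        # adservice-pr.yml
workflows_for ".github/workflows/_docker-pr.yml"  # adservice-pr.yml and cartservice-pr.yml
workflows_for "README.md"                         # prints nothing
```

The second call is the “change everything” case in miniature: one edit to a shared workflow file fans out to every service.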

What about tests?

So, 3 projects have tests:

  • Cart Service (C#)
  • Shipping Service (Go)
  • Product Catalog Service (Go)

There’s not much need to create a generic step yet, so we can avoid complications and put the tests there explicitly. Starting with .github/workflows/cartservice-pr.yml

name: "PR Cart Service"

on:
  pull_request:
    paths:
      - 'src/cartservice/**'
      - '.github/workflows/**'
    branches:
      - main

jobs:
  docker-workflow:
    uses: ./.github/workflows/_docker-pr.yml
    with:
      project: cartservice/src
  tests:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Dotnet
        uses: actions/setup-dotnet@v4
        env:
          DOTNET_INSTALL_DIR: "./.dotnet"
        with:
          dotnet-version: '8.0'
      - name: Run Tests
        run: |
          dotnet test src/cartservice/
  • We created another job, so we need to check out the code again
  • Then, since it’s a clean environment, we need to install dotnet. There’s an action available for that, and we’re installing the same version we specified in the Dockerfile.
  • Plus, we’re installing it “locally” into the project’s folder so we don’t pollute the GitHub Actions environment.
  • Installing everything you need into a “local” folder is a good practice.
  • And then, we execute the tests.

Moving on to .github/workflows/shippingservice-pr.yml

name: "PR Shipping Service"

on:
  pull_request:
    paths:
      - 'src/shippingservice/**'
      - '.github/workflows/**'
    branches:
      - main

jobs:
  docker-workflow:
    uses: ./.github/workflows/_docker-pr.yml
    with:
      project: shippingservice
  tests:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.21'
      - name: Run Tests
        run: |
          cd ./src/shippingservice
          go test

Here we do the same thing we did with dotnet, just using Go.

And a similar story with .github/workflows/productcatalogservice-pr.yml

name: "PR Product Catalog Service"

on:
  pull_request:
    paths:
      - 'src/productcatalogservice/**'
      - '.github/workflows/**'
    branches:
      - main

jobs:
  docker-workflow:
    uses: ./.github/workflows/_docker-pr.yml
    with:
      project: productcatalogservice
  tests:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.21'
      - name: Run Tests
        run: |
          cd ./src/productcatalogservice
          go test

All other .github/workflows/*-pr.yml files look like .github/workflows/adservice-pr.yml

Now, let’s switch to our test branch and raise a PR

git checkout -b feature/bring-ci
git add .
git commit -m "shared CI & workflows for every project"
git push -u origin feature/bring-ci

When you go to raise a PR, you have to switch the branch because, by default, it’s going to Google’s repository (don’t spam them).

remote: Create a pull request for 'feature/bring-ci' on GitHub by visiting:
remote:      https://github.com/snegas/microservices-demo/pull/new/feature/bring-ci

Don't spam Google Microservices Demo Repository

  • Select your repo and hit Create Pull Request.
  • Then wait. You’ll see all your workflows triggered for the first time.
  • You’ll get 2 (at the time of writing) messages on vulnerabilities for currencyservice and paymentservice.

Vulnerability in the Currency Service

And below, you’ll see:

All-in-one Workflows View

Let’s go to the PR Currency Service workflow:

Failed Workflow

Let’s hit Summary and see how steps are running in parallel

Summary

  • Navigate to the Usage to see exciting statistics on the billable time.
  • Go to the docker-lint job to see how it skipped Comment on PR step.
  • Now, let’s go to the PR Cart Service workflow and navigate to the Summary

PR Cart Service Workflow Summary

You can see how all the jobs are executed in parallel. Go through the other workflows to see what they look like.

Let’s merge this PR despite the errors, and we’ll fix mistakes one by one to show how it would look in the real project.

PR in my fork from screenshots - Link

You can see that I made a dummy change there. The last PR had too many commits because I experimented with a few things. Please feel free to take a look at the history. Nothing gets done on the very first try. Embrace it!

Fix Currency Service Error

Let’s create a separate branch for it

git checkout main
git pull
git checkout -b fix/resolve-currencyservice-vulnerability

So, the vulnerability report says:

currencyservice:39ca357740a91263d531ac689e2baa378c91183b (alpine 3.19.1)
========================================================================
Total: 0 (HIGH: 0, CRITICAL: 0)


Node.js (node-pkg)
==================
Total: 1 (HIGH: 0, CRITICAL: 1)

┌───────────────────────────┬────────────────┬──────────┬────────┬───────────────────┬───────────────┬───────────────────────────────────────────────────────┐
│          Library          │ Vulnerability  │ Severity │ Status │ Installed Version │ Fixed Version │                         Title                         │
├───────────────────────────┼────────────────┼──────────┼────────┼───────────────────┼───────────────┼───────────────────────────────────────────────────────┤
│ protobufjs (package.json) │ CVE-2023-36665 │ CRITICAL │ fixed  │ 7.1.2             │ 7.2.4, 6.11.4 │ protobufjs: prototype pollution using user-controlled │
│                           │                │          │        │                   │               │ protobuf message                                      │
│                           │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2023-36665            │
└───────────────────────────┴────────────────┴──────────┴────────┴───────────────────┴───────────────┴───────────────────────────────────────────────────────┘ 

You can read about it here - CVE-2023-36665

In package-lock.json, you can see that this version is a dependency of google-gax

google-gax NodeJS library dependencies

Let’s see if we can update our dependencies easily.

If you have Node installed locally, you can simply run npm i, which will show the following

5 vulnerabilities (3 high, 2 critical)

To address all issues, run:
  npm audit fix

Run `npm audit` for details.

And then npm audit fix

added 4 packages, removed 1 package, changed 16 packages, and audited 344 packages in 4s

If you don’t have Node, you can use Docker

docker run --rm -it -v "$(pwd)/src/currencyservice:/app" --entrypoint=sh node:lts-alpine

We need to run the following inside this container to run npm i

apk add --no-cache python3 make g++
cd /app
npm i
npm audit fix

Explained in detail here - Docker Basics: NodeJS

Let’s see what got updated, among other things

Fixed Vulnerability

That’s exactly what we needed. Let’s create a PR for it. Our changeset has to include just the package-lock.json file.

git status
On branch fix/resolve-currencyservice-vulnerability
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   src/currencyservice/package-lock.json

no changes added to commit (use "git add" and/or "git commit -a")

Let’s create our PR (again, don’t forget to create it against your main branch and not Google Microservices)

git add .
git commit -m "Fix Currency Service Vulnerabilities"
git push -u origin fix/resolve-currencyservice-vulnerability

Example of PR in my fork - Link

Fixed vulnerabilities and no comments in the PR

OK, you can do the same with the paymentservice; just don’t forget to pull the changes after you merge this PR

git checkout main
git pull
git checkout -b fix/resolve-paymentservice-vulnerability

And do the same steps :)

You’ll see that the workflow was not triggered. So, I had to fix a typo and it triggered all workflows :)

PR in my fork - Link

The good thing is, you’ll get everything green and nice. It’s ready to merge!

Releases and Artifacts

So, CI is fun, but we need to do a few more things for it to be production-ready.

  • Build on push to main
  • Create releases and artifacts

Build on main

Let’s create another branch and we’ll create a new template file .github/workflows/_docker-main.yml

on:
  workflow_call:
    inputs:
      project:
        required: true
        type: string

That will be the start, because we want it to be generic enough. Let’s also copy the docker-ci job and modify it a bit

env:
  REGISTRY: ghcr.io
  IMAGE_NAME_PREFIX: ${{ github.repository }}

jobs:
  docker-main:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Split Name
        id: split
        env:
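          PATH_CANDIDATE: ${{ inputs.project }}
        # First path segment only; $GITHUB_OUTPUT replaces the deprecated ::set-output command
        run: echo "imagename=${PATH_CANDIDATE%%/*}" >> "$GITHUB_OUTPUT"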
      - name: Checkout
        uses: actions/checkout@v4
      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build Docker
        uses: docker/build-push-action@v4
        with:
          context: ./src/${{ inputs.project }}
          push: true
          cache-from: type=gha
          cache-to: type=gha,mode=max
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME_PREFIX }}-${{ steps.split.outputs.imagename }}:${{ github.sha }}

First of all

REGISTRY: ghcr.io
IMAGE_NAME_PREFIX: ${{ github.repository }}
  • ghcr.io is the GitHub Container Registry, GitHub’s Docker registry
  • The IMAGE_NAME_PREFIX environment variable is the prefix for your image names. Images are scoped to your user/organization, so no one else can access them.

Then, we modified our job a bit to do 3 things:

  • Have a token with a scope to publish packages
  • Login to that registry via docker/login-action and we pass a registry, a username (our login), and a password (GITHUB_TOKEN)
  • In docker/build-push-action@v4 instead of load: true we do push: true and we modified tags to include our prefix
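
The resulting image reference is easier to picture with concrete values. The sketch below assembles it in plain shell, assuming the prefix is an owner/repo string and using made-up values (Example-Owner/microservices-demo and a shortened SHA); the function name image_ref is hypothetical. It also lowercases the name part, which the workflow above does not do, because repository names in Docker image references must be lowercase and an owner or repository name may contain capitals.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<registry>/<prefix>-<image>:<tag>".
image_ref() {
  local registry="$1" prefix="$2" image="$3" tag="$4"
  local name
  # Repository names in image references must be lowercase.
  name="$(printf '%s/%s-%s' "$registry" "$prefix" "$image" | tr '[:upper:]' '[:lower:]')"
  printf '%s:%s\n' "$name" "$tag"
}

# Made-up values for illustration only.
image_ref "ghcr.io" "Example-Owner/microservices-demo" "adservice" "39ca357"
# ghcr.io/example-owner/microservices-demo-adservice:39ca357
```

If your owner and repository names are already lowercase, the lowercasing step changes nothing and the reference matches what the tags line produces.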

Now, let’s add all of the *-main.yml files (example adservice-main.yml)

name: "Main Ad Service"

on:
  push:
    paths:
      - 'src/adservice/**'
      - '.github/workflows/**'
    branches:
      - main
jobs:
  docker-workflow:
    uses: ./.github/workflows/_docker-main.yml
    with:
      project: adservice

Do the same with all the other services and send a PR (don’t forget to switch to your repo there so you don’t push to Google Microservices)

Link to my PR - Link

I did a small test with the adservice-main.yml

Just do

on:
  push:
    paths:
      - 'src/adservice/**'
      - '.github/workflows/**'

without

branches:
  - main

And do a quick push. You’ll be able to see the build!

Then, bring it back, and it’s ready for a merge! I had a few rounds of typos, which is OK.

After the merge, you’ll get your builds passed and on the main page, you’ll see the following

Packages!

Release

It’s time to create a release! Let’s hit create a new Release!

Release Notes (very basic)

Conclusion

So, we have a production-grade GitHub Actions CI configured. We hit a few bumps, and everything is recorded in the Git history :)

Try it yourself; you get a bunch of free GitHub Actions minutes with public repos.

Key takeaways:

  • Do templates when it makes sense
  • Split main & pr builds
  • Don’t run EVERYTHING on PR builds
  • Use comments to communicate pipeline failures

The next step is to add some infrastructure flavor on top of this! Stay tuned for Terraform!