Finally, Java! It is one of the world’s most popular platforms and programming languages.

This article continues my Docker Basics series, where I share helpful, practical information about everything Docker-related.

We continue looking at the Google Microservices repository, and the next service in our list is the Ad Service.

Let’s navigate to ./src/adservice and explore the Java world.

Java has been around for more than 28 years and has heavily influenced the way modern software development is done.

However, what’s interesting is how Java is different from other platforms.

Everything even remotely related to building a Java application becomes a Maven or Gradle plugin in one way or another, and all Java-specific tooling is built around this concept. Adopting tools like Docker becomes challenging because the typical adoption path for Java differs from that of other platforms.

This doesn’t mean we shouldn’t do this. After all, it’s just a skill issue.

Repo exploration

In the README.md it says

The Ad service uses gradlew to compile/install/distribute. Gradle wrapper is already part of the source code. To build Ad Service, run:

./gradlew installDist

It will create executable script src/adservice/build/install/hipstershop/bin/AdService

This means our project uses Gradle, a build tool for Java projects.

Also, it’s standard practice to ship a project with the Gradle wrapper, ./gradlew, committed into the repository. The wrapper downloads the required version of Gradle by itself, so there’s no need to install it. Again, a Java ecosystem thing. You rarely see this anywhere else.

Next question: what version of Java should we use? In build.gradle, there are a few lines about it:

tasks.withType(JavaCompile) {
    sourceCompatibility = JavaVersion.VERSION_19
    targetCompatibility = JavaVersion.VERSION_19
}
  • sourceCompatibility defines the language version of Java used in your source files
  • targetCompatibility defines the minimum JVM version your code should run on

In general, these settings are not recommended: they pin a specific Java version and are considered a legacy mechanism. But a lot of code written in Java is legacy, so no worries! Plus, we are not here to mess with the code but to dockerize it.

In a nutshell, this means we need Java 19 or newer, so we can use the latest version of Java.
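
As a quick sanity check of that reasoning, here is a minimal sketch that compares the feature version of the running JVM with the version from build.gradle. The class name JvmVersionCheck and its constant are hypothetical and not part of the Ad Service; the value 19 is assumed from targetCompatibility above.

```java
// Hypothetical check for illustration; not part of the Ad Service code base.
final class JvmVersionCheck {
    // Assumption: 19 mirrors targetCompatibility in build.gradle.
    static final int REQUIRED_FEATURE_VERSION = 19;

    // A JVM can run the compiled classes if its feature version is at least the target.
    static boolean isSupported(int runtimeFeatureVersion) {
        return runtimeFeatureVersion >= REQUIRED_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns the major version of the running JVM.
        int current = Runtime.version().feature();
        System.out.println("JVM feature version " + current
                + ", supported: " + isSupported(current));
    }
}
```

Running it inside a candidate base image is a cheap way to confirm that the image ships a new enough JVM.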

To build Java we need JDK - Java Development Kit. To run it later, we need JRE - Java Runtime Environment.

There are multiple distributions of both:

  • amazoncorretto
  • eclipse-temurin
  • ibm-semeru-runtimes
  • ibmjava
  • sapmachine

Don’t use the openjdk image because it’s deprecated in favor of the ones above.

Let’s select amazoncorretto and see how it goes.

Now, let’s move to environment variables. We’ve had them in every project so far.

The Java way to get environment variables is

System.getenv("ENV_VAR_NAME")

Let’s search for it! (the actual search is System.getenv)

We see 3 results:

  • PORT
  • DISABLE_STATS
  • DISABLE_TRACING

Here’s the code related to the PORT environment variable

int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "9555"));

The default value is 9555 unless you set something. Good. We won’t change it.
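
To make the fallback explicit, here is a small sketch of the same lookup against an arbitrary environment map. PortConfig and resolvePort are hypothetical names used for illustration only; the service itself just uses the one-liner above.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of the Ad Service code base.
final class PortConfig {
    static final String DEFAULT_PORT = "9555";

    // Same lookup as in the service: use PORT from the given environment,
    // fall back to 9555 when the variable is absent.
    static int resolvePort(Map<String, String> env) {
        return Integer.parseInt(env.getOrDefault("PORT", DEFAULT_PORT));
    }

    public static void main(String[] args) {
        // Prints 9555 when PORT is not set in the real environment.
        System.out.println(resolvePort(System.getenv()));
    }
}
```

Note that a PORT that is set but not numeric (including an empty string) makes Integer.parseInt throw a NumberFormatException, so the default only applies when the variable is absent.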

The next one is the DISABLE_STATS

 private static void initStats() {
    if (System.getenv("DISABLE_STATS") != null) {
      logger.info("Stats disabled.");
      return;
    }
    logger.info("Stats enabled, but temporarily unavailable");

    long sleepTime = 10; /* seconds */
    int maxAttempts = 5;

    // TODO(arbrown) Implement OpenTelemetry stats

  }

The behavior is similar to what we’ve seen in the other services, so we’ll set the variable in the image to be able to run the service locally. Technically, this doesn’t change anything today because stats are not implemented yet. However, we would forget about it in six months, and a later, partial implementation could accidentally break our local runs.

And the same thing with DISABLE_TRACING

 private static void initTracing() {
    if (System.getenv("DISABLE_TRACING") != null) {
      logger.info("Tracing disabled.");
      return;
    }
    logger.info("Tracing enabled but temporarily unavailable");
    logger.info("See https://github.com/GoogleCloudPlatform/microservices-demo/issues/422 for more info.");

    // TODO(arbrown) Implement OpenTelemetry tracing
    
    logger.info("Tracing enabled - Stackdriver exporter initialized.");
  }
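
One detail worth noting in both checks: they only test whether the variable exists, not what it contains. Here is a minimal sketch of that behavior, using a hypothetical FeatureFlags helper that is not part of the service:

```java
import java.util.Map;

// Hypothetical helper that mirrors the `System.getenv(NAME) != null` checks above.
final class FeatureFlags {
    // A flag counts as set as soon as the variable exists, whatever its value.
    static boolean isDisabled(Map<String, String> env, String name) {
        return env.get(name) != null;
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of("DISABLE_STATS", "0");
        System.out.println(isDisabled(env, "DISABLE_STATS"));   // true: the value "0" is ignored
        System.out.println(isDisabled(env, "DISABLE_TRACING")); // false: the variable is absent
    }
}
```

So DISABLE_STATS=0, or even an empty value, still disables the feature. To enable it, the variable has to be unset.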

Gradle specifics

Gradle works via plugins. The plugins and project dependencies are listed in build.gradle which is a configuration file.

You can read the details here - Building Java & JVM projects

Essentially, Gradle runs tasks. Tasks can be added via plugins. There are default tasks as well.

So, we need to perform our usual steps:

  • download and cache dependencies
  • copy the code
  • build the code
  • copy artifacts to the resulting image

Assembling a Dockerfile

Let’s use the latest Amazon Corretto docker image and copy gradle-related stuff.

FROM amazoncorretto:21

WORKDIR /app

COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help

COPY *.gradle ./

OK. This is the first time we have used this method. Let me explain what’s going on.

  • COPY gradle gradle copies the Gradle wrapper’s files (they are needed to download Gradle and execute Gradle commands). You won’t change them unless you want to upgrade Gradle
  • COPY gradlew* ./ copies the shell script that interacts with the gradle-wrapper.jar we copied earlier. We use two separate commands because we want the content of the gradle folder to land in a gradle folder instead of ./
  • RUN chmod +x ./gradlew && ./gradlew --help makes ./gradlew executable (via chmod +x) and invokes a dummy command to pre-download Gradle and cache it, so we don’t have to do it again in the upcoming commands and future rebuilds
  • COPY *.gradle ./ copies the rest of the Gradle files

This allows us to change dependencies in build.gradle later and still reuse the cached Gradle distribution, just as you would on your local machine during regular development.

We would only need to redownload gradle when we upgrade it.

Gradle tasks

Let’s run the build locally and start a container to explore what’s available

docker build -t adservice .
...
docker run --rm -it adservice bash

Then, let’s execute the help command from the Dockerfile to see what options are available:

bash-4.2# ./gradlew --help

To see help contextual to the project, use gradlew help

USAGE: gradlew [option...] [task...]

-?, -h, --help                     Shows this help message.
-a, --no-rebuild                   Do not rebuild project dependencies.
-b, --build-file                   Specify the build file. [deprecated]
--build-cache                      Enables the Gradle build cache. Gradle will try to reuse outputs from previous builds.
--no-build-cache                   Disables the Gradle build cache.
-c, --settings-file                Specify the settings file. [deprecated]
--configuration-cache              Enables the configuration cache. Gradle will try to reuse the build configuration from previous builds.
--no-configuration-cache           Disables the configuration cache.
--configuration-cache-problems     Configures how the configuration cache handles problems (fail or warn). Defaults to fail.
--configure-on-demand              Configure necessary projects only. Gradle will attempt to reduce configuration time for large multi-project builds. [incubating]
--no-configure-on-demand           Disables the use of configuration on demand. [incubating]
--console                          Specifies which type of console output to generate. Values are 'plain', 'auto' (default), 'rich' or 'verbose'.
--continue                         Continue task execution after a task failure.
--no-continue                      Stop task execution after a task failure.
-D, --system-prop                  Set system property of the JVM (e.g. -Dmyprop=myvalue).
-d, --debug                        Log in debug mode (includes normal stacktrace).
--daemon                           Uses the Gradle daemon to run the build. Starts the daemon if not running.
--no-daemon                        Do not use the Gradle daemon to run the build. Useful occasionally if you have configured Gradle to always run with the daemon by default.
--export-keys                      Exports the public keys used for dependency verification.
-F, --dependency-verification      Configures the dependency verification mode. Values are 'strict', 'lenient' or 'off'.
--foreground                       Starts the Gradle daemon in the foreground.
-g, --gradle-user-home             Specifies the Gradle user home directory. Defaults to ~/.gradle
-I, --init-script                  Specify an initialization script.
-i, --info                         Set log level to info.
--include-build                    Include the specified build in the composite.
-M, --write-verification-metadata  Generates checksums for dependencies used in the project (comma-separated list)
-m, --dry-run                      Run the builds with all task actions disabled.
--max-workers                      Configure the number of concurrent workers Gradle is allowed to use.
--offline                          Execute the build without accessing network resources.
-P, --project-prop                 Set project property for the build script (e.g. -Pmyprop=myvalue).
-p, --project-dir                  Specifies the start directory for Gradle. Defaults to current directory.
--parallel                         Build projects in parallel. Gradle will attempt to determine the optimal number of executor threads to use.
--no-parallel                      Disables parallel execution to build projects.
--priority                         Specifies the scheduling priority for the Gradle daemon and all processes launched by it. Values are 'normal' (default) or 'low'
--profile                          Profile build execution time and generates a report in the <build_dir>/reports/profile directory.
--project-cache-dir                Specify the project-specific cache directory. Defaults to .gradle in the root project directory.
-q, --quiet                        Log errors only.
--refresh-keys                     Refresh the public keys used for dependency verification.
--rerun-tasks                      Ignore previously cached task results.
-S, --full-stacktrace              Print out the full (very verbose) stacktrace for all exceptions.
-s, --stacktrace                   Print out the stacktrace for all exceptions.
--scan                             Creates a build scan. Gradle will emit a warning if the build scan plugin has not been applied. (https://gradle.com/build-scans)
--no-scan                          Disables the creation of a build scan. For more information about build scans, please visit https://gradle.com/build-scans.
--status                           Shows status of running and recently stopped Gradle daemon(s).
--stop                             Stops the Gradle daemon if it is running.
-t, --continuous                   Enables continuous build. Gradle does not exit and will re-execute tasks when task file inputs change.
-U, --refresh-dependencies         Refresh the state of dependencies.
--update-locks                     Perform a partial update of the dependency lock, letting passed in module notations change version. [incubating]
-V, --show-version                 Print version info and continue.
-v, --version                      Print version info and exit.
-w, --warn                         Set log level to warn.
--warning-mode                     Specifies which mode of warnings to generate. Values are 'all', 'fail', 'summary'(default) or 'none'
--watch-fs                         Enables watching the file system for changes, allowing data about the file system to be re-used for the next build.
--no-watch-fs                      Disables watching the file system.
--write-locks                      Persists dependency resolution for locked configurations, ignoring existing locking information if it exists
-x, --exclude-task                 Specify a task to be excluded from execution.
--                                 Signals the end of built-in options. Gradle parses subsequent parameters as only tasks or task options.

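I guess the most valuable options for us will be

  • --no-daemon since we don’t need a long-lived Gradle process in the background (each RUN step of a Docker build starts fresh, so a daemon is never reused)
  • --no-rebuild, which, as the help output says, skips rebuilding project dependencies. Note that “project dependencies” are other projects of a multi-project build, not downloaded libraries (we cache those in an earlier layer), so for a build without subprojects the flag should make little or no difference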

Next, let’s see what tasks are available (again, running the command inside a Docker container for debugging purposes).

bash-4.2# ./gradlew tasks
Starting a Gradle Daemon (subsequent builds will be faster)

> Task :tasks

------------------------------------------------------------
Tasks runnable from root project 'hipstershop' - Ad Service
------------------------------------------------------------

Application tasks
-----------------
run - Runs this project as a JVM application

Build tasks
-----------
assemble - Assembles the outputs of this project.
build - Assembles and tests this project.
buildDependents - Assembles and tests this project and all projects that depend on it.
buildNeeded - Assembles and tests this project and all projects it depends on.
classes - Assembles main classes.
clean - Deletes the build directory.
jar - Assembles a jar archive containing the classes of the 'main' feature.
testClasses - Assembles test classes.

Build Setup tasks
-----------------
init - Initializes a new Gradle build.
wrapper - Generates Gradle wrapper files.

Distribution tasks
------------------
assembleDist - Assembles the main distributions
distTar - Bundles the project as a distribution.
distZip - Bundles the project as a distribution.
installDist - Installs the project as a distribution as-is.

Documentation tasks
-------------------
javadoc - Generates Javadoc API documentation for the 'main' feature.

Help tasks
----------
buildEnvironment - Displays all buildscript dependencies declared in root project 'hipstershop'.
dependencies - Displays all dependencies declared in root project 'hipstershop'.
dependencyInsight - Displays the insight into a specific dependency in root project 'hipstershop'.
help - Displays a help message.
javaToolchains - Displays the detected java toolchains.
outgoingVariants - Displays the outgoing variants of root project 'hipstershop'.
projects - Displays the sub-projects of root project 'hipstershop'.
properties - Displays the properties of root project 'hipstershop'.
resolvableConfigurations - Displays the configurations that can be resolved in root project 'hipstershop'.
tasks - Displays the tasks runnable from root project 'hipstershop'.

IDE tasks
---------
cleanIdea - Cleans IDEA project files (IML, IPR)
idea - Generates IDEA project files (IML, IPR, IWS)
openIdea - Opens the IDEA project

Verification tasks
------------------
check - Runs all checks.
test - Runs the test suite.

Rules
-----
Pattern: clean<TaskName>: Cleans the output files of a task.
Pattern: build<ConfigurationName>: Assembles the artifacts of a configuration.

The valuable tasks for us are:

  • downloadRepos (you won’t find it in the output, but in build.gradle)
// This to cache dependencies during Docker image building. First build will take time.
// Subsequent build will be incremental.
task downloadRepos(type: Copy) {
    from configurations.compileClasspath
    into offlineCompile
    from configurations.compileClasspath
    into offlineCompile
}
  • installDist (it’s here in the output). It’s not the traditional way to package Java applications with Gradle. You’d typically run ./gradlew assemble or ./gradlew build. The reason why we need it is in build.gradle
plugins {
    id 'com.google.protobuf' version '0.9.4'
    id 'com.github.sherter.google-java-format' version '0.9'
    id 'idea'
    id 'application'
}

...

task adService(type: CreateStartScripts) {
    mainClass.set('hipstershop.AdService')
    applicationName = 'AdService'
    outputDir = new File(project.buildDir, 'tmp')
    classpath = startScripts.classpath
    defaultJvmOpts =
             ["-agentpath:/opt/cprof/profiler_java_agent.so=-cprof_service=adservice,-cprof_service_version=1.0.0"]
}

task adServiceClient(type: CreateStartScripts) {
    mainClass.set('hipstershop.AdServiceClient')
    applicationName = 'AdServiceClient'
    outputDir = new File(project.buildDir, 'tmp')
    classpath = startScripts.classpath
    defaultJvmOpts =
             ["-agentpath:/opt/cprof/profiler_java_agent.so=-cprof_service=adserviceclient,-cprof_service_version=1.0.0"]
}

applicationDistribution.into('bin') {
    from(adService)
    from(adServiceClient)
    fileMode = 0755
}

The Gradle configuration uses a plugin called application. And at the end of the file, you see applicationDistribution.into.

The application plugin adds a bunch of tasks. One of them is installDist, which installs our app into build/install/<project name> (here build/install/hipstershop, the path mentioned in the README). The applicationDistribution.into('bin') block then adds the two generated start scripts, AdService and AdServiceClient, to the bin folder of that distribution.

You can read about it here - The Application Plugin

This will affect our Dockerfile, which should look like this:

FROM amazoncorretto:21

WORKDIR /app

COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help

COPY *.gradle ./

RUN ./gradlew downloadRepos --no-daemon

COPY . .

RUN chmod +x ./gradlew

RUN ./gradlew installDist --no-daemon --no-rebuild

ENV DISABLE_STATS=1
ENV DISABLE_TRACING=1

EXPOSE 9555

ENTRYPOINT [ "./build/install/hipstershop/bin/AdService" ]
  • ./gradlew downloadRepos --no-daemon downloads the dependencies and caches them in a Docker layer. This command is re-executed only when we update build.gradle and/or settings.gradle.
  • COPY . . copies all the code. It also overwrites the ./gradlew we made executable at the beginning with the copy from the build context, which may not have the executable bit set.
  • That’s why we need the second chmod +x ./gradlew.
  • In the end, we run ./gradlew installDist --no-daemon --no-rebuild to build our artifact.
  • We sorted out ENV and EXPOSE at the beginning of the article while exploring the project.
  • In the entry point, we run the AdService start script. installDist puts the distribution into build/install/hipstershop (the path from the README), and the script name and the bin folder come from build.gradle:
...
applicationName = 'AdService'
...
applicationDistribution.into('bin') {
...

Let’s try to build this and run.

docker build -t adservice .
...

docker run --rm -it -p 9555:9555 adservice

And, as expected, we get an error:

Error occurred during initialization of VM
Could not find agent library /opt/cprof/profiler_java_agent.so in absolute path, with error: /opt/cprof/profiler_java_agent.so: cannot open shared object file: No such file or directory

Profiler

While exploring build.gradle you can find this line

"-agentpath:/opt/cprof/profiler_java_agent.so=-cprof_service=adservice,-cprof_service_version=1.0.0"

It adds a third-party agent as a runtime dependency of our application, so we need to download it beforehand.

Specifically, this dependency helps with profiling - figuring out how long things take (like a call to another service). Google maintains it, and the doc is here
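
For context, the JVM option has the form -agentpath:<library path>[=<options>]: everything up to the first = is the absolute path of a native library that the JVM loads at startup, and the rest is passed to the agent. The sketch below splits the line from build.gradle the same way and checks whether the library exists. AgentPathOption is a hypothetical helper for illustration; the JVM does this parsing itself.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the Ad Service code base.
final class AgentPathOption {
    private static final String PREFIX = "-agentpath:";

    private final String libraryPath;
    private final String options;

    private AgentPathOption(String libraryPath, String options) {
        this.libraryPath = libraryPath;
        this.options = options;
    }

    // Splits "-agentpath:<library>[=<options>]" at the first '='.
    static AgentPathOption parse(String jvmArg) {
        if (!jvmArg.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an -agentpath option: " + jvmArg);
        }
        String rest = jvmArg.substring(PREFIX.length());
        int eq = rest.indexOf('=');
        return eq < 0
                ? new AgentPathOption(rest, "")
                : new AgentPathOption(rest.substring(0, eq), rest.substring(eq + 1));
    }

    String libraryPath() {
        return libraryPath;
    }

    String options() {
        return options;
    }

    public static void main(String[] args) {
        AgentPathOption agent = parse(
                "-agentpath:/opt/cprof/profiler_java_agent.so"
                        + "=-cprof_service=adservice,-cprof_service_version=1.0.0");
        System.out.println("library: " + agent.libraryPath());
        System.out.println("options: " + agent.options());
        // Reports whether the native library is present on this machine or image.
        System.out.println("exists:  " + Files.exists(Paths.get(agent.libraryPath())));
    }
}
```

If the library file is missing, the JVM refuses to start, which is exactly the error we got from the container.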

So, the modified Dockerfile looks like this:

FROM amazoncorretto:21

WORKDIR /app

RUN yum -y update && yum -y install gzip tar wget && mkdir -p /opt/cprof && \
    wget -q -O- https://storage.googleapis.com/cloud-profiler/java/latest/profiler_java_agent.tar.gz \
    | tar xzv -C /opt/cprof && \
    rm -rf profiler_java_agent.tar.gz

COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help

COPY *.gradle ./

RUN ./gradlew downloadRepos --no-daemon

COPY . .

RUN chmod +x ./gradlew

RUN ./gradlew installDist --no-daemon --no-rebuild

ENV DISABLE_STATS=1
ENV DISABLE_TRACING=1

EXPOSE 9555

ENTRYPOINT [ "./build/install/hipstershop/bin/AdService" ]

amazoncorretto:21 is based on Amazon Linux, so it uses yum as the package manager. Plus, it uses a slimmed-down version of Amazon Linux that doesn’t include tar, wget, or gzip, so we install them.

We’ve done almost the same with Python in Docker Basics: Python

We also put this step at the beginning so its layer is cached and isn’t re-executed on every build.

Now, if we do the build and run, it starts successfully.

MAC USERS (Apple Silicon): The Docker image has to be built and run with --platform=linux/amd64:

docker build --platform=linux/amd64 -t adservice .

docker run --rm -it --platform=linux/amd64 -p 9555:9555 adservice

The resulting image is around 1.1GB, which is a lot for such a small codebase!

Let’s improve it!

Multistage

We don’t want to run our app with all the mess we had to install to build the app.

And we don’t need all the source code. We just need the artifact and the profiler.

So, our Dockerfile with multistage will end up looking like this:

FROM amazoncorretto:21 as build

WORKDIR /app

RUN yum -y update && yum -y install gzip tar wget && mkdir -p /opt/cprof && \
    wget -q -O- https://storage.googleapis.com/cloud-profiler/java/latest/profiler_java_agent.tar.gz \
    | tar xzv -C /opt/cprof && \
    rm -rf profiler_java_agent.tar.gz

COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help

COPY *.gradle ./

RUN ./gradlew downloadRepos --no-daemon

COPY . .

RUN chmod +x ./gradlew

RUN ./gradlew installDist --no-daemon --no-rebuild

FROM amazoncorretto:21

COPY --from=build --chown=1000:1000 /opt/cprof /opt/cprof

WORKDIR /app

COPY --from=build /app/build/install/hipstershop ./

ENV DISABLE_STATS=1
ENV DISABLE_TRACING=1

EXPOSE 9555

ENTRYPOINT [ "./bin/AdService" ]

The notable difference is --chown=1000:1000, which sets the owner of the copied files explicitly to avoid permission clashes. You might need this when you copy files you downloaded in the previous stage.

This gives us a 521MB docker image!

We can do better with Alpine! This requires a different build of the profiler agent, profiler_java_agent_alpine.tar.gz, as per the Google documentation, and we end up with a few minor changes:

FROM amazoncorretto:21 as build

WORKDIR /app

RUN yum -y update && yum -y install gzip tar wget && mkdir -p /opt/cprof && \
    wget -q -O- https://storage.googleapis.com/cloud-profiler/java/latest/profiler_java_agent_alpine.tar.gz \
    | tar xzv -C /opt/cprof && \
    rm -rf profiler_java_agent.tar.gz

COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help

COPY *.gradle ./

RUN ./gradlew downloadRepos --no-daemon

COPY . .

RUN chmod +x ./gradlew

RUN ./gradlew installDist --no-daemon --no-rebuild

FROM amazoncorretto:21-alpine-jdk

COPY --from=build --chown=1000:1000 /opt/cprof /opt/cprof

WORKDIR /app

COPY --from=build /app/build/install/hipstershop ./

ENV DISABLE_STATS=1
ENV DISABLE_TRACING=1

EXPOSE 9555

ENTRYPOINT [ "./bin/AdService" ]

And that’s it! amazoncorretto:21-alpine-jdk is the smallest image we can use.

The resulting image is 357MB! It’s not as small as the other images, but it’s Java. What did you expect?

Conclusion

Java and Docker have weird dynamics. The Java ecosystem has been around much longer. Whenever you’re using Docker, you must go deep into the Java ecosystem to figure out what’s happening and why.

Still, we can use modern ways and develop efficient Dockerfiles!

Try to follow the article and see the errors we faced for yourself. Try another base image, such as eclipse-temurin.

Feel free to ask questions and post your feedback.

The next articles will be about docker-compose.