Docker Basics: Java
Finally! It is one of the world’s most popular platforms and programming languages.
This article continues my Docker Basics series, where I share helpful, practical information about everything docker-related.
We continue looking at the Google Microservices repository, and the next service in our list is the Ad Service.
Let’s navigate to ./src/adservice and explore the Java world.
Java has been around for more than 28 years and has heavily influenced the way modern software development is done.
However, what’s interesting is how Java is different from other platforms.
Everything even remotely related to building a Java application becomes a Maven or Gradle plugin in one way or another, and all Java-specific tooling is built around this concept. Adopting tools like Docker becomes challenging because the typical adoption path for Java differs from that of other platforms.
This doesn’t mean we shouldn’t do this. After all, it’s just a skill issue.
Repo exploration
The README.md says:
The Ad service uses gradlew to compile/install/distribute. Gradle wrapper is already part of the source code. To build Ad Service, run:
./gradlew installDist
It will create executable script src/adservice/build/install/hipstershop/bin/AdService
This means our project uses Gradle, a build tool for Java projects.
Also, it’s standard practice to supply a project with a Gradle wrapper, ./gradlew, that downloads the required version of Gradle by itself, so there’s no need to worry about it. And it’s committed into the repository. Again, a Java ecosystem thing. You’ll rarely find this anywhere else.
As for the Java version, the better question is which version of Java should be used. The build.gradle has a few lines about it:
tasks.withType(JavaCompile) {
sourceCompatibility = JavaVersion.VERSION_19
targetCompatibility = JavaVersion.VERSION_19
}
- sourceCompatibility defines the language version of Java used in your source files
- targetCompatibility defines the minimum JVM version your code should run on
In general, using these flags is not advised: they pin a specific Java version and are considered a legacy mechanism. But a lot of code written in Java is legacy, so no worries! Plus, we are not here to mess with the code but to dockerize it.
In a nutshell, this means we can use any Java version from 19 upward, including the latest one.
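To make the targetCompatibility point a bit more concrete, here is a rough sketch of the rule behind it. Each Java release writes class files with a major version equal to the release number plus 44 (so Java 19 gives 63), and a JVM only loads class files that are not newer than its own. The class and method names below are made up for this example.

```java
// Sketch: relate targetCompatibility to the JVM needed at run time.
// CompatCheck, classFileMajor and canRun are illustrative names only.
class CompatCheck {
    // Class-file major version for a Java feature release (Java 8 -> 52, Java 19 -> 63).
    static int classFileMajor(int featureRelease) {
        return featureRelease + 44;
    }

    // A JVM can load classes whose major version is not newer than its own.
    static boolean canRun(int runtimeFeature, int targetFeature) {
        return classFileMajor(runtimeFeature) >= classFileMajor(targetFeature);
    }

    public static void main(String[] args) {
        int target = 19; // taken from targetCompatibility in build.gradle
        int runtime = Runtime.version().feature(); // feature release of the JVM running this code
        System.out.println("Target class-file version: " + classFileMajor(target));
        System.out.println("This JVM (Java " + runtime + ") can run it: " + canRun(runtime, target));
    }
}
```

So a Java 21 runtime is fine for code compiled for 19, while a Java 17 runtime would refuse to load it.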
To build Java we need JDK - Java Development Kit. To run it later, we need JRE - Java Runtime Environment.
There are multiple distributions of both:
amazoncorretto
eclipse-temurin
ibm-semeru-runtimes
ibmjava
sapmachine
Don’t use the openjdk image because it’s deprecated in favor of the ones above.
Let’s select amazoncorretto and see how it goes.
Now, let’s move to environment variables. We’ve had them in every project so far.
The Java way to get environment variables is
System.getenv("ENV_VAR_NAME")
Let’s search for it! (The actual search term is System.getenv)
We see 3 results:
PORT
DISABLE_STATS
DISABLE_TRACING
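If you’d rather not rely on the editor search, the same lookup can be sketched in a few lines of Java. This is only an illustration: the class name, the method name, and the regular expression are my own, and it covers just the two call shapes we see in this service, System.getenv("NAME") and System.getenv().getOrDefault("NAME", ...).

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: list environment variable names read via System.getenv in Java source text.
// EnvVarScanner and findEnvVars are illustrative names, not part of the repository.
class EnvVarScanner {
    // Matches System.getenv("NAME") and System.getenv().getOrDefault("NAME", ...).
    private static final Pattern GETENV = Pattern.compile(
        "System\\.getenv\\((?:\\)\\.getOrDefault\\()?\"([A-Za-z_][A-Za-z0-9_]*)\"");

    static Set<String> findEnvVars(String source) {
        Set<String> names = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        Matcher m = GETENV.matcher(source);
        while (m.find()) {
            names.add(m.group(1)); // group 1 is the variable name inside the quotes
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample lines shaped like the ones found in the Ad Service source.
        String sample = String.join("\n",
            "int port = Integer.parseInt(System.getenv().getOrDefault(\"PORT\", \"9555\"));",
            "if (System.getenv(\"DISABLE_STATS\") != null) { }",
            "if (System.getenv(\"DISABLE_TRACING\") != null) { }");
        System.out.println(findEnvVars(sample));
    }
}
```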
Here’s the code related to the PORT environment variable:
int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "9555"));
The default value is 9555 unless you set something. Good. We won’t change it.
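Here is a small sketch of the same lookup pulled into a helper so you can try it with different inputs. The PortConfig class and the resolvePort method are hypothetical; only the getOrDefault("PORT", "9555") call comes from the service itself.

```java
import java.util.Map;

// Sketch: the PORT lookup from the service, taking the environment as a parameter
// so it can be exercised without touching real environment variables.
class PortConfig {
    static final String DEFAULT_PORT = "9555";

    static int resolvePort(Map<String, String> env) {
        // Falls back to the default only when PORT is absent from the map.
        return Integer.parseInt(env.getOrDefault("PORT", DEFAULT_PORT));
    }

    public static void main(String[] args) {
        System.out.println(resolvePort(System.getenv()));           // whatever PORT is set to, else 9555
        System.out.println(resolvePort(Map.of("PORT", "8080")));    // explicit override
    }
}
```

One thing to keep in mind: the default applies only when PORT is missing. If PORT is set to an empty string, Integer.parseInt throws a NumberFormatException.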
The next one is DISABLE_STATS:
private static void initStats() {
if (System.getenv("DISABLE_STATS") != null) {
logger.info("Stats disabled.");
return;
}
logger.info("Stats enabled, but temporarily unavailable");
long sleepTime = 10; /* seconds */
int maxAttempts = 5;
// TODO(arbrown) Implement OpenTelemetry stats
}
The behavior is similar to what we’ve seen in the other services, so we’ll set it to a default value to run locally. Technically, this won’t do anything because stats are not implemented yet. However, we will forget about it in 6 months and could accidentally break something if it gets only partially implemented.
And the same thing with DISABLE_TRACING
private static void initTracing() {
if (System.getenv("DISABLE_TRACING") != null) {
logger.info("Tracing disabled.");
return;
}
logger.info("Tracing enabled but temporarily unavailable");
logger.info("See https://github.com/GoogleCloudPlatform/microservices-demo/issues/422 for more info.");
// TODO(arbrown) Implement OpenTelemetry tracing
logger.info("Tracing enabled - Stackdriver exporter initialized.");
}
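Both checks compare against null, so what matters is whether the variable exists, not what it contains. The sketch below mirrors that check with a hypothetical FeatureFlags helper (not part of the service) that takes the environment as a map.

```java
import java.util.Map;

// Sketch: presence-based flags, mirroring `System.getenv("X") != null`.
// FeatureFlags and isDisabled are illustrative names only.
class FeatureFlags {
    // Any value at all, even "0" or "false", counts as "disabled".
    static boolean isDisabled(Map<String, String> env, String name) {
        return env.get(name) != null;
    }

    public static void main(String[] args) {
        System.out.println(isDisabled(Map.of("DISABLE_TRACING", "1"), "DISABLE_TRACING"));     // true
        System.out.println(isDisabled(Map.of("DISABLE_TRACING", "false"), "DISABLE_TRACING")); // true
        System.out.println(isDisabled(Map.of(), "DISABLE_TRACING"));                           // false
    }
}
```

In other words, to turn stats or tracing back on you have to remove the variable entirely. Setting it to 0 or false keeps the feature disabled.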
Gradle specifics
Gradle works via plugins. The plugins and project dependencies are listed in build.gradle, which is a configuration file.
You can read the details here - Building Java & JVM projects
Essentially, Gradle runs tasks. Tasks can be added via plugins. There are default tasks as well.
So, we need to perform our usual steps:
- download and cache dependencies
- copy the code
- build the code
- copy artifacts to the resulting image
Assembling a Dockerfile
Let’s use the latest Amazon Corretto docker image and copy gradle-related stuff.
FROM amazoncorretto:21
WORKDIR /app
COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help
COPY *.gradle ./
OK. This is the first time we have used this method. Let me explain what’s going on.
- COPY gradle gradle copies the Gradle wrapper’s files (these are needed to download Gradle and execute Gradle commands). You won’t change them unless you want to upgrade Gradle.
- COPY gradlew* ./ copies the shell script that works with the gradle-wrapper.jar we copied earlier. We have these as 2 separate commands because we want to copy the content of the gradle folder into a gradle folder instead of ./
- RUN chmod +x ./gradlew && ./gradlew --help makes ./gradlew executable (via chmod +x) and invokes a dummy command to pre-download Gradle and cache it, so we don’t have to do it again in the upcoming commands or in future rebuilds.
- COPY *.gradle ./ copies the rest of the Gradle files.
This allows us to change dependencies in the build.gradle file in the future and still use the cached Gradle, as you’d typically do locally during regular development.
We would only need to redownload Gradle when we upgrade it.
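If the caching logic feels abstract, here is a toy model of it. It is not how Docker is implemented, only the rule of thumb we rely on: steps are checked in order, and the first step whose instruction or input files changed gets re-run together with everything after it. The class name and the "@..." fingerprints are invented for this sketch.

```java
import java.util.List;

// Toy model of Docker layer caching: each entry is an instruction plus a
// made-up fingerprint of the files it depends on.
class LayerCacheModel {
    // Returns the index of the first step that must be re-run,
    // or the number of compared steps if they all match.
    static int firstRebuiltStep(List<String> previousKeys, List<String> currentKeys) {
        int n = Math.min(previousKeys.size(), currentKeys.size());
        for (int i = 0; i < n; i++) {
            if (!previousKeys.get(i).equals(currentKeys.get(i))) {
                return i; // this step and every later one are rebuilt
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> before = List.of(
            "COPY gradle gradle @wrapper-v1",
            "COPY gradlew* ./ @script-v1",
            "RUN chmod +x ./gradlew && ./gradlew --help",
            "COPY *.gradle ./ @build-gradle-v1");
        // Same Dockerfile, but build.gradle was edited (a new dependency, for example).
        List<String> after = List.of(
            "COPY gradle gradle @wrapper-v1",
            "COPY gradlew* ./ @script-v1",
            "RUN chmod +x ./gradlew && ./gradlew --help",
            "COPY *.gradle ./ @build-gradle-v2");
        System.out.println("First rebuilt step: " + firstRebuiltStep(before, after)); // prints 3
    }
}
```

The first three steps stay cached, including the one that downloads Gradle, which is exactly why the wrapper files are copied before build.gradle.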
Gradle tasks
Let’s run the build locally and start a container to explore what’s available
docker build -t adservice .
...
docker run --rm -it adservice bash
Then, let’s execute the help command from the Dockerfile to see what options are available:
bash-4.2# ./gradlew --help
To see help contextual to the project, use gradlew help
USAGE: gradlew [option...] [task...]
-?, -h, --help Shows this help message.
-a, --no-rebuild Do not rebuild project dependencies.
-b, --build-file Specify the build file. [deprecated]
--build-cache Enables the Gradle build cache. Gradle will try to reuse outputs from previous builds.
--no-build-cache Disables the Gradle build cache.
-c, --settings-file Specify the settings file. [deprecated]
--configuration-cache Enables the configuration cache. Gradle will try to reuse the build configuration from previous builds.
--no-configuration-cache Disables the configuration cache.
--configuration-cache-problems Configures how the configuration cache handles problems (fail or warn). Defaults to fail.
--configure-on-demand Configure necessary projects only. Gradle will attempt to reduce configuration time for large multi-project builds. [incubating]
--no-configure-on-demand Disables the use of configuration on demand. [incubating]
--console Specifies which type of console output to generate. Values are 'plain', 'auto' (default), 'rich' or 'verbose'.
--continue Continue task execution after a task failure.
--no-continue Stop task execution after a task failure.
-D, --system-prop Set system property of the JVM (e.g. -Dmyprop=myvalue).
-d, --debug Log in debug mode (includes normal stacktrace).
--daemon Uses the Gradle daemon to run the build. Starts the daemon if not running.
--no-daemon Do not use the Gradle daemon to run the build. Useful occasionally if you have configured Gradle to always run with the daemon by default.
--export-keys Exports the public keys used for dependency verification.
-F, --dependency-verification Configures the dependency verification mode. Values are 'strict', 'lenient' or 'off'.
--foreground Starts the Gradle daemon in the foreground.
-g, --gradle-user-home Specifies the Gradle user home directory. Defaults to ~/.gradle
-I, --init-script Specify an initialization script.
-i, --info Set log level to info.
--include-build Include the specified build in the composite.
-M, --write-verification-metadata Generates checksums for dependencies used in the project (comma-separated list)
-m, --dry-run Run the builds with all task actions disabled.
--max-workers Configure the number of concurrent workers Gradle is allowed to use.
--offline Execute the build without accessing network resources.
-P, --project-prop Set project property for the build script (e.g. -Pmyprop=myvalue).
-p, --project-dir Specifies the start directory for Gradle. Defaults to current directory.
--parallel Build projects in parallel. Gradle will attempt to determine the optimal number of executor threads to use.
--no-parallel Disables parallel execution to build projects.
--priority Specifies the scheduling priority for the Gradle daemon and all processes launched by it. Values are 'normal' (default) or 'low'
--profile Profile build execution time and generates a report in the <build_dir>/reports/profile directory.
--project-cache-dir Specify the project-specific cache directory. Defaults to .gradle in the root project directory.
-q, --quiet Log errors only.
--refresh-keys Refresh the public keys used for dependency verification.
--rerun-tasks Ignore previously cached task results.
-S, --full-stacktrace Print out the full (very verbose) stacktrace for all exceptions.
-s, --stacktrace Print out the stacktrace for all exceptions.
--scan Creates a build scan. Gradle will emit a warning if the build scan plugin has not been applied. (https://gradle.com/build-scans)
--no-scan Disables the creation of a build scan. For more information about build scans, please visit https://gradle.com/build-scans.
--status Shows status of running and recently stopped Gradle daemon(s).
--stop Stops the Gradle daemon if it is running.
-t, --continuous Enables continuous build. Gradle does not exit and will re-execute tasks when task file inputs change.
-U, --refresh-dependencies Refresh the state of dependencies.
--update-locks Perform a partial update of the dependency lock, letting passed in module notations change version. [incubating]
-V, --show-version Print version info and continue.
-v, --version Print version info and exit.
-w, --warn Set log level to warn.
--warning-mode Specifies which mode of warnings to generate. Values are 'all', 'fail', 'summary'(default) or 'none'
--watch-fs Enables watching the file system for changes, allowing data about the file system to be re-used for the next build.
--no-watch-fs Disables watching the file system.
--write-locks Persists dependency resolution for locked configurations, ignoring existing locking information if it exists
-x, --exclude-task Specify a task to be excluded from execution.
-- Signals the end of built-in options. Gradle parses subsequent parameters as only tasks or task options.
I guess the most valuable options for us will be:
- --no-daemon, since we don’t want Gradle running in the background (every RUN step is its own process, so a daemon started in one step is gone by the next)
- --no-rebuild, which the help text above describes as “Do not rebuild project dependencies”; we add it because we don’t want dependencies handled again during the application build (we’ll cache them earlier)
Next, let’s see what tasks are available (again, running inside a Docker container for debugging purposes).
bash-4.2# ./gradlew tasks
Starting a Gradle Daemon (subsequent builds will be faster)
> Task :tasks
------------------------------------------------------------
Tasks runnable from root project 'hipstershop' - Ad Service
------------------------------------------------------------
Application tasks
-----------------
run - Runs this project as a JVM application
Build tasks
-----------
assemble - Assembles the outputs of this project.
build - Assembles and tests this project.
buildDependents - Assembles and tests this project and all projects that depend on it.
buildNeeded - Assembles and tests this project and all projects it depends on.
classes - Assembles main classes.
clean - Deletes the build directory.
jar - Assembles a jar archive containing the classes of the 'main' feature.
testClasses - Assembles test classes.
Build Setup tasks
-----------------
init - Initializes a new Gradle build.
wrapper - Generates Gradle wrapper files.
Distribution tasks
------------------
assembleDist - Assembles the main distributions
distTar - Bundles the project as a distribution.
distZip - Bundles the project as a distribution.
installDist - Installs the project as a distribution as-is.
Documentation tasks
-------------------
javadoc - Generates Javadoc API documentation for the 'main' feature.
Help tasks
----------
buildEnvironment - Displays all buildscript dependencies declared in root project 'hipstershop'.
dependencies - Displays all dependencies declared in root project 'hipstershop'.
dependencyInsight - Displays the insight into a specific dependency in root project 'hipstershop'.
help - Displays a help message.
javaToolchains - Displays the detected java toolchains.
outgoingVariants - Displays the outgoing variants of root project 'hipstershop'.
projects - Displays the sub-projects of root project 'hipstershop'.
properties - Displays the properties of root project 'hipstershop'.
resolvableConfigurations - Displays the configurations that can be resolved in root project 'hipstershop'.
tasks - Displays the tasks runnable from root project 'hipstershop'.
IDE tasks
---------
cleanIdea - Cleans IDEA project files (IML, IPR)
idea - Generates IDEA project files (IML, IPR, IWS)
openIdea - Opens the IDEA project
Verification tasks
------------------
check - Runs all checks.
test - Runs the test suite.
Rules
-----
Pattern: clean<TaskName>: Cleans the output files of a task.
Pattern: build<ConfigurationName>: Assembles the artifacts of a configuration.
The valuable tasks for us are:
downloadRepos (you won’t find it in the output, but it is in build.gradle)
// This to cache dependencies during Docker image building. First build will take time.
// Subsequent build will be incremental.
task downloadRepos(type: Copy) {
from configurations.compileClasspath
into offlineCompile
from configurations.compileClasspath
into offlineCompile
}
installDist (it’s here in the output). It’s not the traditional way to package Java applications via Gradle. You’d typically run ./gradlew assemble or ./gradlew build. The reason why we need to use it is in build.gradle
plugins {
id 'com.google.protobuf' version '0.9.4'
id 'com.github.sherter.google-java-format' version '0.9'
id 'idea'
id 'application'
}
...
task adService(type: CreateStartScripts) {
mainClass.set('hipstershop.AdService')
applicationName = 'AdService'
outputDir = new File(project.buildDir, 'tmp')
classpath = startScripts.classpath
defaultJvmOpts =
["-agentpath:/opt/cprof/profiler_java_agent.so=-cprof_service=adservice,-cprof_service_version=1.0.0"]
}
task adServiceClient(type: CreateStartScripts) {
mainClass.set('hipstershop.AdServiceClient')
applicationName = 'AdServiceClient'
outputDir = new File(project.buildDir, 'tmp')
classpath = startScripts.classpath
defaultJvmOpts =
["-agentpath:/opt/cprof/profiler_java_agent.so=-cprof_service=adserviceclient,-cprof_service_version=1.0.0"]
}
applicationDistribution.into('bin') {
from(adService)
from(adServiceClient)
fileMode = 0755
}
The Gradle configuration uses a plugin called application. And at the end of the file, you see applicationDistribution.into
The application plugin adds a bunch of tasks. One of them is installDist, which installs our app into a folder under build/install (the README path above shows it). The applicationDistribution.into('bin') block then places the generated start scripts into the bin folder of that installation.
You can read about it here - The Application Plugin
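To see how the pieces line up, here is a tiny sketch that assembles the start script location from the values we already have: the project name hipstershop (from the tasks output), the bin folder, and the applicationName. The InstallLayout helper is hypothetical and simply follows the path given in the README.

```java
// Sketch: where installDist puts a start script, relative to the project folder.
// InstallLayout and startScriptPath are illustrative names only.
class InstallLayout {
    static String startScriptPath(String projectName, String applicationName) {
        // build/install/<project name>/bin/<application name>, as in the README path
        return String.join("/", "build", "install", projectName, "bin", applicationName);
    }

    public static void main(String[] args) {
        System.out.println(startScriptPath("hipstershop", "AdService"));
        System.out.println(startScriptPath("hipstershop", "AdServiceClient"));
    }
}
```

The first line printed is build/install/hipstershop/bin/AdService, which matches the README once you drop the src/adservice prefix.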
This will affect our Dockerfile, which should look like this:
FROM amazoncorretto:21
WORKDIR /app
COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help
COPY *.gradle ./
RUN ./gradlew downloadRepos --no-daemon
COPY . .
RUN chmod +x ./gradlew
RUN ./gradlew installDist --no-daemon --no-rebuild
ENV DISABLE_STATS=1
ENV DISABLE_TRACING=1
EXPOSE 9555
ENTRYPOINT [ "./build/install/hipstershop/bin/AdService" ]
- ./gradlew downloadRepos --no-daemon downloads the dependencies and caches them in a Docker layer. So, this command will only be executed again when we update build.gradle and/or settings.gradle
- COPY . . copies all the code. This overwrites the ./gradlew we made executable with chmod +x in the beginning, so its file permissions can get lost.
- That’s why we need the second chmod +x ./gradlew
- And in the end, we do RUN ./gradlew installDist --no-daemon --no-rebuild to build our artifact correctly.
- We sorted out ENV and EXPOSE at the beginning of the article while exploring the project.
- In the entry point, we set ./build/install/hipstershop/bin/AdService. The build/install/hipstershop part is where installDist puts the app (the README mentions this path), and the bin/AdService part comes from build.gradle
...
applicationName = 'AdService'
...
applicationDistribution.into('bin') {
...
Let’s try to build this and run.
docker build -t adservice .
...
docker run --rm -it -p 9555:9555 adservice
And, as expected, we get an error:
Error occurred during initialization of VM
Could not find agent library /opt/cprof/profiler_java_agent.so in absolute path, with error: /opt/cprof/profiler_java_agent.so: cannot open shared object file: No such file or directory
Profiler
While exploring build.gradle, you can find this line:
"-agentpath:/opt/cprof/profiler_java_agent.so=-cprof_service=adservice,-cprof_service_version=1.0.0"
It adds a third-party agent as a runtime dependency of our application. So we need to download it early in the Dockerfile.
Specifically, this dependency helps with profiling - figuring out how long things take (like a call to another service). Google maintains it, and the doc is here
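The JVM option has the shape -agentpath:pathname=options, so everything up to the first = is the library the JVM tries to load at startup. Here is a small sketch that splits the option from build.gradle and checks whether the file is there. The AgentPathOption class is my own helper, not something from the service or the profiler.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: split a JVM -agentpath option into the library path and its options.
// AgentPathOption and parse are illustrative names only.
class AgentPathOption {
    static final String PREFIX = "-agentpath:";

    // Returns {libraryPath, options}; options is "" when no '=' follows the path.
    static String[] parse(String jvmOption) {
        if (!jvmOption.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an -agentpath option: " + jvmOption);
        }
        String rest = jvmOption.substring(PREFIX.length());
        int eq = rest.indexOf('='); // the first '=' separates the path from the agent options
        if (eq < 0) {
            return new String[] { rest, "" };
        }
        return new String[] { rest.substring(0, eq), rest.substring(eq + 1) };
    }

    public static void main(String[] args) {
        String option = "-agentpath:/opt/cprof/profiler_java_agent.so="
            + "-cprof_service=adservice,-cprof_service_version=1.0.0";
        String[] parts = parse(option);
        System.out.println("Library: " + parts[0]);
        System.out.println("Options: " + parts[1]);
        // The JVM refuses to start when this file is missing, which is the error we saw.
        System.out.println("Present: " + Files.exists(Path.of(parts[0])));
    }
}
```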
So, the modified Dockerfile looks like this:
FROM amazoncorretto:21
WORKDIR /app
RUN yum -y update && yum -y install gzip tar wget && mkdir -p /opt/cprof && \
wget -q -O- https://storage.googleapis.com/cloud-profiler/java/latest/profiler_java_agent.tar.gz \
| tar xzv -C /opt/cprof && \
rm -rf profiler_java_agent.tar.gz
COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help
COPY *.gradle ./
RUN ./gradlew downloadRepos --no-daemon
COPY . .
RUN chmod +x ./gradlew
RUN ./gradlew installDist --no-daemon --no-rebuild
ENV DISABLE_STATS=1
ENV DISABLE_TRACING=1
EXPOSE 9555
ENTRYPOINT [ "./build/install/hipstershop/bin/AdService" ]
amazoncorretto:21 is based on Amazon Linux and thus uses the yum package manager. Plus, it’s using a small version of Amazon Linux, which doesn’t have tar, wget, or gzip. So we’re adding those.
We’ve done almost the same with Python in Docker Basics: Python
We also put it in the beginning so it won’t be executed every time.
Now, if we do the build and run, it starts successfully.
MAC USERS: The Docker image has to be built and run with --platform=linux/amd64:
docker build --platform=linux/amd64 -t adservice .
docker run --rm -it --platform=linux/amd64 -p 9555:9555 adservice
The resulting image is around 1.1GB, which is a lot for such a small codebase!
Let’s improve it!
Multistage
We don’t want to run our app with all the mess we had to install to build the app.
And we don’t need all the source code. We just need the artifact and the profiler.
So, our Dockerfile with multistage will end up looking like this:
FROM amazoncorretto:21 as build
WORKDIR /app
RUN yum -y update && yum -y install gzip tar wget && mkdir -p /opt/cprof && \
wget -q -O- https://storage.googleapis.com/cloud-profiler/java/latest/profiler_java_agent.tar.gz \
| tar xzv -C /opt/cprof && \
rm -rf profiler_java_agent.tar.gz
COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help
COPY *.gradle ./
RUN ./gradlew downloadRepos --no-daemon
COPY . .
RUN chmod +x ./gradlew
RUN ./gradlew installDist --no-daemon --no-rebuild
FROM amazoncorretto:21
COPY --from=build --chown=1000:1000 /opt/cprof /opt/cprof
WORKDIR /app
COPY --from=build /app/build/install/hipstershop ./
ENV DISABLE_STATS=1
ENV DISABLE_TRACING=1
EXPOSE 9555
ENTRYPOINT [ "./bin/AdService" ]
The notable difference is --chown=1000:1000, which we use to avoid permission clashes. You might need this when you copy files you downloaded in the previous stage.
This gives us a 521MB docker image!
We can do better with Alpine! This requires us to use a different profiler_java_agent, as per the Google documentation, and we end up with a few minor changes:
FROM amazoncorretto:21 as build
WORKDIR /app
RUN yum -y update && yum -y install gzip tar wget && mkdir -p /opt/cprof && \
wget -q -O- https://storage.googleapis.com/cloud-profiler/java/latest/profiler_java_agent_alpine.tar.gz \
| tar xzv -C /opt/cprof && \
rm -rf profiler_java_agent.tar.gz
COPY gradle gradle
COPY gradlew* ./
RUN chmod +x ./gradlew && ./gradlew --help
COPY *.gradle ./
RUN ./gradlew downloadRepos --no-daemon
COPY . .
RUN chmod +x ./gradlew
RUN ./gradlew installDist --no-daemon --no-rebuild
FROM amazoncorretto:21-alpine-jdk
COPY --from=build --chown=1000:1000 /opt/cprof /opt/cprof
WORKDIR /app
COPY --from=build /app/build/install/hipstershop ./
ENV DISABLE_STATS=1
ENV DISABLE_TRACING=1
EXPOSE 9555
ENTRYPOINT [ "./bin/AdService" ]
And that’s it! amazoncorretto:21-alpine-jdk is the smallest image we can use.
The resulting image is 357MB! It’s not as small as the other images, but it’s Java. What did you expect?
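To put the three sizes side by side, here is a quick bit of arithmetic as a sketch. I’m treating 1.1GB as 1100MB, which is an approximation, and the ImageSizes helper is just for this example.

```java
// Sketch: how much each step shrank the image, using the sizes from this article.
// ImageSizes and reductionPercent are illustrative names only.
class ImageSizes {
    // Percentage by which the size dropped going from beforeMb to afterMb.
    static double reductionPercent(double beforeMb, double afterMb) {
        return (beforeMb - afterMb) / beforeMb * 100.0;
    }

    public static void main(String[] args) {
        double single = 1100; // around 1.1GB, assumed to be 1100MB
        double multistage = 521;
        double alpine = 357;
        System.out.println(String.format("Multistage vs single stage: %.1f%% smaller",
            reductionPercent(single, multistage)));
        System.out.println(String.format("Alpine vs single stage: %.1f%% smaller",
            reductionPercent(single, alpine)));
        System.out.println(String.format("Alpine vs multistage: %.1f%% smaller",
            reductionPercent(multistage, alpine)));
    }
}
```

Multistage alone cuts the image by roughly half, and the Alpine runtime brings the total reduction to about two thirds.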
Conclusion
Java and Docker have weird dynamics. The Java ecosystem has been around much longer. Whenever you’re using Docker with Java, you must go deep into the Java ecosystem to figure out what’s happening and why.
Still, we can use modern ways and develop efficient Dockerfiles!
Try to follow the article and reproduce the errors we faced yourself. Try another base image, such as eclipse-temurin.
Feel free to ask questions and post your feedback.
The next articles will be about docker-compose