Set Up Your First Apache Beam Application (Java)

3 min readFeb 28, 2023

Overview

Data processing has become more and more important and there are so many tools for data processing jobs. Apache Beam supports both batch and streaming processing, both of which are important and necessary for many data-driven services.

For a complete beginner, it’s hard to start learning a new framework without an experience of setting up a simple “Hello World” application.

This post will cover how to create and run your first Apache Beam application in Java.

Tools

Gradle: build tool
VS Code for editor + Gradle for Java extension

Steps

1. Create a Gradle project

gradle init

You’ll be asked several questions. Here are the values that I used for my project.

Select type of project to generate: application
Select implementation language: java
Split functionality across multiple subprojects?: no
Select build script DSL: Kotlin
Generate build using new APIs and behavior: yes
Select test framework: JUnit Jupyter
Project name: apachebeamtraining
Source package: apachebeamtraining

You can choose your preferred option.

Now you can run the Hello World application with the following command:

./gradlew run
Starting a Gradle Daemon, 3 incompatible Daemons could not be reused, use --status for details

> Task :app:run
Hello World!

BUILD SUCCESSFUL in 7s
2 actionable tasks: 2 executed

2. Update dependencies (app/build.gradle.kts)

Add the apache beam related dependencies (you can replace the version with the latest one) to dependencies

    // https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-core
    implementation("org.apache.beam:beam-sdks-java-core:2.45.0")

    // https://mvnrepository.com/artifact/org.apache.beam/beam-runners-direct-java
    runtimeOnly("org.apache.beam:beam-runners-direct-java:2.45.0")

3. Write the first Apache Beam logic

Write the wordcount Apache Beam logic in src/main/java/apachebeamtraining/App.java:

public class App {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();
        // Create pipeline
        Pipeline p = Pipeline.create(options);
        // Read text data from Sample.txt
        PCollection<String> textData = p.apply(TextIO.read().from("Sample.txt"));
        // Write to the output file with wordcounts as a prefix
        textData.apply(TextIO.write().to("wordcounts"));
        // Run the pipeline
        p.run().waitUntilFinish();
    }
}

The steps are pretty simple:

Create a PipelineOption.
Create a Pipeline with the option.
Add the logic to read data from Sample.txt to the pipeline and get the return value as PCollection, which is an abstraction of dataset in Apache Beam.
Add another step to write the return value in the previous step to output file with name starting with wordcounts.
Lastly, run and finish the pipeline.

Here we don’t dive into the details of DSL. You can check the details in programming guide.

You also need to import the dependencies to use PipelineOptions, Pipeline, etc.

If you’re using VSCode, you can reload all the dependencies with Gradle extension in VSCode by clicking on the icon:

4. Prepare Sample.txt

Prepare Sample.txt in app directory. You can write any text in the file like the following:

This post will cover how to start creating your first Apache Beam application in Java.

5. Run the first Apache Beam app

You can run the app by the command:

./gradlew run
Starting a Gradle Daemon, 4 incompatible Daemons could not be reused, use --status for details

> Task :app:run
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

BUILD SUCCESSFUL in 7s
2 actionable tasks: 2 executed

or you can also run by Gradle extension of VSCode:

Now you’ll find a new file app/wordcounts-00000-of-00001 , which contains exactly the same contents as app/Sample.txt.

cat app/wordcounts-00000-of-00001
This post will cover how to start creating your first Apache Beam application in Java.

Summary

In this post, we set up a simple Apache Beam application in Java with Gradle and confirmed the app successfully read Sample.txt and write the contents to an output file with the specified name prefix wordcounts-.

This is the beginning of the Apache Beam journey. Let’s write some more logic in the next step.

Set Up Your First Apache Beam Application (Java)

Overview

Tools

Steps

1. Create a Gradle project

2. Update dependencies (app/build.gradle.kts)

3. Write the first Apache Beam logic

4. Prepare Sample.txt

5. Run the first Apache Beam app

Summary

References

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Masato Naka

No responses yet