Skip to content

kestra-io/plugin-beam

Repository files navigation

Kestra workflow orchestrator

Event-Driven Declarative Orchestrator

Last Version License Github star
Kestra infinitely scalable orchestration and scheduling platform Slack

twitter linkedin youtube


Get started in 3 minutes with Kestra

Get started with Kestra in 3 minutes.

Kestra Apache Beam Plugin

A plugin to execute Apache Beam pipelines via Kestra.

This plugin allows you to execute Apache Beam YAML pipelines directly from Kestra. It supports both Java and Python SDKs and can dispatch jobs to various runners including Direct, Flink, Spark, and Google Cloud Dataflow.

Key features:

  • Multi-Runner Support: Switch between local execution (Direct) and distributed processing (Flink, Spark, Dataflow) by changing a simple property.
  • Dual SDK: Run pipelines using the Java SDK (embedded) or the Python SDK (via Docker/TaskRunner).
  • Metrics Integration: Automatically captures Beam Counters, Distributions, and Gauges and exposes them as Kestra metrics.
  • Declarative Pipelines: Define your Beam pipeline using the Beam YAML syntax inline or from a file.

Kestra orchestrator

Development

Prerequisites

  • Java 21
  • Docker
  • Gradle

Local Development Infrastructure

To test this plugin with distributed runners (Flink and Spark) locally, a helper script is provided to spin up the necessary infrastructure.

Starting the environment: Run the following script to start Flink (JobManager/TaskManager) and Spark (Master/Worker) using Docker Compose. The script includes health checks to ensure services are fully ready before returning.

./local-setup-unit.sh

Accessing Services: Once the script completes, you can access the dashboards:

Running tests

./gradlew check --parallel

Development

VSCode:

Follow the README.md within the .devcontainer folder for a quick and easy way to get up and running with developing plugins if you are using VSCode.

Other IDEs:

./gradlew shadowJar && docker build -t kestra-custom . && docker run --rm -p 8080:8080 kestra-custom server local

Note

You need to relaunch this whole command everytime you make a change to your plugin

go to http://localhost:8080, your plugin will be available to use

Documentation

Apache Beam Tasks

RunPipeline executes Beam YAML pipelines inline or from a file, forwards Beam counters, distributions, and gauges into Kestra metrics, and can target Direct, Flink, Spark, or Dataflow runners with the Java or Python SDK.

id: beam-direct
namespace: dev.beam

tasks:
  - id: run-beam
    type: io.kestra.plugin.beam.RunPipeline
    sdk: JAVA
    beamRunner: DIRECT
    definition: |
      pipeline:
        transforms:
          - type: ReadFromText
            config:
              path: "data.txt"
          - type: Count

License

Apache 2.0 © Kestra Technologies

Stay up to date

We release new versions every month. Give the main repository a star to stay up to date with the latest releases and get notified about future updates.

Star the repo

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 5