[DSIP-90][plugin] Introduce new task plugin for Flink materialized table #17086

Open
@hackergin

Search before asking

  • I have searched the existing DSIPs and found no similar DSIP.

Motivation

Flink 1.20 introduced Materialized Tables, which aim to simplify both batch and stream data pipelines while providing a consistent development experience.

In Full Mode, Flink’s Materialized Table relies on a scheduler to periodically trigger Flink’s refresh jobs. To support integration with different schedulers, the Flink community has designed a standardized scheduler interface.

To enable Flink SQL Materialized Table refresh tasks in DolphinScheduler, we propose introducing a new task type specifically for executing refresh jobs.

The following is a sequence diagram illustrating the interaction between the Flink framework and the scheduler.

Image

Design Detail

2.1 Overview

To integrate DolphinScheduler with Flink Materialized Tables, we need to introduce a new task type that executes Flink refresh operations. The core logic of this task includes:

  • Creating a SQL session
  • Submitting refresh tasks
  • Waiting for the task to finish

Image
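The three steps above can be sketched as a small polling loop. This is only an illustration under stated assumptions: `GatewayClient`, `run_refresh`, `poll_interval`, and `max_polls` are hypothetical names that do not appear in this proposal, and the set of terminal status names is assumed and should be checked against the gateway's actual responses.

```python
import time
from typing import Callable, Protocol

# Assumed terminal status names; verify against the gateway's responses.
TERMINAL_STATUSES = {"FINISHED", "ERROR", "CANCELED", "CLOSED", "TIMEOUT"}


class GatewayClient(Protocol):
    """Hypothetical interface wrapping the three gateway interactions."""

    def create_session(self) -> str: ...

    def submit_refresh(self, session_handle: str, identifier: str) -> str: ...

    def get_status(self, session_handle: str, operation_handle: str) -> str: ...


def run_refresh(
    client: GatewayClient,
    identifier: str,
    poll_interval: float = 1.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a session, submit the refresh, and wait for a terminal status."""
    session_handle = client.create_session()
    operation_handle = client.submit_refresh(session_handle, identifier)
    for _ in range(max_polls):
        status = client.get_status(session_handle, operation_handle)
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_interval)
    raise TimeoutError(f"refresh of {identifier} did not finish in {max_polls} polls")
```

Injecting `sleep` and the client keeps the loop testable without a running gateway; a task plugin would map the returned status to a task success or failure.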

2.2 Implementation Details

2.2.1 Task Parameters Configuration

Required information for task creation:

| Parameter | Description | Required | Default Value |
| --- | --- | --- | --- |
| identifier | Identifier of the table to be refreshed | Yes | |
| gatewayEndpoint | The Flink SQL gateway for executing the refresh job. HTTP endpoint of the gateway, in the format http(s)://host:port | Yes | |
| dynamicOptions | Dynamic options for refreshing the table | No | |
| executionConfig | A set of configurations for executing the refresh task | No | |
| initConfig | A set of configurations for initializing the session | No | |
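To make the parameter table concrete, here is a minimal sketch of how the task parameters might be modeled and validated. The class name `RefreshTaskParams`, the snake_case field names, and the choice of string-to-string maps for the three optional settings are assumptions made for illustration; the proposal itself only names the parameters and the http(s)://host:port endpoint format.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urlparse


@dataclass(frozen=True)
class RefreshTaskParams:
    """Hypothetical container mirroring the parameter table."""

    identifier: str            # required: table identifier to refresh
    gateway_endpoint: str      # required: http(s)://host:port
    # Optional settings, assumed here to be string-to-string maps.
    dynamic_options: Dict[str, str] = field(default_factory=dict)
    execution_config: Dict[str, str] = field(default_factory=dict)
    init_config: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if a required parameter is missing or malformed."""
        if not self.identifier.strip():
            raise ValueError("identifier is required")
        parsed = urlparse(self.gateway_endpoint)
        if (
            parsed.scheme not in ("http", "https")
            or not parsed.hostname
            or parsed.port is None
        ):
            raise ValueError("gatewayEndpoint must look like http(s)://host:port")
```

For example, `RefreshTaskParams("cat.db.mt", "http://localhost:8083").validate()` passes, while an endpoint without a scheme or port is rejected.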

2.2.2 API Interaction Flow

We will use the Flink SQL Gateway REST API to execute the refresh task.

  1. Create Session
    https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/rest/#sessions

  2. Submit Refresh Job
    https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/rest/#sessions-session_handle-materialized-tables-identifier-refresh

  3. Monitor Refresh Job Status
    https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/rest/#sessions-session_handle-statements-1
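The three calls above could be assembled roughly as follows. This sketch only builds the method, URL, and JSON body for each request and sends nothing. The `v3` version prefix, the operation-status path, and the body keys (`properties`, `isPeriodic`, `scheduleTime`, `dynamicOptions`, `executionConfig`) are assumptions that should be verified against the linked REST documentation; the function names are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple
from urllib.parse import quote

HttpRequest = Tuple[str, str, Optional[bytes]]  # (method, url, body)


def _base(endpoint: str, api_version: str) -> str:
    return f"{endpoint.rstrip('/')}/{api_version}"


def create_session_request(
    endpoint: str, init_config: Dict[str, str], api_version: str = "v3"
) -> HttpRequest:
    # Step 1: open a session, passing initConfig as session properties (assumed key).
    body = json.dumps({"properties": init_config}).encode()
    return "POST", f"{_base(endpoint, api_version)}/sessions", body


def refresh_request(
    endpoint: str,
    session_handle: str,
    identifier: str,
    schedule_time: Optional[str] = None,
    dynamic_options: Optional[Dict[str, str]] = None,
    execution_config: Optional[Dict[str, str]] = None,
    api_version: str = "v3",
) -> HttpRequest:
    # Step 2: submit the refresh for one materialized table.
    payload = {
        "isPeriodic": True,  # assumed: scheduler-triggered refreshes are periodic
        "dynamicOptions": dynamic_options or {},
        "executionConfig": execution_config or {},
    }
    if schedule_time is not None:
        payload["scheduleTime"] = schedule_time  # assumed key name
    url = (
        f"{_base(endpoint, api_version)}/sessions/{session_handle}"
        f"/materialized-tables/{quote(identifier, safe='')}/refresh"
    )
    return "POST", url, json.dumps(payload).encode()


def status_request(
    endpoint: str, session_handle: str, operation_handle: str, api_version: str = "v3"
) -> HttpRequest:
    # Step 3: poll the operation status (path is an assumption).
    url = (
        f"{_base(endpoint, api_version)}/sessions/{session_handle}"
        f"/operations/{operation_handle}/status"
    )
    return "GET", url, None
```

Keeping request construction separate from transport makes the URL and body shapes easy to unit test and easy to adjust if the gateway's API version changes.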

2.2.3 User Interface Design

The new task type will provide the following UI elements in the DolphinScheduler interface:

  • identifier: The table identifier to be refreshed.
  • gatewayEndpoint: The Flink SQL gateway for executing the refresh job.
  • dynamicOptions: Dynamic options for constructing the refresh job.
  • initConfig: The initialization config for creating the session.
  • executionConfig: The execution configuration when executing the refresh job.
  • schedulerTime: The scheduler time for executing the refresh job.

Compatibility, Deprecation, and Migration Plan

As this is a new task type, there are no compatibility issues to address.

Test Plan

  • We'll write unit tests to verify that the functionality works as expected. To test the integration workflow, we'll simulate an HTTP server, which avoids introducing Flink-related dependencies.
  • An end-to-end (e2e) test will be added to validate the entire workflow between Flink and DolphinScheduler. We'll develop a test Flink plugin to facilitate integration with the Flink SQL gateway and execute the e2e tests.
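The simulated HTTP server mentioned in the first bullet could look roughly like the stub below. It is a sketch written with Python's standard library for brevity, not the test code of the plugin itself; the paths it matches and the response keys (`sessionHandle`, `operationHandle`, `status`) are assumptions about the gateway's responses, and `StubGatewayHandler`, `start_stub_gateway`, and `call` are hypothetical names.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubGatewayHandler(BaseHTTPRequestHandler):
    """Answers the three gateway calls with canned JSON (assumed shapes)."""

    def _reply(self, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        # Drain the request body before replying.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.path.endswith("/sessions"):
            self._reply({"sessionHandle": "session-1"})
        elif self.path.endswith("/refresh"):
            self._reply({"operationHandle": "op-1"})
        else:
            self.send_error(404)

    def do_GET(self) -> None:
        if self.path.endswith("/status"):
            self._reply({"status": "FINISHED"})
        else:
            self.send_error(404)

    def log_message(self, *args) -> None:
        pass  # keep test output quiet


def start_stub_gateway() -> HTTPServer:
    """Start the stub on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), StubGatewayHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(port: int, method: str, path: str, payload=None) -> dict:
    """Send one JSON request to the stub and decode the JSON response."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        body = json.dumps(payload) if payload is not None else None
        conn.request(method, path, body=body,
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

A unit test would start the stub, point the task's `gatewayEndpoint` at `http://127.0.0.1:<port>`, run the task, and then call `server.shutdown()`.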
