Description
Search before asking
- I had searched in the DSIP and found no similar DSIP.
Motivation
In Flink 1.20, Flink introduced Materialized Tables , aimed at simplifying both batch and stream data pipelines while providing a consistent development experience.
In Full Mode, Flink’s Materialized Table relies on a scheduler to periodically trigger Flink’s refresh jobs. To support integration with different schedulers, the Flink community has designed a standardized scheduler interface.
To enable Flink SQL Materialized Table refresh tasks in DolphinScheduler, we propose introducing a new task type specifically for executing refresh jobs.
The following is a sequence diagram illustrating the interaction between the Flink framework and the scheduler.
Design Detail
2.1 Overview
To integrate between DolphinScheduler and Flink Materialized Tables, we need to introduce a new task type for executing Flink refresh operations. The core logic of this task includes:
- Creating a SQL session
- Submitting refresh tasks
- Waiting for the task to finish
2.2 Implementation Details
2.2.1 Task Parameters Configuration
Required information for task creation:
Parameter | Description | Required | Default Value |
---|---|---|---|
identifier | Table identifier to be refreshed | Yes | |
gatewayEndpoint | The Flink SQL gateway for executing the refresh job. HTTP endpoint of the Gateway, format http(s)://host:port | Yes | |
dynamicOptions | Dynamic options for refresh the table | No | |
executionConfig | A set of configurations for executing the refresh task. | No | |
initConfig | A set of configurations for initializing the session | No |
2.2.2 API Interaction Flow
We will use Flink SQL Gateway restful api to execute the refresh task.
-
Create Session:
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/rest/#sessions -
Submit Refresh Job
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/rest/#sessions-session_handle-materialized-tables-identifier-refresh -
Monitor Refresh Job Status
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/rest/#sessions-session_handle-statements-1
2.2.3 User Interface Design
The new task type will provide the following UI elements in the DolphinScheduler interface:
- identifier : The table identifier to be refreshed.
- gatewayEndpoint: The Flink SQL gateway for executing the refresh job.
- dynamicOptions: Dynamic options for constructing the refresh job.
- initConfig: The initialization config for creating the session.
- executionConfig: The execution configuration when executing the refresh job.
- schedulerTime: The scheduler time for executing the refresh job.
Compatibility, Deprecation, and Migration Plan
As this is a new task type, there are no compatibility issues to address.
Test Plan
- We'll conduct unit tests to ensure that the functionality operates as anticipated. To test the integration workflow, we'll simulate an HTTP server. This approach helps us steer clear of introducing Flink - related dependencies.
- An end - to - end (e2e) test will be added to validate the entire workflow between Flink and DolphinScheduler. We'll develop a test Flink plugin to facilitate integration with the Flink SQL gateway and execute the e2e tests.
Code of Conduct
- I agree to follow this project's Code of Conduct