
[Feature][seatunnel-server] Architectural Design #1968

Open
@dijiekstra


Search before asking

  • I had searched in the feature and found no similar feature requirement.

Description

Architectural Design

Role division

Working top-down from the functions of the Web pages, we infer which modules and functions a complete Seatunnel-Server should have.

Web

The main purpose of the Web is to provide visual task editing and development, as well as task operation and maintenance (O&M), so the Web's main menus are:

  • development
  • maintenance

Building on this, to make development and O&M more convenient during task editing:

  1. If a data source can be configured once and reused many times, the task editing process becomes much simpler.

  2. When our Scheduler (described below) runs in embedded mode, the stability of the Server (described below) becomes a major concern.

  3. In addition, without permission control over development and maintenance, everything becomes very risky.

It follows from the above that we also need:

  • governance (provides data source and permission control)

  • monitor (monitors the health of the service itself and of the external services it depends on)
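To make the governance idea concrete, here is a minimal sketch of a role-based permission check. The role names, the `Action` values and the `ROLE_PERMISSIONS` table are all hypothetical; this design does not yet specify how permissions are modeled.

```python
from enum import Enum
from typing import Iterable


class Action(Enum):
    """Hypothetical operations that the governance module could guard."""
    EDIT_JOB = "edit_job"
    RUN_JOB = "run_job"
    VIEW_INSTANCES = "view_instances"
    MANAGE_DATASOURCE = "manage_datasource"


# Hypothetical role -> allowed actions table; a real Server would load this from storage.
ROLE_PERMISSIONS = {
    "developer": {Action.EDIT_JOB, Action.RUN_JOB, Action.VIEW_INSTANCES},
    "operator": {Action.RUN_JOB, Action.VIEW_INSTANCES},
    "admin": set(Action),  # every action
}


def is_allowed(roles: Iterable[str], action: Action) -> bool:
    """Return True if at least one of the user's roles grants the action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


# Usage: an operator may run jobs but not edit them.
print(is_allowed(["operator"], Action.RUN_JOB))   # True
print(is_allowed(["operator"], Action.EDIT_JOB))  # False
```

Unknown roles map to an empty permission set, so the check fails closed.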

Of course, this will not be the final design. As Seatunnel develops, I expect users to place higher demands on Seatunnel Web and more and more requirements to emerge; we will extend the functionality as those requirements arise. For now, these four menus are sufficient.

Server

As the bridge between the Web and the Scheduler, the Server is responsible for translating and forwarding Web requests.

When developing tasks on the Web, users typically work in one of three modes:

  1. navigation (wizard) mode

  2. script mode

  3. canvas mode

Setting canvas mode aside (it is a much later feature), let's start with navigation mode and script mode.

In essence, navigation mode provides a more convenient and simpler way to develop a Seatunnel task. A task is therefore abstracted into several modules, such as 'source', 'destination' and 'mapping'. Assembling these modules produces a unique JSON or DSL script, which is parsed and finally handed to the Scheduler for execution.

Why does a DSL exist? Why can't the Seatunnel script be assembled on the Web side?

  • The reason for using a DSL is to provide a common JSON template, so that front-end developers building Web pages do not have to do repetitive, siloed ("stovepipe") development for each data source.

  • If the front end assembled Seatunnel's own execution scripts, the bar for front-end developers would be high. Besides, the front end does not need to care what the final execution script looks like; it only needs to build the page parameters into JSON and hand it to the Server for processing.
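A minimal sketch of the Server-side translation step, assuming a hypothetical navigation-mode JSON with `env`, `source`, `mapping` and `destination` fields. The output is loosely modeled on Seatunnel's `env`/`source`/`transform`/`sink` config layout; the field names, plugin names and option keys are illustrative, not the project's actual schema.

```python
import json


def _render_block(name: str, options: dict, indent: int) -> list:
    """Render one plugin block such as `FakeSource { row.num = 10 }`."""
    pad = " " * indent
    lines = [f"{pad}{name} {{"]
    for key, value in options.items():
        # json.dumps yields quoted strings and bare numbers/booleans.
        lines.append(f"{pad}  {key} = {json.dumps(value)}")
    lines.append(f"{pad}}}")
    return lines


def translate(job_json: str) -> str:
    """Translate a hypothetical page JSON into a config-style script."""
    job = json.loads(job_json)
    out = ["env {"]
    for key, value in job.get("env", {}).items():
        out.append(f"  {key} = {json.dumps(value)}")
    out.append("}")
    # The page's 'source', 'mapping' and 'destination' modules map onto
    # the script's source, transform and sink sections.
    for section, module in (("source", "source"), ("transform", "mapping"), ("sink", "destination")):
        out.append(f"{section} {{")
        for plugin in job.get(module, []):
            out.extend(_render_block(plugin["plugin"], plugin.get("options", {}), 2))
        out.append("}")
    return "\n".join(out)


page_json = json.dumps({
    "env": {"parallelism": 1},
    "source": [{"plugin": "FakeSource", "options": {"row.num": 10}}],
    "destination": [{"plugin": "Console"}],
})
print(translate(page_json))
```

The front end only produces the generic JSON; adding a new data source means teaching `translate` a new plugin, not building a new page.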

Having said translation, let's talk about forwarding.

What kind of requests are forwarded?

Requests for task runtime information, status, and results are forwarded to the Scheduler. The Server does not maintain or store this information, because it only stores information about a task from before it is submitted. More precisely, a task is called a 'job' while it is being developed; once it is published to the Scheduler, each execution instance on the Scheduler side is called a 'task-instance'. The relationship resembles that between a mold and the products made from it.
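The job / task-instance relationship can be sketched with two record types: the Server keeps the `Job` (the mold), while every run creates a new `TaskInstance` (a product) whose state lives only on the Scheduler side. `SchedulerStub` and all field names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Job:
    """Design-time definition saved by the Server: the 'mold'."""
    job_id: int
    name: str
    script: str


@dataclass
class TaskInstance:
    """One execution of a published job on the Scheduler side: a 'product'."""
    instance_id: int
    job_id: int
    status: str = "SUBMITTED"


@dataclass
class SchedulerStub:
    """Hypothetical Scheduler that owns all runtime state (status, results)."""
    instances: Dict[int, TaskInstance] = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=lambda: itertools.count(1))

    def run(self, job: Job) -> TaskInstance:
        # Each run stamps out a new instance from the same job definition.
        inst = TaskInstance(next(self._ids), job.job_id)
        self.instances[inst.instance_id] = inst
        return inst

    def instances_of(self, job_id: int) -> List[TaskInstance]:
        return [i for i in self.instances.values() if i.job_id == job_id]
```

Because instances are only ever created and tracked by the Scheduler, the Server has nothing to keep in sync and simply forwards runtime queries.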

The remaining functions are mainly simple CRUD operations, such as data source management and the collection and reporting of monitoring information.
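As an illustration of data source management CRUD (and of configuring a data source once and reusing it in many jobs), here is a hypothetical in-memory registry. A real Server would persist these records and protect credentials; `DataSource`, `DataSourceRegistry` and `expand` are names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DataSource:
    """Connection settings configured once and referenced by name in jobs."""
    name: str
    plugin: str
    options: Dict[str, Any] = field(default_factory=dict)


class DataSourceRegistry:
    """Hypothetical in-memory CRUD store for data sources."""

    def __init__(self) -> None:
        self._items: Dict[str, DataSource] = {}

    def create(self, ds: DataSource) -> None:
        if ds.name in self._items:
            raise ValueError(f"data source {ds.name!r} already exists")
        self._items[ds.name] = ds

    def get(self, name: str) -> DataSource:
        return self._items[name]  # raises KeyError if missing

    def update(self, name: str, **options: Any) -> None:
        self._items[name].options.update(options)

    def delete(self, name: str) -> None:
        del self._items[name]

    def expand(self, name: str, **job_options: Any) -> Dict[str, Any]:
        """Merge the stored connection options with job-specific options."""
        ds = self.get(name)
        return {"plugin": ds.plugin, "options": {**ds.options, **job_options}}
```

A job then only references `orders_db` plus its own query, and the Server expands it while translating.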

Scheduler

The Scheduler has three main parts:

  1. Unified scheduling layer abstraction

Similar to a scheduler-proxy, this layer defines the interfaces for all scheduling and execution operations and all of their related functions.

  2. Built-in Scheduler engine

Provides simple task execution and scheduling. Some capabilities, such as dependency triggers and workflow models, are not supported.

  3. Third-party Scheduler engine

Wraps the third-party Scheduler engine's API, integrates its SDK, and implements (overrides) the abstraction-layer interfaces to complete the integration with other Scheduler engines.
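One way to picture the three parts is an abstract interface with two implementations: a built-in engine and an adapter around a third-party scheduler's client. Every class and method name below is hypothetical (the real interfaces are deferred to [Detail Design]), and `FakeExternalClient` merely stands in for a third-party SDK.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SchedulerEngine(ABC):
    """Unified scheduling abstraction that every engine implements."""

    @abstractmethod
    def save_job(self, job_id: str, script: str) -> None:
        """Store or update the executable script for a job."""

    @abstractmethod
    def run_once(self, job_id: str) -> str:
        """Trigger a temporary execution and return a task-instance id."""

    @abstractmethod
    def status(self, instance_id: str) -> str:
        """Return the state of a task instance."""


class BuiltInEngine(SchedulerEngine):
    """Embedded engine: simple execution only, no dependency triggers or workflows."""

    def __init__(self) -> None:
        self._scripts: Dict[str, str] = {}
        self._status: Dict[str, str] = {}

    def save_job(self, job_id: str, script: str) -> None:
        self._scripts[job_id] = script

    def run_once(self, job_id: str) -> str:
        if job_id not in self._scripts:
            raise KeyError(f"unknown job {job_id!r}")
        instance_id = f"{job_id}-{len(self._status) + 1}"
        # Placeholder: a real engine would launch the script asynchronously.
        self._status[instance_id] = "SUCCESS"
        return instance_id

    def status(self, instance_id: str) -> str:
        return self._status[instance_id]


class FakeExternalClient:
    """Stand-in for a third-party scheduler SDK (hypothetical method names)."""

    def __init__(self) -> None:
        self.definitions: Dict[str, str] = {}
        self.runs: Dict[str, str] = {}

    def create_definition(self, name: str, content: str) -> None:
        self.definitions[name] = content

    def start(self, name: str) -> int:
        run_id = len(self.runs) + 1
        self.runs[str(run_id)] = "running"
        return run_id

    def query_state(self, run_id: str) -> str:
        return self.runs[run_id]


class ThirdPartyEngine(SchedulerEngine):
    """Adapter: translates abstract calls into the external client's API."""

    def __init__(self, client: FakeExternalClient) -> None:
        self._client = client

    def save_job(self, job_id: str, script: str) -> None:
        self._client.create_definition(job_id, script)

    def run_once(self, job_id: str) -> str:
        return str(self._client.start(job_id))

    def status(self, instance_id: str) -> str:
        # Normalize the external state names to the abstraction's convention.
        return self._client.query_state(instance_id).upper()
```

Because callers depend only on `SchedulerEngine`, swapping the built-in engine for a third-party one would, under this sketch, not require changes on the calling side.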

The detailed interface definitions and implementation design will be presented in [Detail Design].

Interaction process

Flow Chart

  • Task save: [image]
  • Temporary execution: [image]
  • Task maintenance: [image]

Brief description

  • Task save
  1. After the user finishes configuring the task in the Web UI, the Web sends the task JSON to the Server.
  2. On a save request, the Server first persists the JSON, then translates it, together with its configuration information, into the script Seatunnel needs for execution, and synchronizes that script to the Scheduler.
  3. Based on the configuration, the Scheduler hands the script to the actual engine, which saves it.
  • Temporary execution
  1. The user triggers a temporary execution of a task on the Web, which sends the execution information to the Server.
  2. On receiving the request, the Server parses the task JSON and forwards it to the Scheduler.
  3. On receiving a temporary execution request, the Scheduler submits the final execution script to the corresponding Scheduler engine.
  4. Once these steps succeed, the Web shows that the execution was launched successfully and begins repeatedly polling the Server for logs and execution results.
  5. The Server forwards each query to the Scheduler, which forwards it to the Scheduler engine and returns the result.
  6. After several rounds of polling, the front end displays the results.
  • Task maintenance
  1. The user opens the maintenance center and chooses to display the execution history.
  2. The Web requests the Server, passing filter conditions and related information.
  3. The Server forwards the request directly to the Scheduler.
  4. The Scheduler queries the engine and returns the result.
  5. The Server returns the result to the Web page.
  6. The Web displays the execution history.
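The three flows above can be condensed into a small simulation. `InMemoryScheduler` folds the Scheduler and its engine into one hypothetical stand-in that reports RUNNING for a fixed number of polls, while `Server` only persists the job JSON, translates it through an injected function, and forwards everything else. All names are invented for illustration.

```python
import json
import time
from typing import Callable, Dict, List


class InMemoryScheduler:
    """Hypothetical Scheduler + engine stand-in: stores scripts, runs them, reports state."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self.scripts: Dict[str, str] = {}
        self.history: List[dict] = []
        self._remaining: Dict[str, int] = {}
        self._polls = polls_until_done

    def sync(self, job_id: str, script: str) -> None:
        self.scripts[job_id] = script  # task save: engine stores the script

    def submit(self, job_id: str, script: str) -> str:
        instance_id = f"{job_id}#{len(self.history) + 1}"
        self.history.append({"instance": instance_id, "job": job_id})
        self._remaining[instance_id] = self._polls
        return instance_id

    def query(self, instance_id: str) -> dict:
        # Report RUNNING for a few polls, then SUCCESS with a placeholder log line.
        self._remaining[instance_id] -= 1
        if self._remaining[instance_id] > 0:
            return {"state": "RUNNING", "log": ""}
        return {"state": "SUCCESS", "log": "finished"}

    def list_instances(self, job_id: str) -> List[dict]:
        return [h for h in self.history if h["job"] == job_id]


class Server:
    """Bridge between Web and Scheduler: saves/translates jobs, forwards queries."""

    def __init__(self, scheduler: InMemoryScheduler, translate: Callable[[dict], str]) -> None:
        self.jobs: Dict[str, str] = {}  # job_id -> raw JSON from the Web
        self.scheduler = scheduler
        self.translate = translate

    def save(self, job_id: str, job_json: str) -> None:
        # Task save: persist the JSON, translate it, sync the script to the Scheduler.
        self.jobs[job_id] = job_json
        self.scheduler.sync(job_id, self.translate(json.loads(job_json)))

    def run_temporary(self, job_id: str, job_json: str) -> str:
        # Temporary execution: parse the JSON and forward it to the Scheduler.
        return self.scheduler.submit(job_id, self.translate(json.loads(job_json)))

    def query(self, instance_id: str) -> dict:
        return self.scheduler.query(instance_id)  # pure forwarding

    def list_executions(self, job_id: str) -> List[dict]:
        return self.scheduler.list_instances(job_id)  # task maintenance


def poll_until_done(server: Server, instance_id: str, interval: float = 0.0, max_polls: int = 10) -> dict:
    """What the Web does after a temporary run: poll the Server until a final state."""
    for _ in range(max_polls):
        result = server.query(instance_id)
        if result["state"] != "RUNNING":
            return result
        time.sleep(interval)
    raise TimeoutError(instance_id)
```

Note that the Server never records run states itself; every runtime answer comes back through the Scheduler, matching the forwarding rule described earlier.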

Other items

  • This design covers only development and O&M. Features described in the overview, such as 'schema evolution' and 'data time', are not designed or implemented here.


Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct
