Commit 10c37ac

happyboy1024 authored
[Feature][Doris] Support multi-table source read (#7895)
Co-authored-by: happyboy1024 <[email protected]>
1 parent 25ae492 commit 10c37ac

35 files changed

+1947
-591
lines changed


docs/en/connector-v2/source/Doris.md

Lines changed: 65 additions & 15 deletions
@@ -13,27 +13,21 @@
1313
- [x] [batch](../../concept/connector-v2-features.md)
1414
- [ ] [stream](../../concept/connector-v2-features.md)
1515
- [ ] [exactly-once](../../concept/connector-v2-features.md)
16-
- [x] [schema projection](../../concept/connector-v2-features.md)
16+
- [x] [column projection](../../concept/connector-v2-features.md)
1717
- [x] [parallelism](../../concept/connector-v2-features.md)
1818
- [x] [support user-defined split](../../concept/connector-v2-features.md)
19+
- [x] [support multiple table read](../../concept/connector-v2-features.md)
1920

2021
## Description
2122

22-
Used to read data from Doris.
23-
Doris Source will send a SQL to FE, FE will parse it into an execution plan, send it to BE, and BE will
24-
directly return the data
23+
Used to read data from Apache Doris.
2524

2625
## Supported DataSource Info
2726

2827
| Datasource | Supported versions | Driver | Url | Maven |
2928
|------------|--------------------------------------|--------|-----|-------|
3029
| Doris | Only Doris2.0 or later is supported. | - | - | - |
3130

32-
## Database Dependency
33-
34-
> Please download the support list corresponding to 'Maven' and copy it to the '$SEATNUNNEL_HOME/plugins/jdbc/lib/'
35-
> working directory<br/>
36-
3731
## Data Type Mapping
3832

3933
| Doris Data type | SeaTunnel Data type |
@@ -54,29 +48,40 @@ directly return the data
5448

5549
## Source Options
5650

51+
Base configuration:
52+
5753
| Name | Type | Required | Default | Description |
5854
|----------------------------------|--------|----------|------------|-----------------------------------------------------------------------------------------------------|
5955
| fenodes | string | yes | - | FE address, the format is `"fe_host:fe_http_port"` |
6056
| username | string | yes | - | User username |
6157
| password | string | yes | - | User password |
58+
| doris.request.retries | int | no | 3 | Number of retries to send requests to Doris FE. |
59+
| doris.request.read.timeout.ms | int | no | 30000 | |
60+
| doris.request.connect.timeout.ms | int | no | 30000 | |
61+
| query-port | string | no | 9030 | Doris QueryPort |
62+
| doris.request.query.timeout.s | int | no | 3600 | Timeout period of Doris scan data, expressed in seconds. |
63+
| table_list | string | no | - | List of tables to read. Each entry accepts the options under "Table list configuration". |
64+
65+
Table list configuration:
66+
67+
| Name | Type | Required | Default | Description |
68+
|----------------------------------|--------|----------|------------|-----------------------------------------------------------------------------------------------------|
6269
| database | string | yes | - | The name of Doris database |
6370
| table | string | yes | - | The name of Doris table |
6471
| doris.read.field | string | no | - | Use the 'doris.read.field' parameter to select the doris table columns to read |
65-
| query-port | string | no | 9030 | Doris QueryPort |
6672
| doris.filter.query | string | no | - | Filter condition applied in Doris, in the format "field = value". Example: doris.filter.query = "F_ID > 2" |
6773
| doris.batch.size | int | no | 1024 | The maximum number of rows read from Doris BE in a single request. |
68-
| doris.request.query.timeout.s | int | no | 3600 | Timeout period of Doris scan data, expressed in seconds. |
6974
| doris.exec.mem.limit | long | no | 2147483648 | Maximum memory that can be used by a single BE scan request. The default is 2G (2147483648). |
70-
| doris.request.retries | int | no | 3 | Number of retries to send requests to Doris FE. |
71-
| doris.request.read.timeout.ms | int | no | 30000 | |
72-
| doris.request.connect.timeout.ms | int | no | 30000 | |
75+
76+
Note: When only a single table is read, the options listed under table_list can be set directly at the outer level of the Doris source instead of being nested in table_list.
7377

7478
### Tips
7579

7680
> It is not recommended to modify advanced parameters at will
7781
78-
## Task Example
82+
## Example
7983

84+
### Single table
8085
> This is an example of reading a Doris table and writing to Console.
8186
8287
```
@@ -159,4 +164,49 @@ sink {
159164
Console {}
160165
}
161166
```
167+
### Multiple tables
168+
```
169+
env{
170+
parallelism = 1
171+
job.mode = "BATCH"
172+
}
162173
174+
source{
175+
Doris {
176+
fenodes = "xxxx:8030"
177+
username = root
178+
password = ""
179+
table_list = [
180+
{
181+
database = "st_source_0"
182+
table = "doris_table_0"
183+
doris.read.field = "F_ID,F_INT,F_BIGINT,F_TINYINT"
184+
doris.filter.query = "F_ID >= 50"
185+
},
186+
{
187+
database = "st_source_1"
188+
table = "doris_table_1"
189+
}
190+
]
191+
}
192+
}
193+
194+
transform {}
195+
196+
sink{
197+
Doris {
198+
fenodes = "xxxx:8030"
199+
schema_save_mode = "RECREATE_SCHEMA"
200+
username = root
201+
password = ""
202+
database = "st_sink"
203+
table = "${table_name}"
204+
sink.enable-2pc = "true"
205+
sink.label-prefix = "test_json"
206+
doris.config = {
207+
format="json"
208+
read_json_by_line="true"
209+
}
210+
}
211+
}
212+
```
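
The multi-table example above sets `table = "${table_name}"` in the sink. The excerpt does not show how that placeholder is resolved; assuming it is replaced with the name of each upstream table, the effect can be sketched as follows. `TableNamePlaceholderSketch` and `resolve` are hypothetical names used only for illustration, not connector APIs.

```java
// Hypothetical illustration; not part of the SeaTunnel code base.
final class TableNamePlaceholderSketch {

    private static final String PLACEHOLDER = "${table_name}";

    /** Replaces every occurrence of the placeholder with the upstream table name. */
    static String resolve(String template, String upstreamTableName) {
        return template.replace(PLACEHOLDER, upstreamTableName);
    }

    public static void main(String[] args) {
        // The two source tables from the multi-table example.
        String[] sourceTables = {"doris_table_0", "doris_table_1"};
        for (String name : sourceTables) {
            System.out.println("st_sink." + resolve("${table_name}", name));
        }
    }
}
```

Under that assumption, each source table would be written to a sink table of the same name inside the `st_sink` database.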
Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
1+
# Doris
2+
3+
> Doris source connector
4+
5+
## Supported Engines
6+
7+
> Spark<br/>
8+
> Flink<br/>
9+
> SeaTunnel Zeta<br/>
10+
11+
## Key Features
12+
13+
- [x] [batch](../../concept/connector-v2-features.md)
14+
- [ ] [stream](../../concept/connector-v2-features.md)
15+
- [ ] [exactly-once](../../concept/connector-v2-features.md)
16+
- [x] [column projection](../../concept/connector-v2-features.md)
17+
- [x] [parallelism](../../concept/connector-v2-features.md)
18+
- [x] [support user-defined split](../../concept/connector-v2-features.md)
19+
- [x] [support multiple table read](../../concept/connector-v2-features.md)
20+
21+
## Description
22+
23+
Source connector for Apache Doris.
24+
25+
## Supported DataSource Info
26+
27+
| Datasource | Supported versions | Driver | Url | Maven |
28+
|------------|--------------------------------------|--------|-----|-------|
29+
| Doris | Only Doris2.0 or later is supported. | - | - | - |
30+
31+
## Data Type Mapping
32+
33+
| Doris Data type | SeaTunnel Data type |
34+
|--------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
35+
| INT | INT |
36+
| TINYINT | TINYINT |
37+
| SMALLINT | SMALLINT |
38+
| BIGINT | BIGINT |
39+
| LARGEINT | STRING |
40+
| BOOLEAN | BOOLEAN |
41+
| DECIMAL | DECIMAL((Get the designated column's specified column size)+1,<br/>(Gets the designated column's number of digits to right of the decimal point.))) |
42+
| FLOAT | FLOAT |
43+
| DOUBLE | DOUBLE |
44+
| CHAR<br/>VARCHAR<br/>STRING<br/>TEXT | STRING |
45+
| DATE | DATE |
46+
| DATETIME<br/>DATETIME(p) | TIMESTAMP |
47+
| ARRAY | ARRAY |
48+
49+
## Source Options
50+
51+
Base configuration:
52+
53+
| Name | Type | Required | Default | Description |
54+
|----------------------------------|--------|----------|------------|-----------------------------------------------------------------------------------------------------|
55+
| fenodes | string | yes | - | FE address, in the format `"fe_host:fe_http_port"` |
56+
| username | string | yes | - | User name |
57+
| password | string | yes | - | Password |
58+
| doris.request.retries | int | no | 3 | Number of retries for requests to Doris FE |
59+
| doris.request.read.timeout.ms | int | no | 30000 | |
60+
| doris.request.connect.timeout.ms | int | no | 30000 | |
61+
| query-port | string | no | 9030 | Doris query port |
62+
| doris.request.query.timeout.s | int | no | 3600 | Timeout for Doris data scans, in seconds |
63+
| table_list | string | no | - | Table list |
64+
65+
Table list configuration:
66+
67+
| Name | Type | Required | Default | Description |
68+
|----------------------------------|--------|----------|------------|-----------------------------------------------------------------------------------------------------|
69+
| database | string | yes | - | Database name |
70+
| table | string | yes | - | Table name |
71+
| doris.read.field | string | no | - | Doris table columns to read |
72+
| doris.filter.query | string | no | - | Data filter, in the format "field = value". Example: doris.filter.query = "F_ID > 2" |
73+
| doris.batch.size | int | no | 1024 | Maximum number of rows read from BE per request |
74+
| doris.exec.mem.limit | long | no | 2147483648 | Maximum memory a single BE scan request can use. The default is 2G (2147483648) |
75+
76+
Note: When this configuration corresponds to a single table, you can flatten the configuration items in table_list to the outer layer.
77+
78+
### Tips
79+
80+
> It is not recommended to modify advanced parameters at will
81+
82+
## Example
83+
84+
### Single table
85+
> This is an example of reading data from Doris and writing it to the console:
86+
87+
```
88+
env {
89+
parallelism = 2
90+
job.mode = "BATCH"
91+
}
92+
source{
93+
Doris {
94+
fenodes = "doris_e2e:8030"
95+
username = root
96+
password = ""
97+
database = "e2e_source"
98+
table = "doris_e2e_table"
99+
}
100+
}
101+
102+
transform {
103+
# If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
104+
# please go to https://seatunnel.apache.org/docs/transform/sql
105+
}
106+
107+
sink {
108+
Console {}
109+
}
110+
```
111+
112+
Use the `doris.read.field` parameter to select the Doris table columns to read:
113+
114+
```
115+
env {
116+
parallelism = 2
117+
job.mode = "BATCH"
118+
}
119+
source{
120+
Doris {
121+
fenodes = "doris_e2e:8030"
122+
username = root
123+
password = ""
124+
database = "e2e_source"
125+
table = "doris_e2e_table"
126+
doris.read.field = "F_ID,F_INT,F_BIGINT,F_TINYINT,F_SMALLINT"
127+
}
128+
}
129+
130+
transform {
131+
# If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
132+
# please go to https://seatunnel.apache.org/docs/transform/sql
133+
}
134+
135+
sink {
136+
Console {}
137+
}
138+
```
139+
140+
Use `doris.filter.query` to filter data; the parameter value is passed directly to Doris as the filter condition:
141+
142+
```
143+
env {
144+
parallelism = 2
145+
job.mode = "BATCH"
146+
}
147+
source{
148+
Doris {
149+
fenodes = "doris_e2e:8030"
150+
username = root
151+
password = ""
152+
database = "e2e_source"
153+
table = "doris_e2e_table"
154+
doris.filter.query = "F_ID > 2"
155+
}
156+
}
157+
158+
transform {
159+
# If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
160+
# please go to https://seatunnel.apache.org/docs/transform/sql
161+
}
162+
163+
sink {
164+
Console {}
165+
}
166+
```
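
The two examples above describe `doris.read.field` as a column selection and `doris.filter.query` as a condition passed directly to Doris. The connector's actual query construction is not part of this excerpt; as a minimal sketch, assuming the two options map onto the column list and the `WHERE` clause of a single SQL statement, the relationship could look like this. `ScanQuerySketch` and `buildQuery` are hypothetical names.

```java
// Hypothetical illustration of how the two options could combine; not connector code.
final class ScanQuerySketch {

    /** Builds a SELECT statement; a missing readField means all columns, a missing filter means no WHERE clause. */
    static String buildQuery(String database, String table, String readField, String filterQuery) {
        String columns = (readField == null || readField.trim().isEmpty()) ? "*" : readField.trim();
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(columns)
                .append(" FROM `").append(database).append("`.`").append(table).append("`");
        if (filterQuery != null && !filterQuery.trim().isEmpty()) {
            // The filter text is appended verbatim, mirroring "passed directly to Doris".
            sql.append(" WHERE ").append(filterQuery.trim());
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("e2e_source", "doris_e2e_table", "F_ID,F_INT", "F_ID > 2"));
    }
}
```

Because the filter text is appended verbatim in this sketch, it has to be a valid Doris predicate on its own, such as `F_ID > 2`.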
167+
### Multiple tables
168+
```
169+
env{
170+
parallelism = 1
171+
job.mode = "BATCH"
172+
}
173+
174+
source{
175+
Doris {
176+
fenodes = "xxxx:8030"
177+
username = root
178+
password = ""
179+
table_list = [
180+
{
181+
database = "st_source_0"
182+
table = "doris_table_0"
183+
doris.read.field = "F_ID,F_INT,F_BIGINT,F_TINYINT"
184+
doris.filter.query = "F_ID >= 50"
185+
},
186+
{
187+
database = "st_source_1"
188+
table = "doris_table_1"
189+
}
190+
]
191+
}
192+
}
193+
194+
transform {}
195+
196+
sink{
197+
Doris {
198+
fenodes = "xxxx:8030"
199+
schema_save_mode = "RECREATE_SCHEMA"
200+
username = root
201+
password = ""
202+
database = "st_sink"
203+
table = "${table_name}"
204+
sink.enable-2pc = "true"
205+
sink.label-prefix = "test_json"
206+
doris.config = {
207+
format="json"
208+
read_json_by_line="true"
209+
}
210+
}
211+
}
212+
```
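
The source options note says that a single-table configuration may place the per-table options at the outer level instead of inside `table_list`. A rough sketch of that equivalence, assuming the source options are available as a plain `Map`, is shown below. `TableListSketch`, `normalize`, and the key list are hypothetical and only mirror the option names from the tables above; the connector's real parsing code is not shown in this excerpt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the SeaTunnel code base.
final class TableListSketch {

    // Option names documented under "Table list configuration".
    private static final List<String> TABLE_KEYS = List.of(
            "database", "table", "doris.read.field",
            "doris.filter.query", "doris.batch.size", "doris.exec.mem.limit");

    /** Returns one option map per table, whether or not table_list is present. */
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> normalize(Map<String, Object> source) {
        Object tableList = source.get("table_list");
        if (tableList instanceof List) {
            return new ArrayList<>((List<Map<String, Object>>) tableList);
        }
        // Flattened form: collect the per-table keys found at the outer level.
        Map<String, Object> single = new LinkedHashMap<>();
        for (String key : TABLE_KEYS) {
            if (source.containsKey(key)) {
                single.put(key, source.get(key));
            }
        }
        List<Map<String, Object>> result = new ArrayList<>();
        result.add(single);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("fenodes", "doris_e2e:8030");
        flat.put("database", "e2e_source");
        flat.put("table", "doris_e2e_table");
        System.out.println(normalize(flat));
    }
}
```

With this shape, the single-table examples and the multi-table example can be handled by the same downstream code, since both end up as a list of per-table option maps.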

seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/backend/BackendClient.java

Lines changed: 2 additions & 2 deletions
@@ -25,7 +25,7 @@
2525
import org.apache.seatunnel.shade.org.apache.thrift.transport.TTransport;
2626
import org.apache.seatunnel.shade.org.apache.thrift.transport.TTransportException;
2727

28-
import org.apache.seatunnel.connectors.doris.config.DorisConfig;
28+
import org.apache.seatunnel.connectors.doris.config.DorisSourceConfig;
2929
import org.apache.seatunnel.connectors.doris.exception.DorisConnectorErrorCode;
3030
import org.apache.seatunnel.connectors.doris.exception.DorisConnectorException;
3131
import org.apache.seatunnel.connectors.doris.source.serialization.Routing;
@@ -55,7 +55,7 @@ public class BackendClient {
5555
private final int socketTimeout;
5656
private final int connectTimeout;
5757

58-
public BackendClient(Routing routing, DorisConfig readOptions) {
58+
public BackendClient(Routing routing, DorisSourceConfig readOptions) {
5959
this.routing = routing;
6060
this.connectTimeout = readOptions.getRequestConnectTimeoutMs();
6161
this.socketTimeout = readOptions.getRequestReadTimeoutMs();
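
The hunk above only changes the parameter type from `DorisConfig` to `DorisSourceConfig`; the constructor body still reads the same two getters. A self-contained sketch of that wiring follows. `SourceConfigStub` and `ClientStub` are simplified, hypothetical stand-ins that model only what is visible in the diff, not the real classes.

```java
// Hypothetical, simplified stand-ins for the classes touched by the diff.
final class TimeoutWiringSketch {

    /** Models only the two getters that the constructor in the diff calls. */
    static final class SourceConfigStub {
        private final int requestConnectTimeoutMs;
        private final int requestReadTimeoutMs;

        SourceConfigStub(int requestConnectTimeoutMs, int requestReadTimeoutMs) {
            this.requestConnectTimeoutMs = requestConnectTimeoutMs;
            this.requestReadTimeoutMs = requestReadTimeoutMs;
        }

        int getRequestConnectTimeoutMs() {
            return requestConnectTimeoutMs;
        }

        int getRequestReadTimeoutMs() {
            return requestReadTimeoutMs;
        }
    }

    /** Copies the timeouts out of the config, as the constructor in the diff does. */
    static final class ClientStub {
        final int connectTimeout;
        final int socketTimeout;

        ClientStub(SourceConfigStub readOptions) {
            this.connectTimeout = readOptions.getRequestConnectTimeoutMs();
            this.socketTimeout = readOptions.getRequestReadTimeoutMs();
        }
    }

    public static void main(String[] args) {
        // 30000 ms is the documented default for both timeout options.
        ClientStub client = new ClientStub(new SourceConfigStub(30000, 30000));
        System.out.println(client.connectTimeout + " " + client.socketTimeout);
    }
}
```

The read timeout from `doris.request.read.timeout.ms` ends up as the socket timeout, and `doris.request.connect.timeout.ms` as the connect timeout.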
