Commit de44573

[Doc][Improve] translate Redis/Paimon related chinese document (#8584)
1 parent c4a7afc commit de44573

2 files changed: +519 −0 lines changed

Diff for: docs/zh/connector-v2/source/Paimon.md

# Paimon

> Paimon source connector

## Description

Used to read data from `Apache Paimon`.

## Key Features

- [x] [batch](../../concept/connector-v2-features.md)
- [x] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [column projection](../../concept/connector-v2-features.md)
- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

## Options

| name                    | type   | required | default value |
|-------------------------|--------|----------|---------------|
| warehouse               | String | yes      | -             |
| catalog_type            | String | no       | filesystem    |
| catalog_uri             | String | no       | -             |
| database                | String | yes      | -             |
| table                   | String | yes      | -             |
| hdfs_site_path          | String | no       | -             |
| query                   | String | no       | -             |
| paimon.hadoop.conf      | Map    | no       | -             |
| paimon.hadoop.conf-path | String | no       | -             |

### warehouse [string]

The Paimon warehouse path.

### catalog_type [string]

The catalog type of Paimon; filesystem and hive are supported.

### catalog_uri [string]

The catalog uri of Paimon, only needed when catalog_type is hive.

### database [string]

The database you want to access.

### table [string]

The table you want to access.

### hdfs_site_path [string]

The path of the `hdfs-site.xml` file.

### query [string]

The filter condition for reading the table, e.g. `select * from st_test where id > 100`. If not specified, all records are read.

Currently, `where` supports `<, <=, >, >=, =, !=, or, and, is null, is not null`; other operators are not supported yet.

Due to Paimon's limitations, `Having`, `Group By`, and `Order By` are currently not supported; `projection` and `limit` will be supported in future versions.

Note: when the field after `where` is a string or boolean, its value must be wrapped in single quotes, otherwise an error is raised, e.g. `name='abc'` or `tag='true'`.

The field data types currently supported by `where` are:

* string
* boolean
* tinyint
* smallint
* int
* bigint
* float
* double
* date
* timestamp
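
For instance, a minimal sketch of a filtered read that follows the single-quote rule for string values (the `name` and `id` fields of `st_test` here are illustrative, not part of the original doc):

```hocon
source {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "default"
    table = "st_test"
    # string values inside the query must use single quotes
    query = "select * from st_test where name = 'abc' and id > 100"
  }
}
```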

### paimon.hadoop.conf [map]

Properties in hadoop conf.

### paimon.hadoop.conf-path [string]

The path from which the 'core-site.xml', 'hdfs-site.xml', and 'hive-site.xml' files are loaded.
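
As a sketch, this is how the connector can be pointed at a directory containing the Hadoop client configuration (the `/etc/hadoop/conf` path below is a hypothetical example):

```hocon
source {
  Paimon {
    warehouse = "hdfs:///tmp/paimon"
    database = "default"
    table = "st_test"
    # core-site.xml / hdfs-site.xml / hive-site.xml are loaded from this directory
    paimon.hadoop.conf-path = "/etc/hadoop/conf"
  }
}
```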

## Filesystems

The Paimon connector supports reading data from multiple file systems. Currently, `hdfs` and `s3` are supported.
If you use the `s3` file system, you can configure the `fs.s3a.access-key`, `fs.s3a.secret-key`, `fs.s3a.endpoint`, `fs.s3a.path.style.access`, and `fs.s3a.aws.credentials.provider` properties in `paimon.hadoop.conf`, and the warehouse path should start with `s3a://`. See the S3 example below.

## Examples

### Simple example

```hocon
source {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "default"
    table = "st_test"
  }
}
```

### Filter example

```hocon
source {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "full_type"
    table = "st_test"
    query = "select c_boolean, c_tinyint from st_test where c_boolean= 'true' and c_tinyint > 116 and c_smallint = 15987 or c_decimal='2924137191386439303744.39292213'"
  }
}
```

### S3 example

```hocon
env {
  execution.parallelism = 1
  job.mode = "BATCH"
}

source {
  Paimon {
    warehouse = "s3a://test/"
    database = "seatunnel_namespace11"
    table = "st_test"
    paimon.hadoop.conf = {
      fs.s3a.access-key=G52pnxg67819khOZ9ezX
      fs.s3a.secret-key=SHJuAQqHsLrgZWikvMa3lJf5T0NfM5LMFliJh9HF
      fs.s3a.endpoint="http://minio4:9000"
      fs.s3a.path.style.access=true
      fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
    }
  }
}

sink {
  Console{}
}
```

### Hadoop conf example

```hocon
source {
  Paimon {
    catalog_name="seatunnel_test"
    warehouse="hdfs:///tmp/paimon"
    database="seatunnel_namespace1"
    table="st_test"
    query = "select * from st_test where pk_id is not null and pk_id < 3"
    paimon.hadoop.conf = {
      fs.defaultFS = "hdfs://nameservice1"
      dfs.nameservices = "nameservice1"
      dfs.ha.namenodes.nameservice1 = "nn1,nn2"
      dfs.namenode.rpc-address.nameservice1.nn1 = "hadoop03:8020"
      dfs.namenode.rpc-address.nameservice1.nn2 = "hadoop04:8020"
      dfs.client.failover.proxy.provider.nameservice1 = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
      dfs.client.use.datanode.hostname = "true"
    }
  }
}
```

### Hive catalog example

```hocon
source {
  Paimon {
    catalog_name="seatunnel_test"
    catalog_type="hive"
    catalog_uri="thrift://hadoop04:9083"
    warehouse="hdfs:///tmp/seatunnel"
    database="seatunnel_test"
    table="st_test3"
    paimon.hadoop.conf = {
      fs.defaultFS = "hdfs://nameservice1"
      dfs.nameservices = "nameservice1"
      dfs.ha.namenodes.nameservice1 = "nn1,nn2"
      dfs.namenode.rpc-address.nameservice1.nn1 = "hadoop03:8020"
      dfs.namenode.rpc-address.nameservice1.nn2 = "hadoop04:8020"
      dfs.client.failover.proxy.provider.nameservice1 = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
      dfs.client.use.datanode.hostname = "true"
    }
  }
}
```

## Changelog

To read the changelog of a Paimon table, first set `changelog-producer` on the Paimon source table, then read it with a SeaTunnel streaming job.
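
If the source table is itself created by a SeaTunnel Paimon sink, a minimal sketch of presetting this is shown below (it assumes the sink's `paimon.table.write-props` option passes table properties through to Paimon at table creation time; `input` is one of Paimon's changelog-producer modes):

```hocon
sink {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "full_type"
    table = "st_test"
    paimon.table.primary-keys = "c_tinyint"
    # assumption: these properties are passed through to the Paimon table,
    # so the table is created with changelog-producer = input
    paimon.table.write-props = {
      changelog-producer = "input"
    }
  }
}
```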

### Note

Currently, batch reads always read the latest snapshot. To get more complete changelog data, use a streaming read, and start it before data is written to the Paimon table. To guarantee ordering, set the parallelism of the streaming read job to 1.

### Streaming read example

```hocon
env {
  parallelism = 1
  job.mode = "Streaming"
}

source {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "full_type"
    table = "st_test"
  }
}

sink {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "full_type"
    table = "st_test_sink"
    paimon.table.primary-keys = "c_tinyint"
  }
}
```
