Commit a06800e

[Feature][Connector] Add Apache Cloudberry Support (#8985)
1 parent aae9ca3 commit a06800e

5 files changed: +598 -0 lines changed

+7
@@ -0,0 +1,7 @@
<details><summary> Change Log </summary>

| Change | Commit | Version |
| --- | --- | --- |
|[Feature][Connector] Add Apache Cloudberry Support (#8985)|https://github.com/apache/seatunnel/commit/b6f82c1|dev|

</details>

Diff for: docs/en/connector-v2/sink/Cloudberry.md

+176
@@ -0,0 +1,176 @@
import ChangeLog from '../changelog/connector-cloudberry.md';

# Cloudberry

> JDBC Cloudberry Sink Connector

## Support Those Engines

> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>

## Description

Write data through JDBC. Cloudberry currently has no native JDBC driver of its own; it connects through the PostgreSQL driver and follows the PostgreSQL implementation.

Supports batch and streaming modes, concurrent writing, and exactly-once semantics (guaranteed by XA transactions).

## Using Dependency

### For Spark/Flink Engine

> 1. You need to ensure that the [jdbc driver jar package](https://mvnrepository.com/artifact/org.postgresql/postgresql) has been placed in directory `${SEATUNNEL_HOME}/plugins/`.

### For SeaTunnel Zeta Engine

> 1. You need to ensure that the [jdbc driver jar package](https://mvnrepository.com/artifact/org.postgresql/postgresql) has been placed in directory `${SEATUNNEL_HOME}/lib/`.

## Key Features

- [x] [exactly-once](../../concept/connector-v2-features.md)
- [x] [cdc](../../concept/connector-v2-features.md)

> `exactly-once` is guaranteed by XA transactions, so it is only available when the target database supports XA transactions. Set `is_exactly_once=true` to enable it.

## Supported DataSource Info

| Datasource | Supported Versions                    | Driver                | Url                                   | Maven                                                                    |
|------------|---------------------------------------|-----------------------|---------------------------------------|--------------------------------------------------------------------------|
| Cloudberry | Uses PostgreSQL driver implementation | org.postgresql.Driver | jdbc:postgresql://localhost:5432/test | [Download](https://mvnrepository.com/artifact/org.postgresql/postgresql) |

## Database Dependency

> Please download the PostgreSQL driver jar and copy it to the `$SEATUNNEL_HOME/plugins/jdbc/lib/` directory.<br/>
> For example: `cp postgresql-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/`

## Data Type Mapping

Cloudberry uses PostgreSQL's data type implementation. Please refer to the PostgreSQL documentation for data type compatibility and mappings.

## Options

The Cloudberry connector uses the same options as the PostgreSQL connector. For detailed configuration options, please refer to the PostgreSQL connector documentation.

Key options include:
- `url` (required): The JDBC connection URL
- `driver` (required): The driver class name (`org.postgresql.Driver`)
- `user` / `password`: Authentication credentials
- `query`, or a `database` / `table` combination: What data to write and how
- `is_exactly_once`: Enable exactly-once semantics with XA transactions
- `batch_size`: Control batch writing behavior

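As a rough sketch of how the key options listed above fit together, the fragment below sets each of them in a single sink block. The connection values and table name are placeholders, and the `batch_size` of 500 is an arbitrary value chosen for illustration, not a recommendation from this connector's documentation.

```hocon
sink {
  Jdbc {
    # Required connection options
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"

    # Credentials (placeholder values)
    user = "dbadmin"
    password = "password"

    # What to write: an explicit statement (the alternative is database/table)
    query = "insert into test_table(name,age) values(?,?)"

    # Illustrative value only; tune it for your workload
    batch_size = 500
  }
}
```
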
## Task Example

### Simple:

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    parallelism = 1
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    query = "insert into test_table(name,age) values(?,?)"
  }
}
```

### Generate Sink SQL

```hocon
sink {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"

    generate_sink_sql = true
    database = "mydb"
    table = "public.test_table"
  }
}
```

### Exactly-once:

```hocon
sink {
  jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    query = "insert into test_table(name,age) values(?,?)"

    is_exactly_once = "true"
    xa_data_source_class_name = "org.postgresql.xa.PGXADataSource"
  }
}
```

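The exactly-once sink fragment above omits the `env` block. In a streaming job, a checkpoint interval would normally be set alongside it, on the assumption that XA commits are tied to checkpoints. A minimal sketch, assuming the standard `job.mode` and `checkpoint.interval` env options; the 5000 ms interval is an arbitrary illustrative value:

```hocon
env {
  parallelism = 1
  job.mode = "STREAMING"
  # Assumed to be in milliseconds; illustrative value only
  checkpoint.interval = 5000
}
```
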
### CDC(Change Data Capture) Event

```hocon
sink {
  jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"

    generate_sink_sql = true
    database = "mydb"
    table = "sink_table"
    primary_keys = ["id","name"]
    field_ide = UPPERCASE
  }
}
```

### Save mode function

```hocon
sink {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"

    generate_sink_sql = true
    database = "mydb"
    table = "public.test_table"
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    data_save_mode = "APPEND_DATA"
  }
}
```

For more detailed examples and options, please refer to the PostgreSQL connector documentation.

## Changelog

<ChangeLog />

Diff for: docs/en/connector-v2/source/Cloudberry.md

+152
@@ -0,0 +1,152 @@
import ChangeLog from '../changelog/connector-cloudberry.md';

# Cloudberry

> JDBC Cloudberry Source Connector

## Support Those Engines

> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>

## Using Dependency

### For Spark/Flink Engine

> 1. You need to ensure that the [jdbc driver jar package](https://mvnrepository.com/artifact/org.postgresql/postgresql) has been placed in directory `${SEATUNNEL_HOME}/plugins/`.

### For SeaTunnel Zeta Engine

> 1. You need to ensure that the [jdbc driver jar package](https://mvnrepository.com/artifact/org.postgresql/postgresql) has been placed in directory `${SEATUNNEL_HOME}/lib/`.

## Key Features

- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [x] [exactly-once](../../concept/connector-v2-features.md)
- [x] [column projection](../../concept/connector-v2-features.md)
- [x] [parallelism](../../concept/connector-v2-features.md)
- [x] [support user-defined split](../../concept/connector-v2-features.md)

> The connector accepts a SQL query, so column projection can be expressed in the query itself.

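To illustrate projection through the query, a source block can name only the columns it needs. This is a sketch; the table `mytable` and the columns `name` and `age` are assumed for illustration.

```hocon
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    # Only the listed columns are selected (hypothetical column names)
    query = "select name, age from mytable"
  }
}
```
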
## Description

Read data from an external data source through JDBC. Cloudberry currently has no native JDBC driver of its own, so the connector uses the PostgreSQL driver and implementation.

## Supported DataSource Info

| Datasource | Supported Versions                    | Driver                | Url                                   | Maven                                                                    |
|------------|---------------------------------------|-----------------------|---------------------------------------|--------------------------------------------------------------------------|
| Cloudberry | Uses PostgreSQL driver implementation | org.postgresql.Driver | jdbc:postgresql://localhost:5432/test | [Download](https://mvnrepository.com/artifact/org.postgresql/postgresql) |

## Database Dependency

> Please download the PostgreSQL driver jar and copy it to the `$SEATUNNEL_HOME/plugins/jdbc/lib/` directory.<br/>
> For example: `cp postgresql-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/`

## Data Type Mapping

Cloudberry uses PostgreSQL's data type implementation. Please refer to the PostgreSQL documentation for data type compatibility and mappings.

## Options

The Cloudberry connector uses the same options as the PostgreSQL connector. For detailed configuration options, please refer to the PostgreSQL connector documentation.

Key options include:
- `url` (required): The JDBC connection URL
- `driver` (required): The driver class name (`org.postgresql.Driver`)
- `user` / `password`: Authentication credentials
- `query` or `table_path`: What data to read
- Partition options for parallel reading

## Parallel Reader

Cloudberry supports parallel reading under the same rules as the PostgreSQL connector. For details on split strategies and parallel reading options, please refer to the PostgreSQL connector documentation.

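As a hedged sketch of a partition-based parallel read, the job below splits a query on a numeric column. The option names `partition_column` and `partition_num` are assumed to carry over unchanged from the PostgreSQL source connector and should be checked against its documentation; the table `mytable` and the `id` column are hypothetical.

```hocon
env {
  parallelism = 4
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    query = "select * from mytable"
    # Assumed option names, taken from the PostgreSQL source connector
    partition_column = "id"
    partition_num = 4
  }
}

sink {
  Console {}
}
```
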
## Task Example

### Simple:

```hocon
env {
  parallelism = 4
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    query = "select * from mytable limit 100"
  }
}

sink {
  Console {}
}
```

### Parallel reading with table_path:

```hocon
env {
  parallelism = 4
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    table_path = "public.mytable"
    split.size = 10000
  }
}

sink {
  Console {}
}
```

### Multiple table read:

```hocon
env {
  job.mode = "BATCH"
  parallelism = 4
}

source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    "table_list" = [
      {
        "table_path" = "public.table1"
      },
      {
        "table_path" = "public.table2"
      }
    ]
    split.size = 10000
  }
}

sink {
  Console {}
}
```

For more detailed examples and configurations, please refer to the PostgreSQL connector documentation.

## Changelog

<ChangeLog />
