Commit ae5d66f: Merge branch 'dev' into dev_wenjun_fixORC
2 parents: 31ead24 + 01159ec

File tree: 411 files changed (+14916, -1316 lines)


Diff for: .github/workflows/backend.yml (+22 -2)

```diff
@@ -100,6 +100,7 @@ jobs:
 current_branch='${{ steps.git_init.outputs.branch }}'
 pip install GitPython
 workspace="${GITHUB_WORKSPACE}"
+repository_owner="${GITHUB_REPOSITORY_OWNER}"
 cv2_files=`python tools/update_modules_check/check_file_updates.py ua $workspace apache/dev origin/$current_branch "seatunnel-connectors-v2/**"`
 true_or_false=${cv2_files%%$'\n'*}
 file_list=${cv2_files#*$'\n'}
@@ -133,6 +134,9 @@ jobs:
 api_files=`python tools/update_modules_check/check_file_updates.py ua $workspace apache/dev origin/$current_branch "seatunnel-api/**" "seatunnel-common/**" "seatunnel-config/**" "seatunnel-connectors/**" "seatunnel-core/**" "seatunnel-e2e/seatunnel-e2e-common/**" "seatunnel-formats/**" "seatunnel-plugin-discovery/**" "seatunnel-transforms-v2/**" "seatunnel-translation/**" "seatunnel-e2e/seatunnel-transforms-v2-e2e/**" "seatunnel-connectors/**" "pom.xml" "**/workflows/**" "tools/**" "seatunnel-dist/**"`
 true_or_false=${api_files%%$'\n'*}
 file_list=${api_files#*$'\n'}
+if [[ $repository_owner == 'apache' ]];then
+  true_or_false='true'
+fi
 echo "api=$true_or_false" >> $GITHUB_OUTPUT
 echo "api_files=$file_list" >> $GITHUB_OUTPUT
@@ -304,6 +308,8 @@ jobs:
   java-version: ${{ matrix.java }}
   distribution: 'temurin'
   cache: 'maven'
+- name: free disk space
+  run: tools/github/free_disk_space.sh
 - name: run updated modules integration test (part-1)
   if: needs.changes.outputs.api == 'false' && needs.changes.outputs.it-modules != ''
   run: |
@@ -333,6 +339,8 @@ jobs:
   java-version: ${{ matrix.java }}
   distribution: 'temurin'
   cache: 'maven'
+- name: free disk space
+  run: tools/github/free_disk_space.sh
 - name: run updated modules integration test (part-2)
   if: needs.changes.outputs.api == 'false' && needs.changes.outputs.it-modules != ''
   run: |
@@ -393,6 +401,8 @@ jobs:
   java-version: ${{ matrix.java }}
   distribution: 'temurin'
   cache: 'maven'
+- name: free disk space
+  run: tools/github/free_disk_space.sh
 - name: run updated modules integration test (part-4)
   if: needs.changes.outputs.api == 'false' && needs.changes.outputs.it-modules != ''
   run: |
@@ -421,6 +431,8 @@ jobs:
   java-version: ${{ matrix.java }}
   distribution: 'temurin'
   cache: 'maven'
+- name: free disk space
+  run: tools/github/free_disk_space.sh
 - name: run updated modules integration test (part-5)
   if: needs.changes.outputs.api == 'false' && needs.changes.outputs.it-modules != ''
   run: |
@@ -449,6 +461,8 @@ jobs:
   java-version: ${{ matrix.java }}
   distribution: 'temurin'
   cache: 'maven'
+- name: free disk space
+  run: tools/github/free_disk_space.sh
 - name: run updated modules integration test (part-6)
   if: needs.changes.outputs.api == 'false' && needs.changes.outputs.it-modules != ''
   run: |
@@ -477,6 +491,8 @@ jobs:
   java-version: ${{ matrix.java }}
   distribution: 'temurin'
   cache: 'maven'
+- name: free disk space
+  run: tools/github/free_disk_space.sh
 - name: run updated modules integration test (part-7)
   if: needs.changes.outputs.api == 'false' && needs.changes.outputs.it-modules != ''
   run: |
@@ -506,6 +522,8 @@ jobs:
   java-version: ${{ matrix.java }}
   distribution: 'temurin'
   cache: 'maven'
+- name: free disk space
+  run: tools/github/free_disk_space.sh
 - name: run updated modules integration test (part-8)
   if: needs.changes.outputs.api == 'false' && needs.changes.outputs.it-modules != ''
   run: |
@@ -538,7 +556,7 @@ jobs:
 - name: run seatunnel zeta integration test
   if: needs.changes.outputs.api == 'true'
   run: |
-    ./mvnw -T 1 -B verify -DskipUT=true -DskipIT=false -D"license.skipAddThirdParty"=true --no-snapshot-updates -pl :connector-seatunnel-e2e-base -am -Pci
+    ./mvnw -T 1 -B verify -DskipUT=true -DskipIT=false -D"license.skipAddThirdParty"=true --no-snapshot-updates -pl :connector-seatunnel-e2e-base,:connector-console-seatunnel-e2e -am -Pci
   env:
     MAVEN_OPTS: -Xmx4096m
 engine-k8s-it:
@@ -560,6 +578,8 @@ jobs:
 env:
   KUBECONFIG: /etc/rancher/k3s/k3s.yaml
 - uses: actions/checkout@v2
+- name: free disk space
+  run: tools/github/free_disk_space.sh
 - name: Set up JDK ${{ matrix.java }}
   uses: actions/setup-java@v3
   with:
@@ -977,7 +997,7 @@ jobs:
   java-version: ${{ matrix.java }}
   distribution: 'temurin'
   cache: 'maven'
-- name: run jdbc connectors integration test (part-6)
+- name: run jdbc connectors integration test (part-7)
   if: needs.changes.outputs.api == 'true'
   run: |
     ./mvnw -B -T 1 verify -DskipUT=true -DskipIT=false -D"license.skipAddThirdParty"=true --no-snapshot-updates -pl :connector-jdbc-e2e-part-7 -am -Pci
```

Diff for: .idea/icon.png (binary file, 207 KB)

Diff for: config/jvm_options (+1 -1)

```diff
@@ -27,4 +27,4 @@
 -XX:MaxMetaspaceSize=2g

 # G1GC
--XX:+UseG1GC
+-XX:+UseG1GC
```

Diff for: docs/en/concept/event-listener.md (new file, +116)

# Event Listener

## Introduction

SeaTunnel provides a rich event-listening feature that allows you to track the status of data synchronization.
This functionality is crucial when you need to listen to the job's running status (`org.apache.seatunnel.api.event`).
This document will guide you through the usage of these parameters and how to leverage them effectively.

## Support Those Engines

> SeaTunnel Zeta<br/>
> Flink<br/>
> Spark<br/>

## API

The event API is defined in the `org.apache.seatunnel.api.event` package.

### Event Data API

- `org.apache.seatunnel.api.event.Event` - The interface for event data.
- `org.apache.seatunnel.api.event.EventType` - The enum of event types.

### Event Listener API

You can customize an event handler, for example to send events to external systems.

- `org.apache.seatunnel.api.event.EventHandler` - The interface for event handlers; SPI will automatically load subclasses from the classpath.
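Since event handlers are discovered from the classpath via SPI, a custom handler only needs to implement the interface and be registered as a service. The sketch below illustrates the pattern with simplified stand-in types: `Event` and `EventHandler` here are hypothetical minimal versions of the real `org.apache.seatunnel.api.event` interfaces, assuming the handler exposes a single `handle(Event)` method.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the real org.apache.seatunnel.api.event interfaces
// (assumption: the real EventHandler exposes one handle(Event) method).
interface Event {
    String getEventType();
}

interface EventHandler {
    void handle(Event event);
}

// A custom handler that collects events instead of sending them to an
// external system; a production handler might POST them over HTTP.
class CollectingEventHandler implements EventHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void handle(Event event) {
        seen.add(event.getEventType());
    }
}

public class EventHandlerSketch {
    public static void main(String[] args) {
        CollectingEventHandler handler = new CollectingEventHandler();
        // Event is a single-method interface here, so a lambda works.
        handler.handle(() -> "JOB_STATUS_CHANGED");
        System.out.println(handler.seen); // [JOB_STATUS_CHANGED]
    }
}
```

In a real deployment the implementation class would be listed in `META-INF/services/org.apache.seatunnel.api.event.EventHandler` so the `ServiceLoader` can find it.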
### Event Collect API

- `org.apache.seatunnel.api.source.SourceSplitEnumerator` - Attaches the event listener API to report events from `SourceSplitEnumerator`.

```java
package org.apache.seatunnel.api.source;

public interface SourceSplitEnumerator {

    interface Context {

        /**
         * Get the {@link org.apache.seatunnel.api.event.EventListener} of this enumerator.
         *
         * @return the event listener
         */
        EventListener getEventListener();
    }
}
```

- `org.apache.seatunnel.api.source.SourceReader` - Attaches the event listener API to report events from `SourceReader`.

```java
package org.apache.seatunnel.api.source;

public interface SourceReader {

    interface Context {

        /**
         * Get the {@link org.apache.seatunnel.api.event.EventListener} of this reader.
         *
         * @return the event listener
         */
        EventListener getEventListener();
    }
}
```

- `org.apache.seatunnel.api.sink.SinkWriter` - Attaches the event listener API to report events from `SinkWriter`.

```java
package org.apache.seatunnel.api.sink;

public interface SinkWriter {

    interface Context {

        /**
         * Get the {@link org.apache.seatunnel.api.event.EventListener} of this writer.
         *
         * @return the event listener
         */
        EventListener getEventListener();
    }
}
```

## Configuration Listener

To use the event-listening feature, you need to configure it in the engine config.

### Zeta Engine

Example config in your config file (seatunnel.yaml):

```yaml
seatunnel:
  engine:
    event-report-http:
      url: "http://example.com:1024/event/report"
      headers:
        Content-Type: application/json
```

### Flink Engine

You can implement the `org.apache.seatunnel.api.event.EventHandler` interface and add it to the classpath; it will be loaded automatically through SPI.

Supported Flink versions: 1.14.0+

Example: `org.apache.seatunnel.api.event.LoggingEventHandler`

### Spark Engine

You can implement the `org.apache.seatunnel.api.event.EventHandler` interface and add it to the classpath; it will be loaded automatically through SPI.

Diff for: docs/en/concept/schema-feature.md (+1)

```diff
@@ -64,6 +64,7 @@ columns = [
 | type         | Yes | -    | The data type of the column |
 | nullable     | No  | true | If the column can be nullable |
 | columnLength | No  | 0    | The length of the column which will be useful when you need to define the length |
+| columnScale  | No  | -    | The scale of the column which will be useful when you need to define the scale |
 | defaultValue | No  | null | The default value of the column |
 | comment      | No  | null | The comment of the column |
```

Diff for: docs/en/connector-v2/formats/debezium-json.md (+7)

```diff
@@ -68,6 +68,12 @@ The MySQL products table has 4 columns (id, name, description and weight).
 The above JSON message is an update change event on the products table where the weight value of the row with id = 111 is changed from 5.18 to 5.15.
 Assuming the messages have been synchronized to Kafka topic products_binlog, then we can use the following Seatunnel conf to consume this topic and interpret the change events by Debezium format.

+**In this config, you must specify the `schema` and `debezium_record_include_schema` options.**
+- `schema` should match your table format
+- If your JSON data contains a `schema` field, `debezium_record_include_schema` should be `true`; if it does not, it should be `false`
+  - `{"schema" : {}, "payload": { "before" : {}, "after": {} ... } }` --> `true`
+  - `{"before" : {}, "after": {} ... }` --> `false`
+
```
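The rule added above (an envelope with a top-level `schema` member means `true`, a bare `before`/`after` record means `false`) can be checked mechanically. The helper below is hypothetical and for illustration only; it uses a naive string check rather than a real JSON parser.

```java
public class DebeziumEnvelopeSketch {
    // Naive check: a full Debezium envelope starts with a top-level
    // "schema" member. A real implementation would parse the JSON.
    static boolean includesSchema(String json) {
        return json.trim().replace(" ", "").startsWith("{\"schema\"");
    }

    public static void main(String[] args) {
        // Envelope form: set debezium_record_include_schema = true
        System.out.println(includesSchema(
            "{\"schema\" : {}, \"payload\": { \"before\" : {}, \"after\": {} } }")); // true
        // Bare payload form: set debezium_record_include_schema = false
        System.out.println(includesSchema(
            "{\"before\" : {}, \"after\": {} }")); // false
    }
}
```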
```diff
 ```bash
 env {
   parallelism = 1
@@ -88,6 +94,7 @@ source {
       weight = "string"
     }
   }
+  debezium_record_include_schema = false
   format = debezium_json
 }
```

Diff for: docs/en/connector-v2/sink/CosFile.md (+6)

```diff
@@ -61,6 +61,7 @@ By default, we use 2PC commit to ensure `exactly-once`
 | xml_root_tag        | string  | no | RECORDS | Only used when file_format is xml. |
 | xml_row_tag         | string  | no | RECORD  | Only used when file_format is xml. |
 | xml_use_attr_format | boolean | no | -       | Only used when file_format is xml. |
+| encoding            | string  | no | "UTF-8" | Only used when file_format_type is json,text,csv,xml. |
@@ -205,6 +206,11 @@

 Specifies Whether to process data using the tag attribute format.

+### encoding [string]
+
+Only used when file_format_type is json,text,csv,xml.
+The encoding of the file to write. This param will be parsed by `Charset.forName(encoding)`.
+
 ## Example

 For text file format with `have_partition` and `custom_filename` and `sink_columns`
```
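The new `encoding` option is documented as being resolved with `Charset.forName(encoding)`, so any charset name the JVM recognizes is accepted (lookup is case-insensitive and resolves aliases), and an unknown name fails fast instead of writing garbled output. A small self-contained sketch of that behavior:

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

public class EncodingSketch {
    public static void main(String[] args) {
        // Charset names are case-insensitive and aliases are resolved,
        // so "utf-8" yields the canonical UTF-8 charset.
        Charset cs = Charset.forName("utf-8");
        System.out.println(cs.name()); // UTF-8

        // The chosen encoding decides how text is serialized to bytes.
        byte[] bytes = "héllo".getBytes(cs);
        System.out.println(bytes.length); // 6, since 'é' is two bytes in UTF-8

        // A syntactically valid but unknown name throws immediately.
        try {
            Charset.forName("not-a-charset");
        } catch (UnsupportedCharsetException e) {
            System.out.println("unsupported: " + e.getCharsetName());
        }
    }
}
```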

Diff for: docs/en/connector-v2/sink/Doris.md (+13 -9)

````diff
@@ -73,16 +73,20 @@ We use templates to automatically create Doris tables,
 which will create corresponding table creation statements based on the type of upstream data and schema type,
 and the default template can be modified according to the situation.

+Default template:
+
 ```sql
-CREATE TABLE IF NOT EXISTS `${database}`.`${table_name}`
-(
-${rowtype_fields}
-) ENGINE = OLAP UNIQUE KEY (${rowtype_primary_key})
-DISTRIBUTED BY HASH (${rowtype_primary_key})
-PROPERTIES
-(
-    "replication_num" = "1"
-);
+CREATE TABLE IF NOT EXISTS `${database}`.`${table_name}` (
+${rowtype_fields}
+) ENGINE=OLAP
+UNIQUE KEY (${rowtype_primary_key})
+DISTRIBUTED BY HASH (${rowtype_primary_key})
+PROPERTIES (
+    "replication_allocation" = "tag.location.default: 1",
+    "in_memory" = "false",
+    "storage_format" = "V2",
+    "disable_auto_compaction" = "false"
+)
 ```

 If a custom field is filled in the template, such as adding an `id` field
````

Diff for: docs/en/connector-v2/sink/FtpFile.md (+10 -4)

```diff
@@ -35,7 +35,7 @@ By default, we use 2PC commit to ensure `exactly-once`
 |----------------------------------|---------|----------|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
 | host     | string | yes | - | |
 | port     | int    | yes | - | |
-| username | string | yes | - | |
+| user     | string | yes | - | |
 | password | string | yes | - | |
 | path     | string | yes | - | |
 | tmp_path | string | yes | /tmp/seatunnel | The result file will write to a tmp path first and then use `mv` to submit tmp dir to target dir. Need a FTP dir. |
@@ -60,6 +60,7 @@ By default, we use 2PC commit to ensure `exactly-once`
 | xml_root_tag        | string  | no | RECORDS | Only used when file_format is xml. |
 | xml_row_tag         | string  | no | RECORD  | Only used when file_format is xml. |
 | xml_use_attr_format | boolean | no | -       | Only used when file_format is xml. |
+| encoding            | string  | no | "UTF-8" | Only used when file_format_type is json,text,csv,xml. |

 ### host [string]
@@ -69,7 +70,7 @@ The target ftp host is required

 The target ftp port is required

-### username [string]
+### user [string]

 The target ftp username is required

@@ -210,6 +211,11 @@ Specifies the tag name of the data rows within the XML file.

 Specifies Whether to process data using the tag attribute format.

+### encoding [string]
+
+Only used when file_format_type is json,text,csv,xml.
+The encoding of the file to write. This param will be parsed by `Charset.forName(encoding)`.
+
 ## Example

 For text file format simple config
@@ -219,7 +225,7 @@ For text file format simple config
 FtpFile {
   host = "xxx.xxx.xxx.xxx"
   port = 21
-  username = "username"
+  user = "username"
   password = "password"
   path = "/data/ftp"
   file_format_type = "text"
@@ -237,7 +243,7 @@ For text file format with `have_partition` and `custom_filename` and `sink_colum
 FtpFile {
   host = "xxx.xxx.xxx.xxx"
   port = 21
-  username = "username"
+  user = "username"
   password = "password"
   path = "/data/ftp/seatunnel/job1"
   tmp_path = "/data/ftp/seatunnel/tmp"
```

Diff for: docs/en/connector-v2/sink/HdfsFile.md (+1)

```diff
@@ -67,6 +67,7 @@ Output data to hdfs file
 | xml_root_tag        | string  | no | RECORDS | Only used when file_format is xml, specifies the tag name of the root element within the XML file. |
 | xml_row_tag         | string  | no | RECORD  | Only used when file_format is xml, specifies the tag name of the data rows within the XML file |
 | xml_use_attr_format | boolean | no | -       | Only used when file_format is xml, specifies Whether to process data using the tag attribute format. |
+| encoding            | string  | no | "UTF-8" | Only used when file_format_type is json,text,csv,xml. |

 ### Tips
```