docs/en/connectors/source/FtpFile.md
Lines changed: 69 additions & 1 deletion
@@ -76,6 +76,9 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| null_format | string | no | - |
| binary_chunk_size | int | no | 1024 |
| binary_complete_file_mode | boolean | no | false |
+| discovery_mode | string | no | once |
+| scan_interval | string | no | 10S |
+| start_mode | string | no | earliest |
| sync_mode | string | no | full |
| target_path | string | no | - |
| target_hadoop_conf | map | no | - |
@@ -452,6 +455,26 @@ Only used when file_format_type is binary.
Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.
+### discovery_mode [string]
+
+File discovery mode. Supported values: `once` (default), `continuous`.
+
+- `once`: enumerate current files once and finish (bounded).
+- `continuous`: keep scanning the path and processing new/changed files at runtime (unbounded).
+
+In the current implementation, `discovery_mode=continuous` requires `sync_mode=update` (binary only) to avoid repeated transfers.
+
+### scan_interval [string]
+
+Only used when `discovery_mode=continuous`. Scan interval for periodic discovery; value must be greater than `0`. Recommended shorthand format `10S`, `30S` (case-insensitive, e.g. `10s`); ISO-8601 format `PT10S`, `PT30S` is also supported. Default is `10S`.
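
To make the accepted `scan_interval` forms concrete, here is a minimal Python sketch of a validator. It is not the connector's implementation: `parse_scan_interval` is a hypothetical helper, and it assumes only the whole-second forms shown in the docs (`10S`, `10s`, `PT10S`), so other ISO-8601 durations are rejected here even if the connector accepts them.

```python
import re
from datetime import timedelta

# Hypothetical helper, not SeaTunnel code. Assumption: only whole-second
# values are handled, in shorthand ("10S", "10s") or ISO-8601 ("PT10S") form.
_SECONDS_PATTERN = re.compile(r"^(?:PT)?(\d+)S$", re.IGNORECASE)


def parse_scan_interval(text: str) -> timedelta:
    """Parse a scan_interval string and enforce the 'greater than 0' rule."""
    match = _SECONDS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported scan_interval: {text!r}")
    seconds = int(match.group(1))
    if seconds <= 0:
        raise ValueError("scan_interval must be greater than 0")
    return timedelta(seconds=seconds)
```

With this sketch, `parse_scan_interval("10s")` and `parse_scan_interval("PT10S")` both yield a ten-second interval, while `"0S"` raises `ValueError`.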
+
+### start_mode [string]
+
+Only used when `discovery_mode=continuous`. Supported values: `earliest` (default), `latest`.
+
+- `earliest`: read existing files on startup.
+- `latest`: only process files modified after the job starts.
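
The difference between the two `start_mode` values can be sketched as a filter over the first directory listing. This is an illustration under assumptions, not the connector's code: `FileEntry` and `select_initial_files` are hypothetical names, and the strict "modified after" comparison is a guess at the boundary behaviour.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical listing record: a path plus its last-modified time."""
    path: str
    modified_ms: int


def select_initial_files(
    files: Iterable[FileEntry], start_mode: str, job_start_ms: int
) -> List[FileEntry]:
    """Choose which files from the startup listing are processed."""
    if start_mode == "earliest":
        # Existing files are read on startup.
        return list(files)
    if start_mode == "latest":
        # Assumption: "modified after the job starts" is a strict comparison.
        return [f for f in files if f.modified_ms > job_start_ms]
    raise ValueError(f"unsupported start_mode: {start_mode!r}")
```

The sketch covers only the startup listing; in both modes, later scans would still pick up files that appear or change afterwards.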

+`discovery_mode=continuous` keeps the job running and periodically scans the path for new or changed files. This is a long-running job, so running it with `job.mode="STREAMING"` is recommended.
+
+**Note:** `discovery_mode=continuous` currently requires `sync_mode="update"` (binary only), which avoids repeated transfers without keeping an unbounded "seen" state. `target_path` should align with the sink `path` on the same filesystem.
docs/en/connectors/source/HdfsFile.md
Lines changed: 96 additions & 0 deletions
@@ -80,6 +80,9 @@ Read data from hdfs file system.
| null_format | string | no | - | Only used when file_format_type is text. null_format to define which strings can be represented as null. e.g: `\N` |
| binary_chunk_size | int | no | 1024 | Only used when file_format_type is binary. The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory. |
| binary_complete_file_mode | boolean | no | false | Only used when file_format_type is binary. Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false. |
+| discovery_mode | string | no | once | File discovery mode. Supported values: `once` (default), `continuous`. When `continuous`, the source keeps scanning the path and processes new/changed files at runtime (unbounded). In the current implementation, `continuous` requires `sync_mode=update` (binary only). |
+| scan_interval | string | no | 10S | Only used when `discovery_mode=continuous`. Scan interval for periodic discovery, recommended shorthand format `10S`, `30S`; ISO-8601 format `PT10S`, `PT30S` is also supported. |
+| start_mode | string | no | earliest | Only used when `discovery_mode=continuous`. Supported values: `earliest` (default), `latest`. |
| sync_mode | string | no | full | File sync mode. Supported values: `full`, `update`. When `update`, the source compares files between source/target and only reads new/changed files (currently only supports `file_format_type=binary`). |
| target_path | string | no | - | Only used when `sync_mode=update`. Target base path used for comparison (it should usually be the same as sink `path`). |
| target_hadoop_conf | map | no | - | Only used when `sync_mode=update`. Extra Hadoop configuration for target filesystem. You can set `fs.defaultFS` in this map to override target defaultFS. |
@@ -220,6 +223,26 @@ Only used when file_format_type is binary.
Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.
+### discovery_mode [string]
+
+File discovery mode. Supported values: `once` (default), `continuous`.
+
+- `once`: enumerate current files once and finish (bounded).
+- `continuous`: keep scanning the path and processing new/changed files at runtime (unbounded).
+
+In the current implementation, `discovery_mode=continuous` requires `sync_mode=update` (binary only) to avoid repeated transfers.
+
+### scan_interval [string]
+
+Only used when `discovery_mode=continuous`. Scan interval for periodic discovery; value must be greater than `0`. Recommended shorthand format `10S`, `30S` (case-insensitive, e.g. `10s`); ISO-8601 format `PT10S`, `PT30S` is also supported. Default is `10S`.
+
+### start_mode [string]
+
+Only used when `discovery_mode=continuous`. Supported values: `earliest` (default), `latest`.
+
+- `earliest`: read existing files on startup.
+- `latest`: only process files modified after the job starts.

+`sync_mode=update` compares files between the source and `target_path`, then reads only new or changed files (currently only `file_format_type=binary` is supported).
+In most cases, `target_path` should be aligned with the sink `path` (same filesystem and same relative paths).

+`discovery_mode=continuous` keeps the job running and periodically scans the path for new or changed files. This is a long-running job, so running it with `job.mode="STREAMING"` is recommended.
+
+**Note:** `discovery_mode=continuous` currently requires `sync_mode="update"` (binary only), which avoids repeated transfers without keeping an unbounded "seen" state. `target_path` should align with the sink `path` on the same filesystem.
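
The note above can be illustrated with a small Python simulation of repeated scans. It is a sketch under stated assumptions, not the connector's logic: `files_to_transfer` and `run_scans` are hypothetical names, listings are plain dictionaries, and "changed" is assumed to mean a different size or a newer modification time on the source (the docs do not say how changes are detected).

```python
from typing import Dict, List, Tuple

# Relative path -> (size in bytes, last-modified epoch millis).
Listing = Dict[str, Tuple[int, int]]


def files_to_transfer(source: Listing, target: Listing) -> List[str]:
    """Return source paths that are missing from, or differ from, the target."""
    pending = []
    for rel_path, (size, modified_ms) in sorted(source.items()):
        existing = target.get(rel_path)
        if existing is None:
            pending.append(rel_path)  # new file
            continue
        target_size, target_modified_ms = existing
        # Assumed change rule: different size, or source modified more recently.
        if size != target_size or modified_ms > target_modified_ms:
            pending.append(rel_path)
    return pending


def run_scans(source_snapshots: List[Listing], target: Listing) -> List[List[str]]:
    """Simulate periodic scans; each transfer updates the target listing."""
    transferred_per_scan = []
    for source in source_snapshots:
        pending = files_to_transfer(source, target)
        for rel_path in pending:
            # Stand-in for the sink writing the file to the target path.
            target[rel_path] = source[rel_path]
        transferred_per_scan.append(pending)
    return transferred_per_scan
```

Because each scan compares against the target, the target itself acts as the record of what has been transferred, so the scanner needs no growing "seen" set. An unchanged file is skipped on every later scan, and a file reappears in the pending list only when its source copy changes.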