
[Feature][Connector-V2][HdfsFile] Support true large-file split for parallel read (byte-range/row-delimiter + Parquet RowGroup) #10326

@yzeng1618

Search before asking

  • I have searched the existing issues and found no similar feature request.

Description

Currently, connector-file-hadoop’s HdfsFile source still uses the default split behavior: one file -> one split. When there are only a few files but a single file is huge (tens of GB), read parallelism cannot scale, so the job effectively reads with single concurrency.

connector-file-local already gained large-file splitting support in PR #10142 (the split strategy is selected by config: row-delimiter splitting for Text/CSV/JSON, RowGroup splitting for Parquet). However, HdfsFile is not covered yet.
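For the delimiter-based strategy, the core idea is to give each reader a byte range and align its boundaries to the row delimiter so that no line is broken, duplicated, or lost. Below is a minimal sketch of that convention against the Hadoop FileSystem API; the class and method names are illustrative, the delimiter is assumed to be '\n', and this is not the actual code from #10142.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative only: reads the rows that belong to one byte-range split [start, end). */
public class AlignedRangeReader {

    /*
     * Ownership convention that avoids broken, duplicated, or missing lines:
     *  - every split except the one starting at offset 0 skips the (possibly partial)
     *    line it lands in, because the previous split owns it;
     *  - every split keeps reading past its end offset until the next delimiter,
     *    so the line that crosses the boundary is read by exactly one split.
     */
    public static void readSplit(Configuration conf, Path file, long start, long end)
            throws IOException {
        FileSystem fs = file.getFileSystem(conf);
        try (FSDataInputStream in = fs.open(file)) {
            in.seek(start);
            if (start > 0) {
                skipToNextDelimiter(in);
            }
            long pos = in.getPos();
            // A line is owned by this split if it starts at or before `end`.
            while (pos <= end) {
                String line = readLine(in);
                if (line == null) {
                    break; // end of file
                }
                process(line);
                pos = in.getPos();
            }
        }
    }

    private static void skipToNextDelimiter(FSDataInputStream in) throws IOException {
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            // skip bytes until the row delimiter (assumed '\n' here)
        }
    }

    private static String readLine(FSDataInputStream in) throws IOException {
        // Naive single-byte decoding, good enough for a sketch.
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            sb.append((char) b);
        }
        return (b == -1 && sb.length() == 0) ? null : sb.toString();
    }

    private static void process(String line) {
        System.out.println(line); // placeholder for the actual row handling
    }
}
```

This convention is what makes the "no broken lines, no duplicates/missing" guarantee in the expected behavior below possible: a line that crosses a split boundary is always owned by the split in which it starts.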

Usage Scenario

  1. Ingest single / few extremely large files (CSV / plain log / NDJSON, tens of GB) stored in HDFS.
  2. Current behavior: only one split is generated per file, so only one reader does work even if env.parallelism is high.
  3. Expected behavior: when enable_file_split=true, split the large file into multiple splits and read in parallel:
  • Text/CSV/JSON: split by file_split_size and align to row_delimiter (no broken lines, no duplicates/missing).
  • Parquet: split by RowGroup (each RowGroup as a split, or pack RowGroups by size); a sketch of the packing idea follows this list.
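
For Parquet, RowGroup boundaries are available from the file footer, so splits can be built by packing whole RowGroups up to a target size (e.g. the proposed file_split_size). The sketch below uses the parquet-hadoop footer API; the RowGroupSplit type and the packing logic are hypothetical and only illustrate the idea, they are not the implementation from #10142.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

/** Illustrative only: packs Parquet RowGroups into splits of at most targetSplitSize bytes. */
public class ParquetRowGroupSplitter {

    /** Hypothetical split description: which row-group indexes one reader should consume. */
    public static class RowGroupSplit {
        final Path file;
        final List<Integer> rowGroupIndexes;

        RowGroupSplit(Path file, List<Integer> rowGroupIndexes) {
            this.file = file;
            this.rowGroupIndexes = rowGroupIndexes;
        }
    }

    public static List<RowGroupSplit> split(Configuration conf, Path file, long targetSplitSize)
            throws IOException {
        List<RowGroupSplit> splits = new ArrayList<>();
        try (ParquetFileReader reader =
                ParquetFileReader.open(HadoopInputFile.fromPath(file, conf))) {
            // RowGroup sizes come from the footer; no data pages are read here.
            List<BlockMetaData> rowGroups = reader.getFooter().getBlocks();
            List<Integer> current = new ArrayList<>();
            long currentSize = 0;
            for (int i = 0; i < rowGroups.size(); i++) {
                long groupSize = rowGroups.get(i).getCompressedSize();
                // Start a new split once adding this RowGroup would exceed the target size.
                if (!current.isEmpty() && currentSize + groupSize > targetSplitSize) {
                    splits.add(new RowGroupSplit(file, current));
                    current = new ArrayList<>();
                    currentSize = 0;
                }
                current.add(i);
                currentSize += groupSize;
            }
            if (!current.isEmpty()) {
                splits.add(new RowGroupSplit(file, current));
            }
        }
        return splits;
    }
}
```

Each reader then opens the file and consumes only its assigned RowGroups, so every group is read exactly once regardless of parallelism.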

Related issues

#10129

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
