1 file changed +34 -0 lines changed
@@ -15,6 +15,7 @@ Read files on Hdfs.
- **input_path** file path on HDFS. You can use globs and a date format such as `%Y%m%d/%s`.
- **rewind_seconds** When input_path contains a date format, the format is evaluated against the current time minus this number of seconds.
- **partition** when this is true, input files are split into partitions to increase the task count. (default: `true`)
+ - **num_partitions** approximate number of partitions. (default: `Runtime.getRuntime().availableProcessors()`)
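
The interaction between **input_path** and **rewind_seconds** can be illustrated with a small sketch. This is not the plugin's code: `RewindSketch` and `resolveInputPath` are hypothetical names, only the `%Y`, `%m` and `%d` tokens are handled, and UTC is assumed as the time zone. The plugin itself may use a different formatter and time zone.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper, for illustration only.
final class RewindSketch {
    // Expands %Y, %m and %d in the template using (now - rewindSeconds).
    // Assumption: UTC; other date tokens are left untouched.
    static String resolveInputPath(String template, Instant now, long rewindSeconds) {
        ZonedDateTime t = now.minusSeconds(rewindSeconds).atZone(ZoneOffset.UTC);
        return template
                .replace("%Y", String.format("%04d", t.getYear()))
                .replace("%m", String.format("%02d", t.getMonthValue()))
                .replace("%d", String.format("%02d", t.getDayOfMonth()));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2015-03-02T00:30:00Z");
        // One day (86400 seconds) is subtracted before formatting.
        System.out.println(resolveInputPath("/user/embulk/test/%Y-%m-%d/*", now, 86400L));
        // prints /user/embulk/test/2015-03-01/*
    }
}
```

With `rewind_seconds: 86400`, a run shortly after midnight therefore reads the previous day's directory.
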
## Example
input_path: /user/embulk/test/%Y-%m-%d/*
rewind_seconds: 86400
partition: true
+ num_partitions: 30
decoders:
  - {type: gzip}
parser:
```

## Note
+ - The **num_partitions** parameter is approximate. The actual number of partitions can be larger than this value.
+   - see: [The Partitioning Logic](#partition_logic)
- the partition feature supports only 3 line terminators.
    - `\n`
    - `\r`

## The Reference Implementation
- [hito4t/embulk-input-filesplit](https://github.com/hito4t/embulk-input-filesplit)

+ ## <a id="partition_logic">The Partitioning Logic</a>
+
+ ```
+ int partitionSizeByOneTask = totalFileLength / approximateNumPartitions;
+
+ /*
+ ...
+ */
+
+ int numPartitions;
+ if (path.toString().endsWith(".gz") || path.toString().endsWith(".bz2") || path.toString().endsWith(".lzo")) {
+     // if the file is compressed, skip partitioning.
+     numPartitions = 1;
+ }
+ else if (!task.getPartition()) {
+     // if no partition mode, skip partitioning.
+     numPartitions = 1;
+ }
+ else {
+     // equalize the file size per task as much as possible.
+     numPartitions = ((fileLength - 1) / partitionSizeByOneTask) + 1;
+ }
+
+ /*
+ ...
+ */
+
+ ```
+
+
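
To see why the actual partition count can exceed **num_partitions**, the excerpt above can be turned into a small worked example. This is a sketch under assumptions, not the plugin's implementation: `PartitionSketch` and its methods are hypothetical, `long` is used for sizes, the per-task size is clamped to 1 to avoid division by zero, and the total length is taken as the sum over all listed files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone model of the excerpt, for illustration only.
final class PartitionSketch {
    // Target bytes per task. Clamping to 1 is an assumption of this sketch.
    static long partitionSizeByOneTask(long totalFileLength, long approximateNumPartitions) {
        if (approximateNumPartitions <= 0) {
            throw new IllegalArgumentException("approximateNumPartitions must be positive");
        }
        return Math.max(totalFileLength / approximateNumPartitions, 1L);
    }

    static boolean isCompressed(String path) {
        return path.endsWith(".gz") || path.endsWith(".bz2") || path.endsWith(".lzo");
    }

    // Same branching as the excerpt: compressed files and partition=false yield 1.
    static long numPartitions(String path, long fileLength, long partitionSize, boolean partition) {
        if (isCompressed(path) || !partition) {
            return 1L;
        }
        // Ceiling division: each file is rounded up to whole partitions.
        return ((fileLength - 1) / partitionSize) + 1;
    }

    static long totalPartitions(Map<String, Long> fileLengths, long approximateNumPartitions, boolean partition) {
        long totalFileLength = 0L;
        for (long length : fileLengths.values()) {
            totalFileLength += length;
        }
        long size = partitionSizeByOneTask(totalFileLength, approximateNumPartitions);
        long total = 0L;
        for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
            total += numPartitions(e.getKey(), e.getValue(), size, partition);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("a.csv", 1000L);
        files.put("b.csv", 250L);
        files.put("c.csv.gz", 500L);
        // 1750 bytes / 4 = 437 bytes per task; a.csv -> 3, b.csv -> 1, c.csv.gz -> 1.
        System.out.println(totalPartitions(files, 4L, true)); // prints 5
    }
}
```

Because every file is rounded up to a whole number of partitions, the per-file counts sum to 5 here even though 4 were requested.
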
## Build

```