
Commit 891ab70

jerryshao and Lv, Qi authored and Lv, Qi committed
Add HDP support
Conflicts: README.md
1 parent 1db38e4 commit 891ab70


3 files changed (+52 -37 lines)

README.md (+25 -20)
@@ -18,7 +18,7 @@
 
 This benchmark suite contains 10 typical micro workloads. This benchmark suite also has options for users to enable input/output compression for most workloads with default compression codec (zlib). Some initial work based on this benchmark suite please refer to the included ICDE workshop paper (i.e., WISS10_conf_full_011.pdf).
 
-Note:
+Note:
 1. Since HiBench-2.2, the input data of benchmarks are all automatically generated by their corresponding prepare scripts.
 2. Since HiBench-3.0, it introduces Yarn support
 3. Since HiBench-4.0, it consists of more workload implementations on both Hadoop MR and Spark. For Spark, three different APIs including Scala, Java, Python are supportive.
@@ -33,7 +33,7 @@ Note:
 2. WordCount (wordcount)
 
 This workload counts the occurrence of each word in the input data, which are generated using RandomTextWriter. It is representative of another typical class of real world MapReduce jobs - extracting a small amount of interesting data from large data set.
-
+
 3. TeraSort (terasort)
 
 TeraSort is a standard benchmark created by Jim Gray. Its input data is generated by Hadoop TeraGen example program.
@@ -53,7 +53,7 @@ Note:
 6. PageRank (pagerank)
 
 This workload benchmarks PageRank algorithm implemented in Spark-MLLib/Hadoop (a search engine ranking benchmark included in pegasus 2.0) examples. The data source is generated from Web data whose hyperlinks follow the Zipfian distribution.
-
+
 7. Nutch indexing (nutchindexing)
 
 Large-scale search indexing is one of the most significant uses of MapReduce. This workload tests the indexing sub-system in Nutch, a popular open source (Apache project) search engine. The workload uses the automatically generated Web data whose hyperlinks and words both follow the Zipfian distribution with corresponding parameters. The dict used to generate the Web page texts is the default linux dict file /usr/share/dict/linux.words.
@@ -75,13 +75,16 @@ Note:
 10. enhanced DFSIO (dfsioe)
 
 Enhanced DFSIO tests the HDFS throughput of the Hadoop cluster by generating a large number of tasks performing writes and reads simultaneously. It measures the average I/O rate of each map task, the average throughput of each map task, and the aggregated throughput of HDFS cluster. Note: this benchmark doesn't have Spark corresponding implementation.
-
+
 **Supported hadoop/spark release:**
 
 - Apache release of Hadoop 1.x and Hadoop 2.x
 - CDH4/CDH5 release of MR1 and MR2.
+- HDP2.3
 - Spark1.2
 - Spark1.3
+
+Note : No version of CDH supports SparkSQL. Please download SparkSQL from Apache-spark official release page if you are using it.
 
 ---
 ### Getting Started ###
@@ -93,39 +96,41 @@ Note:
 Download/checkout HiBench benchmark suite
 
 Run `<HiBench_Root>/bin/build-all.sh` to build HiBench.
-
+
 Note: Begin from HiBench V4.0, HiBench will need python 2.x(>=2.6) .
 
 2. HiBench Configurations.
 
 For minimum requirements: create & edit `conf/99-user_defined_properties.conf`
-
-cd conf
+
+cd conf
 cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf
-
+
 And Make sure below properties has been set:
 
 hibench.hadoop.home The Hadoop installation location
 hibench.spark.home The Spark installation location
 hibench.hdfs.master HDFS master
 hibench.spark.master SPARK master
-
+
 Note: For YARN mode, set `hibench.spark.master` to `yarn-client`. (`yarn-cluster` is not supported yet)
 
+To run HiBench on HDP, please specify `hibench.hadoop.mapreduce.home` to the mapreduce home, normally it should be "/usr/hdp/current/hadoop-mapreduce-client". Also please specify `hibench.hadoop.release` to "hdp".
+
 3. Run
 
 Execute the `<HiBench_Root>/bin/run-all.sh` to run all workloads with all language APIs with `large` data scale.
 
 4. View the report:
-
+
 Goto `<HiBench_Root>/report` to check for the final report:
 - `report/hibench.report`: Overall report about all workloads.
 - `report/<workload>/<language APIs>/bench.log`: Raw logs on client side.
 - `report/<workload>/<language APIs>/monitor.html`: System utilization monitor results.
 - `report/<workload>/<language APIs>/conf/<workload>.conf`: Generated environment variable configurations for this workload.
 - `report/<workload>/<language APIs>/conf/sparkbench/<workload>/sparkbench.conf`: Generated configuration for this workloads, which is used for mapping to environment variable.
 - `report/<workload>/<language APIs>/conf/sparkbench/<workload>/spark.conf`: Generated configuration for spark.
-
+
 [Optional] Execute `<HiBench root>/bin/report_gen_plot.py report/hibench.report` to generate report figures.
 
 Note: `report_gen_plot.py` requires `python2.x` and `python-matplotlib`.
@@ -135,12 +140,12 @@ Note:
 
 1. Parallelism, memory, executor number tuning:
 
-hibench.default.map.parallelism Mapper numbers in MR,
+hibench.default.map.parallelism Mapper numbers in MR,
 partition numbers in Spark
-hibench.default.shuffle.parallelism Reducer numbers in MR, shuffle
+hibench.default.shuffle.parallelism Reducer numbers in MR, shuffle
 partition numbers in Spark
 hibench.yarn.executors.num Number executors in YARN mode
-hibench.yarn.executors.cores Number executor cores in YARN mode
+hibench.yarn.executors.cores Number executor cores in YARN mode
 spark.executors.memory Executor memory, standalone or YARN mode
 spark.driver.memory Driver memory, standalone or YARN mode
 
@@ -150,11 +155,11 @@ Note:
 
 hibench.compress.profile Compression option `enable` or `disable`
 hibench.compress.codec.profile Compression codec, `snappy`, `lzo` or `default`
-
+
 3. Data scale profile selection:
 
 hibench.scale.profile Data scale profile, `tiny`, `small`, `large`, `huge`, `gigantic`, `bigdata`
-
+
 You can add more data scale profiles in `conf/10-data-scale-profile.conf`. And please don't change `conf/00-default-properties.conf` if you have no confidence.
 
 4. Configure for each workload or each language API:
@@ -166,7 +171,7 @@ Note:
 workloads/<workload>/<language APIs>/.../*.conf Configure for various languages
 
 2. For configurations in same folder, the loading sequence will be
-sorted according to configure file name.
+sorted according to configure file name.
 
 3. Values in latter configure will override former.
 
@@ -189,7 +194,7 @@ Note:
 hibench.spark.version spark1.3
 
 6. Configures for running workloads and language APIs:
-
+
 The `conf/benchmarks.lst` file under the package folder defines the
 workloads to run when you execute the `bin/run-all.sh` script under
 the package folder. Each line in the list file specifies one
@@ -227,7 +232,7 @@ Note:
 You'll need to install numpy (version > 1.4) in master & all slave nodes.
 
 For CentOS(6.2+):
-
+
 `yum install numpy`
 
 For Ubuntu/Debian:
@@ -239,7 +244,7 @@ Note:
 You'll need to install python-matplotlib(version > 0.9).
 
 For CentOS(6.2+):
-
+
 `yum install python-matplotlib`
 
 For Ubuntu/Debian:
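The README lines touched above say that configuration files in the same folder are loaded in filename order and that later values override earlier ones. This is presumably what lets the new HDP note take effect: `hibench.hadoop.mapreduce.home` defaults to `${hibench.hadoop.home}` in `conf/00-default-properties.conf`, and a value set in `conf/99-user_defined_properties.conf` is loaded later. The Python sketch below illustrates only that ordering rule and is not HiBench's loader. The helper names `parse_conf` and `load_conf_folder`, the in-memory `files` mapping, and the assumption that each property line has the form `key<whitespace>value` are all hypothetical.

```python
# Hypothetical sketch of the "same folder" rule from the README:
# files are applied in filename order, and later values override earlier ones.
# Assumption: each non-comment line is "<key><whitespace><value>".


def parse_conf(text):
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def load_conf_folder(files):
    """Merge a {filename: contents} mapping in sorted filename order."""
    merged = {}
    for name in sorted(files):
        merged.update(parse_conf(files[name]))  # later files win
    return merged


# Example input (hypothetical): a default file and a user override for HDP.
files = {
    "99-user_defined_properties.conf": """
hibench.hadoop.release          hdp
hibench.hadoop.mapreduce.home   /usr/hdp/current/hadoop-mapreduce-client
""",
    "00-default-properties.conf": """
# Hadoop MapReduce home dir, should be same to Hadoop home by default
hibench.hadoop.mapreduce.home ${hibench.hadoop.home}
""",
}

conf = load_conf_folder(files)
print(conf["hibench.hadoop.mapreduce.home"])
# /usr/hdp/current/hadoop-mapreduce-client
```

Because `00-...` sorts before `99-...`, the user file's value replaces the default.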

bin/functions/load-config.py (+22 -16)
@@ -233,7 +233,7 @@ def generate_optional_value(): # get some critical values from environment or m
 "UNKNOWN"
 HibenchConfRef["hibench.hadoop.release"] = "Inferred by: hadoop version, which is:\"%s\"" % hadoop_version
 
-assert HibenchConf["hibench.hadoop.release"] in ["cdh4", "cdh5", "apache"], "Unknown hadoop release. Auto probe failed, please override `hibench.hadoop.release` to explicitly define this property"
+assert HibenchConf["hibench.hadoop.release"] in ["cdh4", "cdh5", "apache", "hdp"], "Unknown hadoop release. Auto probe failed, please override `hibench.hadoop.release` to explicitly define this property"
 
 
 # probe spark version
@@ -260,18 +260,21 @@ def generate_optional_value(): # get some critical values from environment or m
 if not HibenchConf.get("hibench.hadoop.examples.jar", ""):
 if HibenchConf["hibench.hadoop.version"] == "hadoop1": # MR1
 if HibenchConf['hibench.hadoop.release'] == 'apache': # Apache release
-HibenchConf["hibench.hadoop.examples.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.home']+"/hadoop-examples*.jar")
-HibenchConfRef["hibench.hadoop.examples.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.home']+"/hadoop-examples*.jar"
+HibenchConf["hibench.hadoop.examples.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home']+"/hadoop-examples*.jar")
+HibenchConfRef["hibench.hadoop.examples.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.mapreduce.home']+"/hadoop-examples*.jar"
 elif HibenchConf['hibench.hadoop.release'].startswith('cdh'): # CDH release
-HibenchConf["hibench.hadoop.examples.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.home']+"/share/hadoop/mapreduce1/hadoop-examples*.jar")
-HibenchConfRef["hibench.hadoop.examples.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.home']+"/share/hadoop/mapreduce1/hadoop-examples*.jar"
+HibenchConf["hibench.hadoop.examples.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home']+"/share/hadoop/mapreduce1/hadoop-examples*.jar")
+HibenchConfRef["hibench.hadoop.examples.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.mapreduce.home']+"/share/hadoop/mapreduce1/hadoop-examples*.jar"
 else: # MR2
 if HibenchConf['hibench.hadoop.release'] == 'apache': # Apache release
-HibenchConf["hibench.hadoop.examples.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.home'] + "/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar")
-HibenchConfRef["hibench.hadoop.examples.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.home']+"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar"
+HibenchConf["hibench.hadoop.examples.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home'] + "/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar")
+HibenchConfRef["hibench.hadoop.examples.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.mapreduce.home']+"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar"
 elif HibenchConf['hibench.hadoop.release'].startswith('cdh'): # CDH release
-HibenchConf["hibench.hadoop.examples.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.home'] + "/share/hadoop/mapreduce2/hadoop-mapreduce-examples-*.jar")
-HibenchConfRef["hibench.hadoop.examples.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.home']+"/share/hadoop/mapreduce2/hadoop-mapreduce-examples-*.jar"
+HibenchConf["hibench.hadoop.examples.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home'] + "/share/hadoop/mapreduce2/hadoop-mapreduce-examples-*.jar")
+HibenchConfRef["hibench.hadoop.examples.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.mapreduce.home']+"/share/hadoop/mapreduce2/hadoop-mapreduce-examples-*.jar"
+elif HibenchConf['hibench.hadoop.release'].startswith('hdp'): # HDP release
+HibenchConf["hibench.hadoop.examples.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home'] + "/hadoop-mapreduce-examples.jar")
+HibenchConfRef["hibench.hadoop.examples.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.mapreduce.home']+"/hadoop-mapreduce-examples.jar"
 
 # probe hadoop examples test jars (for sleep in hadoop2 only)
 if not HibenchConf.get("hibench.hadoop.examples.test.jar", ""):
@@ -280,15 +283,18 @@ def generate_optional_value(): # get some critical values from environment or m
 HibenchConfRef["hibench.hadoop.examples.test.jar"]= "Dummy value, not available in hadoop1"
 else:
 if HibenchConf['hibench.hadoop.release'] == 'apache':
-HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.home'] + "/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient*-tests.jar")
-HibenchConfRef["hibench.hadoop.examples.test.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.home']+"/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient*-tests.jar"
+HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home'] + "/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient*-tests.jar")
+HibenchConfRef["hibench.hadoop.examples.test.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.mapreduce.home']+"/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient*-tests.jar"
 elif HibenchConf['hibench.hadoop.release'].startswith('cdh'):
 if HibenchConf["hibench.hadoop.version"] == "hadoop2":
-HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.home'] + "/share/hadoop/mapreduce2/hadoop-mapreduce-client-jobclient*-tests.jar")
-HibenchConfRef["hibench.hadoop.examples.test.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.home']+"/share/hadoop/mapreduce2/hadoop-mapreduce-client-jobclient*-tests.jar"
+HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home'] + "/share/hadoop/mapreduce2/hadoop-mapreduce-client-jobclient*-tests.jar")
+HibenchConfRef["hibench.hadoop.examples.test.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.mapreduce.home']+"/share/hadoop/mapreduce2/hadoop-mapreduce-client-jobclient*-tests.jar"
 elif HibenchConf["hibench.hadoop.version"] == "hadoop1":
-HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.home'] + "/share/hadoop/mapreduce1/hadoop-examples-*.jar")
-HibenchConfRef["hibench.hadoop.examples.test.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.home']+"/share/hadoop/mapreduce1/hadoop-mapreduce-client-jobclient*-tests.jar"
+HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home'] + "/share/hadoop/mapreduce1/hadoop-examples-*.jar")
+HibenchConfRef["hibench.hadoop.examples.test.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.mapreduce.home']+"/share/hadoop/mapreduce1/hadoop-mapreduce-client-jobclient*-tests.jar"
+elif HibenchConf['hibench.hadoop.release'].startswith('hdp'): # HDP release
+HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home'] + "/hadoop-mapreduce-client-jobclient-tests.jar")
+HibenchConfRef["hibench.hadoop.examples.test.jar"]= "Inferred by: " + HibenchConf['hibench.hadoop.mapreduce.home']+"/hadoop-mapreduce-client-jobclient-tests.jar"
 
 # set hibench.sleep.job.jar
 if not HibenchConf.get('hibench.sleep.job.jar', ''):
@@ -302,7 +308,7 @@ def generate_optional_value(): # get some critical values from environment or m
 
 # probe hadoop configuration files
 if not HibenchConf.get("hibench.hadoop.configure.dir", ""):
-if HibenchConf["hibench.hadoop.release"] == "apache": # Apache release
+if HibenchConf["hibench.hadoop.release"] == "apache" or HibenchConf["hibench.hadoop.release"] == "hdp": # Apache and HDP release
 HibenchConf["hibench.hadoop.configure.dir"] = join(HibenchConf["hibench.hadoop.home"], "conf") if HibenchConf["hibench.hadoop.version"] == "hadoop1" \
 else join(HibenchConf["hibench.hadoop.home"], "etc", "hadoop")
 HibenchConfRef["hibench.hadoop.configure.dir"] = "Inferred by: 'hibench.hadoop.version' & 'hibench.hadoop.release'"
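For readers tracing the `load-config.py` hunks, the sketch below restates as plain Python functions which examples-jar glob the patched code searches for each release and Hadoop version, and which configuration directory it infers for Apache and HDP. It mirrors only the paths visible in this diff. The function names are hypothetical, `one_and_only_one_file` merely approximates `OneAndOnlyOneFile` (whose body is not shown here), and the CDH configure-dir branch lies outside the diff, so the sketch returns `None` for it.

```python
import glob
import os


def examples_jar_pattern(release, version, mapreduce_home):
    """Glob the patched branches search for the Hadoop examples jar.

    Returns None where the diff defines no branch (e.g. HDP with hadoop1).
    """
    if version == "hadoop1":  # MR1
        if release == "apache":
            rel = "hadoop-examples*.jar"
        elif release.startswith("cdh"):
            rel = "share/hadoop/mapreduce1/hadoop-examples*.jar"
        else:
            return None  # the commit adds no MR1 branch for HDP
    else:  # MR2
        if release == "apache":
            rel = "share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar"
        elif release.startswith("cdh"):
            rel = "share/hadoop/mapreduce2/hadoop-mapreduce-examples-*.jar"
        elif release.startswith("hdp"):
            # The patch expects an unversioned jar directly under the MR home.
            rel = "hadoop-mapreduce-examples.jar"
        else:
            return None
    # The original concatenates with "/" rather than using os.path.join.
    return mapreduce_home + "/" + rel


def one_and_only_one_file(pattern):
    """Hypothetical stand-in for OneAndOnlyOneFile: require exactly one match."""
    matches = glob.glob(pattern)
    if len(matches) != 1:
        raise ValueError("expected exactly one file for %r, found %d"
                         % (pattern, len(matches)))
    return matches[0]


def hadoop_conf_dir(release, version, hadoop_home):
    """Configure dir inferred for Apache and HDP (note: from hadoop.home)."""
    if release in ("apache", "hdp"):
        if version == "hadoop1":
            return os.path.join(hadoop_home, "conf")
        return os.path.join(hadoop_home, "etc", "hadoop")
    return None  # CDH branch is outside this diff


mr_home = "/usr/hdp/current/hadoop-mapreduce-client"
print(examples_jar_pattern("hdp", "hadoop2", mr_home))
# /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
```

Only the jar lookups switch to `hibench.hadoop.mapreduce.home`; the configure-dir probe still starts from `hibench.hadoop.home`.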

conf/00-default-properties.conf (+5 -1)
@@ -37,6 +37,9 @@
 # default hadoop executable path
 hibench.hadoop.executable ${hibench.hadoop.home}/bin/hadoop
 
+# Hadoop MapReduce home dir, should be same to Hadoop home by default
+hibench.hadoop.mapreduce.home ${hibench.hadoop.home}
+
 #======================================================
 # basic spark conf
 #======================================================
@@ -49,7 +52,7 @@ hibench.hadoop.configure.dir
 hibench.spark.version
 hibench.masters.hostnames
 hibench.slaves.hostnames
-hibench.dfsioe.map.java_opts
+hibench.dfsioe.map.java_opts
 hibench.dfsioe.red.java_opts
 
 # default spark master if unspecified
@@ -128,6 +131,7 @@ hibench.pagerank.dir.name.output ${hibench.workload.dir.name.output}
 hibench.pagerank.pegasus.dir ${hibench.dependency.dir}/pegasus/target/pegasus-2.0-SNAPSHOT.jar
 hibench.mahout.home ${hibench.dependency.dir}/mahout/target/${hibench.mahout.release}
 hibench.mahout.release.apache mahout-distribution-0.9
+hibench.mahout.release.hdp mahout-distribution-0.9
 hibench.mahout.release.cdh4 mahout-0.7-cdh4.7.1
 hibench.mahout.release.cdh5 mahout-0.9-cdh5.1.0
 hibench.mahout.release ${hibench.mahout.release.${hibench.hadoop.release}}
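The new `hibench.mahout.release.hdp` entry is picked up through the nested reference `${hibench.mahout.release.${hibench.hadoop.release}}`, and the new `hibench.hadoop.mapreduce.home` defaults to `${hibench.hadoop.home}`. A minimal sketch of innermost-first `${...}` expansion shows how those values would resolve. It assumes references are substituted repeatedly until nothing changes and that unknown keys are left untouched, which may differ from HiBench's own expansion code. The `expand` function is hypothetical, and the HDP home path below is only an example value.

```python
import re

# An innermost reference: "${...}" whose body contains no "$", "{" or "}".
_INNERMOST_REF = re.compile(r"\$\{([^${}]+)\}")


def expand(value, props, max_rounds=20):
    """Hypothetical expander: resolve innermost ${key} references until stable.

    Unknown keys are left as-is; max_rounds guards against reference cycles.
    """
    for _ in range(max_rounds):
        new_value = _INNERMOST_REF.sub(
            lambda m: props.get(m.group(1), m.group(0)), value)
        if new_value == value:  # nothing left that can be resolved
            return new_value
        value = new_value
    return value


props = {
    "hibench.hadoop.home": "/usr/hdp/current/hadoop-client",  # example value
    "hibench.hadoop.release": "hdp",
    "hibench.hadoop.mapreduce.home": "${hibench.hadoop.home}",
    "hibench.mahout.release.apache": "mahout-distribution-0.9",
    "hibench.mahout.release.hdp": "mahout-distribution-0.9",
    "hibench.mahout.release.cdh4": "mahout-0.7-cdh4.7.1",
    "hibench.mahout.release.cdh5": "mahout-0.9-cdh5.1.0",
    "hibench.mahout.release":
        "${hibench.mahout.release.${hibench.hadoop.release}}",
}

# Round 1 turns the inner ${hibench.hadoop.release} into "hdp";
# round 2 then resolves ${hibench.mahout.release.hdp}.
print(expand(props["hibench.mahout.release"], props))
# mahout-distribution-0.9
```

Without the `hibench.mahout.release.hdp` line, the outer reference would have no matching key when the release is `hdp`, so the Mahout release could not be resolved.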
