README.md (+1 -1): 1 addition & 1 deletion
@@ -7,7 +7,7 @@ Madoop: Michigan Hadoop
 
 Michigan Hadoop (`madoop`) is a light weight MapReduce framework for education. Madoop implements the [Hadoop Streaming](https://hadoop.apache.org/docs/r1.2.1/streaming.html) interface. Madoop is implemented in Python and runs on a single machine.
 
-For an in-depth explanation of how to write MapReduce programs in Python for Hadoop Streaming, see our [Hadoop Streaming tutorial](README_hadoop_streaming.md).
+For an in-depth explanation of how to write MapReduce programs in Python for Hadoop Streaming, see our [Hadoop Streaming tutorial](README_Hadoop_Streaming.md).
README_Hadoop_Streaming.md (+23 -17): 23 additions & 17 deletions
@@ -22,12 +22,11 @@ example
 
 Execute the example MapReduce program using Madoop and show the output.
 ```console
-$ cd example
 $ madoop \
-  -input input \
-  -output output \
-  -mapper map.py \
-  -reducer reduce.py
+  -input example/input \
+  -output example/output \
+  -mapper example/map.py \
+  -reducer example/reduce.py
 
 $ cat output/part-*
 Goodbye 1
@@ -40,6 +39,9 @@ Hello 2
 ## Overview
 [Hadoop Streaming](https://hadoop.apache.org/docs/r1.2.1/streaming.html) is a MapReduce API that works with any programming language. The mapper and the reducer are executables that read input from stdin and write output to stdout.
 
+## Partition
+The MapReduce framework begins by partitioning (splitting) the input. If the input size is large, a real MapReduce framework will break it up into smaller chunks. Each Map execution will process one input partition. In this tutorial, we're faking MapReduce at the command line with a single mapper, so we'll skip the partition step.
+
 ## Map
 The mapper is an executable that reads input from stdin and writes output to stdout. Here's an example `map.py` which is part of a word count MapReduce program.
 ```python
@@ -109,7 +111,7 @@ def main():
         word, _, count = line.partition("\t")
         word_count[word] += int(count)
     for word, count in word_count.items():
-        print(f"{word}\t{count}")
+        print(f"{word} {count}")
 
 if __name__ == "__main__":
     main()
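
Review note on the `reduce.py` hunk above: it changes how the reducer formats each output line. Because the reduce output format is up to the programmer, one way to make that choice explicit is to pass the separator in as a parameter. The following is a minimal sketch under that assumption, not the tutorial's code; `reduce_lines` and its `sep` parameter are hypothetical names introduced here for illustration, and the sample input is made up.

```python
"""Sketch of a word-count reducer with a configurable output separator."""
import collections


def reduce_lines(lines, sep="\t"):
    """Sum "word<TAB>count" input lines and return formatted output lines.

    `sep` is a hypothetical parameter: the text placed between each word
    and its total in the output. The input is always split on a tab.
    """
    word_count = collections.defaultdict(int)
    for line in lines:
        # Everything before the first tab is the key, the rest is the count.
        word, _, count = line.partition("\t")
        word_count[word] += int(count)  # int() tolerates a trailing newline
    return [f"{word}{sep}{count}" for word, count in word_count.items()]


if __name__ == "__main__":
    # Made-up mapper output, already grouped by key.
    mapper_output = ["Goodbye\t1", "Hello\t1", "Hello\t1"]
    for output_line in reduce_lines(mapper_output, sep=" "):
        print(output_line)
```

Keeping the input split on a tab while leaving the output separator open reflects that only the mapper-to-reducer format needs to stay machine-readable; the final output only has to suit whoever reads it.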
@@ -151,9 +153,9 @@ The reduce output format is up to the programmer. Here's how to run the whole w
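
Review note on the Partition section added to README_Hadoop_Streaming.md: the tutorial skips the partition step, so a reader never sees what splitting the input looks like. The sketch below illustrates the idea, assuming line-oriented input and a fixed number of lines per chunk. `partition_lines` and `chunk_size` are hypothetical names, the sample lines are made up, and this is not how Madoop or Hadoop split input; real frameworks typically split by size in bytes.

```python
"""Sketch: split line-oriented input into fixed-size partitions."""
import itertools


def partition_lines(lines, chunk_size):
    """Yield lists of at most `chunk_size` lines, preserving input order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(lines)
    while True:
        # Take the next `chunk_size` lines; an empty list means we are done.
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Made-up input; each printed chunk would feed one Map execution.
    text = ["Hello World", "Hello Hadoop", "Goodbye Hadoop"]
    for number, chunk in enumerate(partition_lines(text, chunk_size=2)):
        print(number, chunk)
```

Each yielded chunk corresponds to one input partition, so a framework could run one mapper per chunk.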