|
1 | 1 | Madoop: Michigan Hadoop |
2 | 2 | ======================= |
3 | 3 |
|
4 | | -Michigan Hadoop (`madoop`) is a lightweight MapReduce framework for education. Madoop implements the [Hadoop Streaming](https://hadoop.apache.org/docs/r1.2.1/streaming.html) interface. Madoop is implemented in Python and runs on a single machine. |
| 4 | +[PyPI](https://pypi.org/project/madoop/) |
| 5 | +[GitHub Actions](https://github.com/eecs485staff/madoop/actions?query=branch%3Adevelop) |
| 6 | +[Codecov](https://codecov.io/gh/eecs485staff/madoop) |
5 | 7 |
|
6 | | -## Quick start |
7 | | -Install and run an example word count MapReduce program. |
8 | | -```console |
9 | | -$ pip install madoop |
10 | | -$ madoop \ |
11 | | - -input example/input \ |
12 | | - -output output \ |
13 | | - -mapper example/map.py \ |
14 | | - -reducer example/reduce.py |
15 | | -$ cat output/part-* |
16 | | -autograder 2 |
17 | | -world 1 |
18 | | -eecs485 1 |
19 | | -goodbye 1 |
20 | | -hello 3 |
21 | | -``` |
| 8 | +Michigan Hadoop (`madoop`) is a lightweight MapReduce framework for education. Madoop implements the [Hadoop Streaming](https://hadoop.apache.org/docs/r1.2.1/streaming.html) interface. Madoop is implemented in Python and runs on a single machine. |
22 | 9 |
|
| 10 | +For an in-depth explanation of how to write MapReduce programs in Python for Hadoop Streaming, see our [Hadoop Streaming tutorial](README_hadoop_streaming.md). |
23 | 11 |
|
24 | | -## Example |
25 | | -We'll walk through the example in the Quick Start again, providing more detail. For an in-depth explanation of the map and reduce code, see the [Hadoop Streaming tutorial](https://eecs485staff.github.io/p5-search-engine/hadoop_streaming.html). |
26 | 12 |
|
27 | | -## Install |
28 | | -Install Madoop. Your version might be different. |
| 13 | +## Quick start |
| 14 | +Install Madoop. |
29 | 15 | ```console |
30 | 16 | $ pip install madoop |
31 | | -$ madoop --version |
32 | | -Madoop 0.1.0 |
33 | 17 | ``` |
34 | 18 |
|
35 | | -### Input |
36 | | -We've provided two small input files. |
| 19 | +Create an example MapReduce program with input files. |
37 | 20 | ```console |
38 | | -$ cat example/input/input01.txt |
39 | | -hello world |
40 | | -hello eecs485 |
41 | | -$ cat example/input/input02.txt |
42 | | -goodbye autograder |
43 | | -hello autograder |
| 21 | +$ madoop --example |
| 22 | +$ tree example |
| 23 | +example |
| 24 | +├── input |
| 25 | +│ ├── input01.txt |
| 26 | +│ └── input02.txt |
| 27 | +├── map.py |
| 28 | +└── reduce.py |
44 | 29 | ``` |
45 | 30 |
|
46 | | -### Run |
47 | | -Run a MapReduce word count job. By default, there will be one mapper for each input file. Large input files may be segmented and processed by multiple mappers. |
48 | | -- `-input DIRECTORY` input directory |
49 | | -- `-output DIRECTORY` output directory |
50 | | -- `-mapper FILE` mapper executable |
51 | | -- `-reducer FILE` reducer executable |
| 31 | +Run the example word count MapReduce program. |
52 | 32 | ```console |
53 | 33 | $ madoop \ |
54 | | - -input example/input \ |
55 | | - -output output \ |
56 | | - -mapper example/map.py \ |
57 | | - -reducer example/reduce.py |
| 34 | + -input example/input \ |
| 35 | + -output example/output \ |
| 36 | + -mapper example/map.py \ |
| 37 | + -reducer example/reduce.py |
58 | 38 | ``` |
59 | 39 |
|
60 | | -### Output |
61 | | -Concatenate and print output. The concatenation of multiple output files may not be sorted. |
| 40 | +Concatenate and print the output. |
62 | 41 | ```console |
63 | | -$ ls output |
64 | | -part-00000 part-00001 part-00002 part-00003 |
65 | | -$ cat output/part-* |
66 | | -autograder 2 |
67 | | -world 1 |
68 | | -eecs485 1 |
69 | | -goodbye 1 |
70 | | -hello 3 |
| 42 | +$ cat example/output/part-* |
| 43 | +Goodbye 1 |
| 44 | +Bye 1 |
| 45 | +Hadoop 2 |
| 46 | +World 2 |
| 47 | +Hello 2 |
71 | 48 | ``` |
72 | 49 |
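For readers unfamiliar with the word count job run above, here is a minimal in-process sketch of the map, sort, and reduce steps. It is not the `example/map.py` or `example/reduce.py` that `madoop --example` generates, and it does not use stdin/stdout the way Hadoop Streaming executables do. The function names and the sample lines are assumptions for illustration; the sample was chosen only so that the counts match the output shown above.

```python
from itertools import groupby


def map_words(lines):
    """Map step: emit one (word, 1) pair per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum counts per word. Pairs must already be sorted by word."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort by key (standing in for the shuffle), then reduce."""
    return dict(reduce_counts(sorted(map_words(lines))))


# Hypothetical input lines, not the actual contents of example/input.
sample = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
counts = word_count(sample)
print(counts)
```

Sorting between the two steps matters: `groupby` only merges adjacent equal keys, which mirrors the guarantee that a reducer sees all values for a key together.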
|
73 | 50 | ## Comparison with Apache Hadoop and CLI |
|