Commit b8cd5a2

Merge remote-tracking branch 'origin/develop'
2 parents 7ddb070 + 1b46faf commit b8cd5a2

42 files changed, 826 additions and 310 deletions

.github/workflows/continuous_integration.yml

Lines changed: 9 additions & 7 deletions
@@ -4,7 +4,7 @@ name: CI
 # Define conditions for when to run this action
 on:
 pull_request: # Run on all pull requests
-push: # Run on all pushes to main
+push: # Run on all pushes to main or develop
 branches:
 - main
 - develop
@@ -49,12 +49,14 @@ jobs:
 # https://github.com/ymyzk/tox-gh-actions#workflow-configuration
 - name: Run tests
 run: tox
-# - name: Combine coverage
-# run: coverage xml
+
+# Combine coverage data from all test executions
+- name: Combine coverage
+run: coverage xml
 
 # Upload coverage report
 # https://github.com/codecov/codecov-action
-# - name: Upload coverage report
-# uses: codecov/codecov-action@v1
-# with:
-# fail_ci_if_error: true
+- name: Upload coverage report
+uses: codecov/codecov-action@v1
+with:
+fail_ci_if_error: true

.gitignore

Lines changed: 3 additions & 3 deletions
@@ -2,6 +2,9 @@
 *.pyc
 __pycache__
 
+# Example input
+/example/
+
 # Python virtual environment
 /env/
 /env2/
@@ -32,6 +35,3 @@ build/
 
 # macOS system files
 *.DS_Store
-
-# Output directories
-/output

CONTRIBUTING.md

Lines changed: 8 additions & 0 deletions
@@ -78,3 +78,11 @@ $ git describe
 X.Y.Z
 $ git push --tags origin main
 ```
+
+Create a release on GitHub using the "Auto-generate release notes" feature. https://github.com/eecs485staff/madoop/releases/new
+
+Upload to PyPI
+```console
+$ python3 setup.py sdist bdist_wheel
+$ twine upload --sign dist/*
+```

MANIFEST.in

Lines changed: 2 additions & 1 deletion
@@ -1,10 +1,11 @@
 include LICENSE
 include MANIFEST.in
 include README.md
+include README_Hadoop_Streaming.md
 include CONTRIBUTING.md
 include .pylintrc
 graft tests
-graft example
+graft madoop/example
 
 # Avoid dev and and binary files
 exclude tox.ini

README.md

Lines changed: 28 additions & 51 deletions
@@ -1,73 +1,50 @@
 Madoop: Michigan Hadoop
 =======================
 
-Michigan Hadoop (`madoop`) is a light weight MapReduce framework for education. Madoop implements the [Hadoop Streaming](https://hadoop.apache.org/docs/r1.2.1/streaming.html) interface. Madoop is implemented in Python and runs on a single machine.
+[![PyPI](https://img.shields.io/pypi/v/madoop.svg)](https://pypi.org/project/madoop/)
+[![CI main](https://github.com/eecs485staff/madoop/workflows/CI/badge.svg?branch=develop)](https://github.com/eecs485staff/madoop/actions?query=branch%3Adevelop)
+[![codecov](https://codecov.io/gh/eecs485staff/madoop/branch/develop/graph/badge.svg)](https://codecov.io/gh/eecs485staff/madoop)
 
-## Quick start
-Install and run an example word count MapReduce program.
-```console
-$ pip install madoop
-$ madoop \
--input example/input \
--output output \
--mapper example/map.py \
--reducer example/reduce.py
-$ cat output/part-*
-autograder 2
-world 1
-eecs485 1
-goodbye 1
-hello 3
-```
+Michigan Hadoop (`madoop`) is a light weight MapReduce framework for education. Madoop implements the [Hadoop Streaming](https://hadoop.apache.org/docs/r1.2.1/streaming.html) interface. Madoop is implemented in Python and runs on a single machine.
 
+For an in-depth explanation of how to write MapReduce programs in Python for Hadoop Streaming, see our [Hadoop Streaming tutorial](README_hadoop_streaming.md).
 
-## Example
-We'll walk through the example in the Quick Start again, providing more detail. For an in-depth explanation of the map and reduce code, see the [Hadoop Streaming tutorial](https://eecs485staff.github.io/p5-search-engine/hadoop_streaming.html).
 
-## Install
-Install Madoop. Your version might be different.
+## Quick start
+Install Madoop.
 ```console
 $ pip install madoop
-$ madoop --version
-Madoop 0.1.0
 ```
 
-### Input
-We've provided two small input files.
+Create example MapReduce program with input files.
 ```console
-$ cat example/input/input01.txt
-hello world
-hello eecs485
-$ cat example/input/input02.txt
-goodbye autograder
-hello autograder
+$ madoop --example
+$ tree example
+example
+├── input
+│   ├── input01.txt
+│   └── input02.txt
+├── map.py
+└── reduce.py
 ```
 
-### Run
-Run a MapReduce word count job. By default, there will be one mapper for each input file. Large input files maybe segmented and processed by multiple mappers.
-- `-input DIRECTORY` input directory
-- `-output DIRECTORY` output directory
-- `-mapper FILE` mapper executable
-- `-reducer FILE` reducer executable
+Run example word count MapReduce program.
 ```console
 $ madoop \
--input example/input \
--output output \
--mapper example/map.py \
--reducer example/reduce.py
+-input example/input \
+-output example/output \
+-mapper example/map.py \
+-reducer example/reduce.py
 ```
 
-### Output
-Concatenate and print output. The concatenation of multiple output files may not be sorted.
+Concatenate and print the output.
 ```console
-$ ls output
-part-00000 part-00001 part-00002 part-00003
-$ cat output/part-*
-autograder 2
-world 1
-eecs485 1
-goodbye 1
-hello 3
+$ cat example/output/part-*
+Goodbye 1
+Bye 1
+Hadoop 2
+World 2
+Hello 2
 ```
 
 ## Comparison with Apache Hadoop and CLI
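
Note on the README hunk above: it references `example/map.py` and `example/reduce.py`, but this diff does not show their contents. As a rough sketch of what a word count in the Hadoop Streaming style might look like, the code below assumes tab-separated `word<TAB>count` lines. The function names (`map_words`, `reduce_counts`, `word_count`) are hypothetical and not taken from Madoop.

```python
"""Word count in the Hadoop Streaming style (illustrative sketch only)."""
import itertools


def map_words(lines):
    """Map step: emit one "word<TAB>1" pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts per word.

    The input must already be sorted so that equal keys are adjacent,
    which is what itertools.groupby relies on.
    """
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for word, group in itertools.groupby(keyed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def word_count(lines):
    """Run map, sort, and reduce in one process (assumed simplification)."""
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    for result in word_count(["some example text", "more example text"]):
        print(result)
```

In a real Hadoop Streaming job the map and reduce steps would be separate executables that read stdin and write stdout, and the framework would sort the mapper output by key before the reducer sees it. The in-process `sorted` call here only stands in for that step.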
