Commit 76a96ba (parent 458c19f): Added minor doc in readme

README.md — 1 file changed, 194 additions, 0 deletions
This software depends on:

- [pakozm/luamongo](https://github.com/pakozm/luamongo/), a fork of
  [moai/luamongo](https://github.com/moai/luamongo) for Lua 5.2, with minor
  improvements.

Installation
------------

Copy the `mapreduce` directory to a location visible from your `LUA_PATH`
environment variable. Likewise, in order to run the example, the `examples`
directory must be visible through your `LUA_PATH`. You can add the current
directory by typing in the terminal:

```
$ export LUA_PATH='?.lua;?/init.lua'
```

Usage
-----

Two Lua scripts are provided to run the software quickly:

- `execute_server.lua` runs the master server for your map-reduce operation.
  Only **one instance** of this script is needed. Note that this software
  receives the **map-reduce task** split into several Lua modules. These
  modules must be visible in the `LUA_PATH` of the server and of all the
  workers that you execute. This script receives seven mandatory arguments:

  1. The connection string, normally `localhost` or `localhost:27017`.
  2. The name of the database where the work will be done.
  3. A Lua module which contains the **task** function data.
  4. A Lua module which contains the **map** function data.
  5. A Lua module which contains the **partition** function data.
  6. A Lua module which contains the **reduce** function data.
  7. A Lua module which contains the **final** function data.
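For instance, a manual server launch might look like the following. The
database name `wordcount` and the dotted module names are illustrative
assumptions based on the `examples/WordCount/` layout described below, not
names fixed by the software:

```
$ lua execute_server.lua localhost wordcount \
      examples.WordCount.taskfn examples.WordCount.mapfn \
      examples.WordCount.partitionfn examples.WordCount.reducefn \
      examples.WordCount.finalfn
```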

- `execute_worker.lua` runs a worker, which by default is configured to
  execute one map-reduce task and then finish. One task doesn't mean one job:
  a **map-reduce task** is performed as several individual **map/reduce
  jobs**. A worker waits until all the available map or reduce jobs are
  completed before considering a task finished. This script receives two
  arguments:

  1. The connection string, as above.
  2. The name of the database where the work will be done, as above.
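Similarly, a worker could be launched by hand (again with the illustrative
database name `wordcount`); start several such processes to run jobs in
parallel:

```
$ lua execute_worker.lua localhost wordcount
```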

A simple word-count example is available in the repository. Two shell scripts,
`execute_example_server.sh` and `execute_example_worker.sh`, are ready to run
the word-count example on a single machine, with one or more worker instances.
The execution of the example looks like this:

**SERVER**
```
$ ./execute_example_server.sh > output
# Preparing MAP
# MAP execution
	100.0 %
# Preparing REDUCE
# MERGE AND PARTITIONING
	100.0 %
# CREATING JOBS
# STARTING REDUCE
# REDUCE execution
	100.0 %
# FINAL execution
```

**WORKER**
```
$ ./execute_example_worker.sh
# NEW TASK READY
# EXECUTING MAP JOB _id: "1"
# FINISHED
# EXECUTING MAP JOB _id: "2"
# FINISHED
# EXECUTING MAP JOB _id: "3"
# FINISHED
# EXECUTING MAP JOB _id: "4"
# FINISHED
# EXECUTING REDUCE JOB _id: "121"
# FINISHED
# EXECUTING REDUCE JOB _id: "37"
# FINISHED
...
```

Map-reduce task example: word-count
-----------------------------------

The example is composed of one Lua module for each of the map-reduce
functions, available in the directory `examples/WordCount/`. All the modules
have the same structure: they return a Lua table with two fields:

- **init** function, which receives a table of arguments and allows you to
  configure your module options, in case you need any.

- **func** function, which implements the necessary Lua code.

A map-reduce task is divided, at least, into the following modules:

- **taskfn.lua** is the script which defines how the data is divided in order
  to create **map jobs**. The **func** field is executed as a Lua *coroutine*,
  so every map job will be created by calling `coroutine.yield(key,value)`.

```Lua
-- arg is for configuration purposes, it is allowed in any of the scripts
local init = function(arg)
  -- do whatever you need for initialization, parametrized by the arg table
end
return {
  init = init,
  func = function()
    coroutine.yield(1,"mapreduce/server.lua")
    coroutine.yield(2,"mapreduce/worker.lua")
    coroutine.yield(3,"mapreduce/test.lua")
    coroutine.yield(4,"mapreduce/utils.lua")
  end
}
```
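The coroutine mechanics can be seen in isolation with plain Lua, independent
of the framework: wrapping a task function turns each `coroutine.yield` into
one returned `(key,value)` pair per call:

```Lua
-- drain a task function one (key,value) pair at a time
local next_pair = coroutine.wrap(function()
  coroutine.yield(1,"mapreduce/server.lua")
  coroutine.yield(2,"mapreduce/worker.lua")
end)
print(next_pair()) -- 1	mapreduce/server.lua
print(next_pair()) -- 2	mapreduce/worker.lua
```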

- **mapfn.lua** is the script where the map function is implemented. The
  **func** field is executed as a standard Lua function, and receives two
  arguments `(key,value)` generated by one of the yields in your `taskfn`
  script. Map results are produced by calling the global function
  `emit(key,value)`.

```Lua
return {
  init = function() end,
  func = function(key,value)
    for line in io.lines(value) do
      for w in line:gmatch("[^%s]+") do
        emit(w,1)
      end
    end
  end
}
```

- **partitionfn.lua** is the script which describes how the map results are
  grouped and partitioned in order to create **reduce jobs**. The **func**
  field is a hash function which receives an emitted key and returns an
  integer number. Depending on your hash function, more or fewer reducers
  will be needed.

```Lua
return {
  init = function() end,
  func = function(key)
    return key:byte(#key) -- last character (numeric byte)
  end
}
```
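As a standalone illustration in plain Lua (outside the framework): with this
hash, keys ending in the same character fall into the same partition:

```Lua
local hash = function(key) return key:byte(#key) end
print(hash("the"))   -- 101, the byte value of 'e'
print(hash("apple")) -- 101 as well: same partition as "the"
print(hash("world")) -- 100 ('d'), a different partition
```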

- **reducefn.lua** is the script which implements the reduce function. The
  **func** field is a function which receives a pair `(key,values)`, where
  `key` is one of the emitted keys and `values` is a Lua array (a table with
  sequential integer keys starting at 1) with all the available map values for
  the given key. The system could reuse the reduce function several times, so
  it must be idempotent. The reduce results will be grouped following the
  partition function. For each possible partition, a GridFS file will be
  created in a collection called `dbname_fs`, where `dbname` is the database
  name defined above.

```Lua
return {
  init = function() end,
  func = function(key,values)
    local count=0
    for _,v in ipairs(values) do count = count + v end
    return count
  end
}
```

- **finalfn.lua** is the script which implements how to process the results
  produced by the system. The **func** field is a function which receives a
  Lua pairs iterator, and returns a boolean indicating whether to destroy the
  GridFS collection data. If the returned value is `true`, the results will
  be removed. If the returned value is `false` or `nil`, the results will
  remain available after the execution of your map-reduce task.

```Lua
return {
  init = function() end,
  func = function(it)
    for key,value in it do
      print(value,key)
    end
    return true -- indicates to remove mongo gridfs result files
  end
}
```

Last notes
----------

This software is in development. More documentation will be added to the wiki
pages as time permits. Collaboration is open, and all contributions are
welcome.
