Commit f679fca

Merge pull request #4 from pakozm/devel
Devel
2 parents b4a0dd7 + 082b140 commit f679fca

21 files changed: +713 -316 lines changed

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-Francisco Zamora-Martinez (2014-)
+Lua-MapReduce, Copyright (c) 2014, Francisco Zamora-Martinez
 
 GNU GENERAL PUBLIC LICENSE
 Version 3, 29 June 2007

README.md

Lines changed: 3 additions & 187 deletions
@@ -26,194 +26,10 @@ add the active directory by writing in the terminal:
 $ export LUA_PATH='?.lua;?/init.lua'
 ```
 
-Usage
------
+Documentation
+-------------
 
-Two Lua scripts have been prepared for fast running of the software.
-
-- `execute_server.lua` runs the master server for your map-reduce operation.
-Only **one instance** of this script is needed. Note that this software
-receives the **map-reduce task** splitted into several Lua modules. These
-modules had to be visible in the `LUA_PATH` of the server and all the workers
-that you execute. This script receives 7 mandatory arguments:
-
-1. The connection string, normally `localhost` or `localhost:21707`.
-2. The name of the database where the work will be done.
-3. A Lua module which contains the **task** function data.
-4. A Lua module which contains the **map** function data.
-5. A Lua module which contains the **partition** function data.
-6. A Lua module which contains the **reduce** function data.
-7. A Lua module which contains the **final** function data.
-
-- `execute_worker.lua` runs the worker, which is configured by default to
-execute one map-reduce task and finish its operation. One task doesn't mean
-one job. A **map-reduce task** is performed as several individual **map/reduce
-jobs**. A worker waits until all the possible map or reduce jobs are completed
-to consider a task as finished. This script receives two arguments:
-
-1. The connection string, as above.
-2. The name of the database where the work will be done, as above.
-
-A simple word-count example is available in the repository. There are two
-shell-scripts: `execute_server_example.sh` and `execute_worker_example.sh`;
-which are ready to run the word-count example in only one machine, with one or
-more worker instances. The execution of the example looks like this:
-
-**SERVER**
-```
-$ ./execute_example_server.sh > output
-# Preparing MAP
-# MAP execution
-100.0 %
-# Preparing REDUCE
-# MERGE AND PARTITIONING
-100.0 %
-# CREATING JOBS
-# STARTING REDUCE
-# REDUCE execution
-100.0 %
-# FINAL execution
-```
-
-**WORKER**
-```
-$ ./execute_example_worker.sh
-# NEW TASK READY
-# EXECUTING MAP JOB _id: "1"
-# FINISHED
-# EXECUTING MAP JOB _id: "2"
-# FINISHED
-# EXECUTING MAP JOB _id: "3"
-# FINISHED
-# EXECUTING MAP JOB _id: "4"
-# FINISHED
-# EXECUTING REDUCE JOB _id: "121"
-# FINISHED
-# EXECUTING REDUCE JOB _id: "37"
-# FINISHED
-...
-```
-
-Map-reduce task example: word-count
------------------------------------
-
-The example is composed by one Lua module for each of the map-reduce functions,
-and are available at the directory `examples/WordCount/`. All the modules has
-the same structure, they return a Lua table with two fields:
-
-- **init** function, which receives a table of arguments and allows to configure
-your module options, in case that you need any option.
-
-- A function which implements the necessary Lua code for the operation. The name
-of the function is different for each operation.
-
-A map-reduce task is divided, at least, in the following modules:
-
-- **taskfn.lua** is the script which defines how the data is divided in order to
-create **map jobs**. The **func** field is executed as a Lua *coroutine*, so,
-every map job will be created by calling `corotuine.yield(key,value)`.
-
-```Lua
--- arg is for configuration purposes, it is allowed in any of the scripts
-local init = function(arg)
-  -- do whatever you need for initialization parametrized by arg table
-end
-return {
-  init = init,
-  taskfn = function()
-    coroutine.yield(1,"mapreduce/server.lua")
-    coroutine.yield(2,"mapreduce/worker.lua")
-    coroutine.yield(3,"mapreduce/test.lua")
-    coroutine.yield(4,"mapreduce/utils.lua")
-  end
-}
-```
-
-- **mapfn.lua** is the script where the map function is implemented. The
-**func** field is executed as a standard Lua function, and receives three
-arguments `(key,value,emit)`. The first two are generated b
-one of the yields at your `taskfn`
-script. The third argument is a function. Map results
-are produced by calling the function
-`emit(key,value)`.
-
-```Lua
-return {
-  init = function() end,
-  mapfn = function(key,value,emit)
-    for line in io.lines(value) do
-      for w in line:gmatch("[^%s]+") do
-        emit(w,1)
-      end
-    end
-  end
-}
-```
-
-- **partitionfn.lua** is the script which describes how the map results are
-grouped and partitioned in order to create **reduce jobs**. The **func** field
-is a hash function which receives an emitted key and returns an integer
-number. Depending in your hash function, more or less reducers will be needed.
-
-```Lua
--- string hash function: http://isthe.com/chongo/tech/comp/fnv/
-local NUM_REDUCERS = 10
-local FNV_prime = 16777619
-local offset_basis = 2166136261
-local MAX = 2^32
-return {
-  init = function() end,
-  partitionfn = function(key)
-    -- compute hash
-    local h = offset_basis
-    for i=1,#key do
-      h = (h * FNV_prime) % MAX
-      h = bit32.bxor(h, key:byte(i))
-    end
-    return h % NUM_REDUCERS
-  end
-}
-```
-
-- **reducefn.lua** is the script which implements the reduce function. The
-**func** field is a function which receives a pair `(key,values)` where the
-`key` is one of the emitted keys, and the `values` is a Lua array (table with
-integer and sequential keys starting at 1) with all the available map values
-for the given key. The system could reuse the reduce function several times,
-so, it must be idempotent. The reduce results will be grouped following the
-partition function. For each possible partition, a GridFS file will be created
-in a collection called `dbname_fs` where dbname is the database name defined
-above.
-
-```Lua
-return {
-  init = function() end,
-  reducefn = function(key,values)
-    local count=0
-    for _,v in ipairs(values) do count = count + v end
-    return count
-  end
-}
-```
-
-- **finalfn.lua** is the script which implements how to take the results
-produced by the system. The **func** field is a function which receives a
-Lua pairs iterator, and returns a boolean indicating if to destroy or not
-the GridFS collection data. If the returned value is `true`, the results
-will be removed. If the returned value is `false` or `nil`, the results
-will be available after the execution of your map-reduce task.
-
-```Lua
-return {
-  init = function() end,
-  finalfn = function(it)
-    for key,value in it do
-      print(value,key)
-    end
-    return true -- indicates to remove mongo gridfs result files
-  end
-}
-```
+Available at [wiki pages](https://github.com/pakozm/lua-mapreduce/wiki).
 
 Performance notes
 -----------------

examples/WordCount/finalfn.lua

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ return {
   init = function() end,
   finalfn = function(pairs_iterator)
     for key,value in pairs_iterator do
-      print(value,key)
+      print(value[1],key)
     end
     return true -- indicates to remove mongo gridfs result files
   end
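
Note on this hunk: the switch from `value` to `value[1]` suggests that the iterator handed to `finalfn` now yields, for each key, an array of the values the reducer emitted. The sketch below illustrates that reading. `make_iterator` is a hypothetical in-memory stand-in for the library's own iterator (which is not shown in this commit), and this `finalfn` collects lines instead of printing so the result can be inspected.

```lua
-- Hypothetical stand-in for the iterator that the library passes to finalfn.
-- Assumption: each key maps to an array of the values emitted by the reducer.
local function make_iterator(results)
  local keys = {}
  for k in pairs(results) do keys[#keys+1] = k end
  table.sort(keys) -- fixed order, only to make the sketch deterministic
  local i = 0
  return function()
    i = i + 1
    local k = keys[i]
    if k ~= nil then return k, results[k] end
  end
end

-- Same loop shape as the example's finalfn, reading the first emitted value.
local finalfn = function(pairs_iterator)
  local lines = {}
  for key,value in pairs_iterator do
    lines[#lines+1] = value[1] .. "\t" .. key
  end
  return lines
end

local out = finalfn(make_iterator{ hello = {3}, world = {1} })
print(table.concat(out, "\n"))
```

With a reducer that calls `emit` once per key, as in the word-count example, `value[1]` is the only element of the array.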

examples/WordCount/init.lua

Lines changed: 57 additions & 41 deletions
@@ -3,46 +3,62 @@ local NUM_REDUCERS = 10
 local FNV_prime = 16777619
 local offset_basis = 2166136261
 local MAX = 2^32
-return {
-  -- arg is for configuration purposes, it will be executed with init_args given
-  -- to the server
-  init = function(arg) end,
-
-  taskfn = function()
-    coroutine.yield(1,"mapreduce/server.lua")
-    coroutine.yield(2,"mapreduce/worker.lua")
-    coroutine.yield(3,"mapreduce/test.lua")
-    coroutine.yield(4,"mapreduce/utils.lua")
-  end,
-
-  mapfn = function(key,value,emit)
-    for line in io.lines(value) do
-      for w in line:gmatch("[^%s]+") do
-        emit(w,1)
-      end
-    end
-  end,
-
-  partitionfn = function(key)
-    -- compute hash
-    local h = offset_basis
-    for i=1,#key do
-      h = (h * FNV_prime) % MAX
-      h = bit32.bxor(h, key:byte(i))
-    end
-    return h % NUM_REDUCERS
-  end,
-
-  reducefn = function(key,values)
-    local count=0
-    for _,v in ipairs(values) do count = count + v end
-    return count
-  end,
-
-  finalfn = function(pairs_iterator)
-    for key,value in pairs_iterator do
-      print(value,key)
+
+-- arg is for configuration purposes, it will be executed with init_args given
+-- to the server
+local init = function(arg) end
+
+local taskfn = function(emit)
+  emit(1,"mapreduce/server.lua")
+  emit(2,"mapreduce/worker.lua")
+  emit(3,"mapreduce/test.lua")
+  emit(4,"mapreduce/utils.lua")
+end
+
+local mapfn = function(key,value,emit)
+  for line in io.lines(value) do
+    for w in line:gmatch("[^%s]+") do
+      emit(w,1)
     end
-    return true -- indicates to remove mongo gridfs result files
-  end,
+  end
+end
+
+local partitionfn = function(key)
+  -- compute hash
+  local h = offset_basis
+  for i=1,#key do
+    h = (h * FNV_prime) % MAX
+    h = bit32.bxor(h, key:byte(i))
+  end
+  return h % NUM_REDUCERS
+end
+
+local reducefn = function(key,values,emit)
+  local count=0
+  for _,v in ipairs(values) do count = count + v end
+  emit(count)
+end
+
+local combinerfn = reducefn
+
+local finalfn = function(pairs_iterator)
+  for key,value in pairs_iterator do
+    print(value[1],key)
+  end
+  return true -- indicates to remove mongo gridfs result files
+end
+
+return {
+  init = init,
+  taskfn = taskfn,
+  mapfn = mapfn,
+  partitionfn = partitionfn,
+  reducefn = reducefn,
+  combinerfn = combinerfn,
+  finalfn = finalfn,
+  -- This three properties are true for this reduce function.
+  -- Combiners always must to fulfill this properties.
+  associative_reducer = true,
+  commutative_reducer = true,
+  idempotent_reducer = true,
 }
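
Note on this hunk: after this change every stage receives an `emit` callback instead of yielding or returning. The sketch below shows how such functions could be chained in a single process. `run_task` is a hypothetical driver written for illustration only; the real server and workers coordinate through MongoDB and are not part of this hunk. To stay self-contained, the sketch feeds `mapfn` in-memory strings rather than file names, and it skips partitioning and the combiner.

```lua
-- Task functions with the emit-based signatures introduced by this commit.
-- Assumption: job values are plain strings here, not file paths.
local task = {
  taskfn = function(emit)
    emit(1, "a b a")
    emit(2, "b a")
  end,
  mapfn = function(key,value,emit)
    for w in value:gmatch("[^%s]+") do emit(w,1) end
  end,
  reducefn = function(key,values,emit)
    local count=0
    for _,v in ipairs(values) do count = count + v end
    emit(count)
  end,
}

-- Hypothetical single-process driver (not the library's implementation).
local function run_task(t)
  -- Map phase: every job emitted by taskfn is fed to mapfn, and the
  -- pairs emitted by mapfn are grouped by key.
  local grouped = {}
  t.taskfn(function(job_key, job_value)
    t.mapfn(job_key, job_value, function(k, v)
      grouped[k] = grouped[k] or {}
      table.insert(grouped[k], v)
    end)
  end)
  -- Reduce phase: each key keeps an array of the values its reducer emits.
  local results = {}
  for k,values in pairs(grouped) do
    results[k] = {}
    t.reducefn(k, values, function(v) table.insert(results[k], v) end)
  end
  return results
end

local results = run_task(task)
print(results.a[1], results.b[1])
```

Because results are stored as arrays of emitted values, a consumer reads `results.a[1]`, which matches the `value[1]` access in the updated `finalfn`.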

examples/WordCount/reducefn.lua

Lines changed: 12 additions & 5 deletions
@@ -1,8 +1,15 @@
+local reducefn = function(key,values,emit)
+  local count=0
+  for _,v in ipairs(values) do count = count + v end
+  emit(count)
+end
 return {
   init = function() end,
-  reducefn = function(key,values)
-    local count=0
-    for _,v in ipairs(values) do count = count + v end
-    return count
-  end
+  reducefn = reducefn,
+  combinerfn = reducefn,
+  -- This three properties are true for this reduce function.
+  -- Combiners always must to fulfill this properties.
+  associative_reducer = true,
+  commutative_reducer = true,
+  idempotent_reducer = true,
 }
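
Note on this hunk: the new comment says a combiner must be associative, commutative and idempotent. For the counting reducer this can be checked informally: reducing partial lists first and then reducing the partial results, in any order, should give the same total as reducing everything at once. The sketch below does that check. `run` is a hypothetical helper that gathers emitted values; how the library actually invokes `combinerfn` is not shown in this commit.

```lua
local reducefn = function(key,values,emit)
  local count=0
  for _,v in ipairs(values) do count = count + v end
  emit(count)
end

-- Hypothetical helper: call a reducer and collect whatever it emits.
local function run(fn, key, values)
  local out = {}
  fn(key, values, function(v) out[#out+1] = v end)
  return out
end

-- Reduce all five map values in one call.
local direct = run(reducefn, "w", {1,1,1,1,1})

-- Combine two partial lists first, then reduce the partial results
-- in swapped order.
local part_a = run(reducefn, "w", {1,1})
local part_b = run(reducefn, "w", {1,1,1})
local combined = run(reducefn, "w", { part_b[1], part_a[1] })

-- Reducing an already reduced result leaves it unchanged.
local again = run(reducefn, "w", combined)

print(direct[1], combined[1], again[1])
```

A reducer that, for example, emitted the number of values instead of their sum would fail this check, so it could not safely double as a combiner.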

examples/WordCount/reducefn2.lua

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+local reducefn = function(key,values,emit)
+  local count=0
+  for _,v in ipairs(values) do count = count + v end
+  emit(count)
+end
+return {
+  init = function() end,
+  reducefn = reducefn,
+  combinerfn = reducefn,
+}

examples/WordCount/taskfn.lua

Lines changed: 5 additions & 5 deletions
@@ -4,10 +4,10 @@ local init = function(arg)
 end
 return {
   init = init,
-  taskfn = function()
-    coroutine.yield(1,"mapreduce/server.lua")
-    coroutine.yield(2,"mapreduce/worker.lua")
-    coroutine.yield(3,"mapreduce/test.lua")
-    coroutine.yield(4,"mapreduce/utils.lua")
+  taskfn = function(emit)
+    emit(1,"mapreduce/server.lua")
+    emit(2,"mapreduce/worker.lua")
+    emit(3,"mapreduce/test.lua")
+    emit(4,"mapreduce/utils.lua")
   end
 }
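
Note on this hunk: `taskfn` no longer yields from a coroutine; it receives an `emit` callback. The sketch below contrasts how a caller might collect jobs under each convention. Both `jobs_from_coroutine` and `jobs_from_emit` are hypothetical helpers written for this comparison, not functions of the library.

```lua
-- Old convention: taskfn runs as a coroutine and yields each job.
local old_taskfn = function()
  coroutine.yield(1,"mapreduce/server.lua")
  coroutine.yield(2,"mapreduce/worker.lua")
end

-- New convention: taskfn calls the emit callback for each job.
local new_taskfn = function(emit)
  emit(1,"mapreduce/server.lua")
  emit(2,"mapreduce/worker.lua")
end

-- Hypothetical collector for the old style: coroutine.wrap turns the
-- function into an iterator that returns the yielded (key, value) pairs.
local function jobs_from_coroutine(fn)
  local jobs = {}
  for key,value in coroutine.wrap(fn) do
    jobs[#jobs+1] = { key, value }
  end
  return jobs
end

-- Hypothetical collector for the new style: pass a closure as emit.
local function jobs_from_emit(fn)
  local jobs = {}
  fn(function(key,value) jobs[#jobs+1] = { key, value } end)
  return jobs
end

local a = jobs_from_coroutine(old_taskfn)
local b = jobs_from_emit(new_taskfn)
print(#a, #b, a[2][2] == b[2][2])
```

The callback form gives `taskfn` the same `emit` convention that `mapfn` already used and that `reducefn` adopts in this commit.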

examples/WordCountBig/taskfn.lua

Lines changed: 2 additions & 2 deletions
@@ -2,12 +2,12 @@ return {
   -- init is for configuration purposes, it is allowed in any of the scripts
   init = function(arg)
   end,
-  taskfn = function()
+  taskfn = function(emit)
     local f = io.popen("ls /home/experimentos/CORPORA/EUROPARL/en-splits/*","r")
     local i=0
     for filename in f:lines() do
       i=i+1
-      coroutine.yield(i,filename)
+      emit(i,filename)
     end
     f:close()
   end

execute_example_server.sh

Lines changed: 2 additions & 1 deletion
@@ -4,4 +4,5 @@ lua execute_server.lua localhost wordcount \
   examples.WordCount.mapfn \
   examples.WordCount.partitionfn \
   examples.WordCount.reducefn \
-  examples.WordCount.finalfn
+  examples.WordCount.finalfn \
+  examples.WordCount.reducefn $@
