@@ -26,194 +26,10 @@ add the active directory by writing in the terminal:
$ export LUA_PATH='?.lua;?/init.lua'
```
- Usage
- -----
+ Documentation
+ -------------

- Two Lua scripts are provided for fast running of the software.
-
- - `execute_server.lua` runs the master server for your map-reduce operation.
- Only **one instance** of this script is needed. Note that this software
- receives the **map-reduce task** split into several Lua modules. These
- modules must be visible in the `LUA_PATH` of the server and of all the
- workers that you execute. This script receives 7 mandatory arguments (see
- the invocation sketch after this list):
-
- 1. The connection string, normally `localhost` or `localhost:21707`.
- 2. The name of the database where the work will be done.
- 3. A Lua module which contains the **task** function data.
- 4. A Lua module which contains the **map** function data.
- 5. A Lua module which contains the **partition** function data.
- 6. A Lua module which contains the **reduce** function data.
- 7. A Lua module which contains the **final** function data.
-
- - `execute_worker.lua` runs a worker, which is configured by default to
- execute one map-reduce task and then finish. One task doesn't mean one job:
- a **map-reduce task** is performed as several individual **map/reduce jobs**.
- A worker waits until all the possible map or reduce jobs are completed before
- considering a task finished. This script receives two arguments:
-
- 1. The connection string, as above.
- 2. The name of the database where the work will be done, as above.
-
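- The two example shell scripts described below wrap invocations similar to
- this sketch; the plain `lua` interpreter and the dotted module names are
- illustrative assumptions, not commands taken verbatim from the repository:
-
- ```
- # master server: connection string, database name, and the five task modules
- $ lua execute_server.lua localhost wordcount examples.WordCount.taskfn \
-     examples.WordCount.mapfn examples.WordCount.partitionfn \
-     examples.WordCount.reducefn examples.WordCount.finalfn
- # workers: connection string and database name; launch as many as you need
- $ lua execute_worker.lua localhost wordcount &
- $ lua execute_worker.lua localhost wordcount &
- ```
-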
- A simple word-count example is available in the repository. There are two
- shell scripts, `execute_example_server.sh` and `execute_example_worker.sh`,
- which are ready to run the word-count example on a single machine, with one
- or more worker instances. The execution of the example looks like this:
-
- **SERVER**
- ```
- $ ./execute_example_server.sh > output
- # Preparing MAP
- # MAP execution
- 100.0 %
- # Preparing REDUCE
- # MERGE AND PARTITIONING
- 100.0 %
- # CREATING JOBS
- # STARTING REDUCE
- # REDUCE execution
- 100.0 %
- # FINAL execution
- ```
-
- **WORKER**
- ```
- $ ./execute_example_worker.sh
- # NEW TASK READY
- # EXECUTING MAP JOB _id: "1"
- # FINISHED
- # EXECUTING MAP JOB _id: "2"
- # FINISHED
- # EXECUTING MAP JOB _id: "3"
- # FINISHED
- # EXECUTING MAP JOB _id: "4"
- # FINISHED
- # EXECUTING REDUCE JOB _id: "121"
- # FINISHED
- # EXECUTING REDUCE JOB _id: "37"
- # FINISHED
- ...
- ```
-
- Map-reduce task example: word-count
- -----------------------------------
-
- The example is composed of one Lua module for each of the map-reduce
- functions, available in the directory `examples/WordCount/`. All the modules
- have the same structure: they return a Lua table with two fields (see the
- skeleton after this list):
-
- - an **init** function, which receives a table of arguments and allows you to
- configure your module options, in case you need any;
-
- - a function which implements the necessary Lua code for the operation; the
- name of this function is different for each operation.
-
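- In skeleton form, where the field name `opname` is a placeholder for the
- operation-specific name (`taskfn`, `mapfn`, and so on):
-
- ```Lua
- return {
-   init   = function(arg) --[[ optional configuration ]] end,
-   opname = function(...) --[[ operation-specific code ]] end,
- }
- ```
-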
- A map-reduce task is divided, at least, into the following modules:
-
- - **taskfn.lua** is the script which defines how the data is divided in order
- to create **map jobs**. The **taskfn** field is executed as a Lua
- *coroutine*, so every map job will be created by calling
- `coroutine.yield(key,value)`.
-
- ```Lua
- -- the arg table is for configuration purposes; it is allowed in any of the
- -- scripts
- local init = function(arg)
-   -- do whatever you need for initialization, parametrized by the arg table
- end
- return {
-   init = init,
-   taskfn = function()
-     coroutine.yield(1, "mapreduce/server.lua")
-     coroutine.yield(2, "mapreduce/worker.lua")
-     coroutine.yield(3, "mapreduce/test.lua")
-     coroutine.yield(4, "mapreduce/utils.lua")
-   end
- }
- ```
-
- - **mapfn.lua** is the script where the map function is implemented. The
- **mapfn** field is executed as a standard Lua function, and receives three
- arguments `(key,value,emit)`. The first two are generated by one of the
- yields in your `taskfn` script. The third argument is a function: map results
- are produced by calling `emit(key,value)`.
-
- ```Lua
- return {
-   init = function() end,
-   mapfn = function(key, value, emit)
-     -- each value is a file path yielded by taskfn; emit a count of 1 for
-     -- every whitespace-separated word in the file
-     for line in io.lines(value) do
-       for w in line:gmatch("[^%s]+") do
-         emit(w, 1)
-       end
-     end
-   end
- }
- ```
-
- - **partitionfn.lua** is the script which describes how the map results are
- grouped and partitioned in order to create **reduce jobs**. The
- **partitionfn** field is a hash function which receives an emitted key and
- returns an integer. Depending on your hash function, more or fewer reducers
- will be needed.
-
- ```Lua
- -- FNV string hash function: http://isthe.com/chongo/tech/comp/fnv/
- local NUM_REDUCERS = 10
- local FNV_prime    = 16777619
- local offset_basis = 2166136261
- local MAX          = 2^32
- return {
-   init = function() end,
-   partitionfn = function(key)
-     -- compute the FNV hash of the key
-     local h = offset_basis
-     for i = 1, #key do
-       h = (h * FNV_prime) % MAX
-       h = bit32.bxor(h, key:byte(i))
-     end
-     -- map the hash into one of NUM_REDUCERS partitions
-     return h % NUM_REDUCERS
-   end
- }
- ```
-
- - **reducefn.lua** is the script which implements the reduce function. The
- **reducefn** field is a function which receives a pair `(key,values)`, where
- `key` is one of the emitted keys and `values` is a Lua array (a table with
- integer, sequential keys starting at 1) with all the available map values for
- the given key. The system could reuse the reduce function several times, so
- it must be idempotent. The reduce results will be grouped following the
- partition function. For each possible partition, a GridFS file will be
- created in a collection called `dbname_fs`, where `dbname` is the database
- name defined above.
-
- ```Lua
- return {
-   init = function() end,
-   reducefn = function(key, values)
-     -- sum the counts emitted for this key
-     local count = 0
-     for _, v in ipairs(values) do count = count + v end
-     return count
-   end
- }
- ```
-
- - **finalfn.lua** is the script which implements how to take the results
- produced by the system. The **finalfn** field is a function which receives a
- Lua pairs iterator, and returns a boolean indicating whether or not to
- destroy the GridFS collection data. If the returned value is `true`, the
- results will be removed. If the returned value is `false` or `nil`, the
- results will remain available after the execution of your map-reduce task.
-
- ```Lua
- return {
-   init = function() end,
-   finalfn = function(it)
-     -- print every (key,value) result pair
-     for key, value in it do
-       print(value, key)
-     end
-     return true -- indicates to remove the MongoDB GridFS result files
-   end
- }
- ```
+ Available at the [wiki pages](https://github.com/pakozm/lua-mapreduce/wiki).

Performance notes
-----------------