Commit d75a572

Merge pull request #372 from zkaoudi/main
ML4all module
2 parents f68bf88 + 5cee47f commit d75a572

47 files changed: 2359 additions, 8 deletions

README.md

+1
Original file line numberDiff line numberDiff line change
@@ -184,6 +184,7 @@ You can see examples on how to start using Wayang [here](guides/wayang-examples.
 ## Contributing
 Before submitting a PR, please take a look on how to contribute with Apache Wayang contributing guidelines [here](CONTRIBUTING.md).

+There is also a guide on how to compile your code [here](guides/develop-in-Wayang.md).
 ## Authors
 The list of [contributors](https://github.com/apache/incubator-wayang/graphs/contributors).

guides/develop-in-Wayang.md

+42
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
<!--

Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->
This tutorial shows users how to compile their code within Wayang using **Maven**.

## Compile the module you modified
Within the root directory of Wayang, compile only the module you modified for faster compilation:
```shell
./mvnw clean install -DskipTests -pl <modified_module>
```
Important: before making a Pull Request, make sure all modules compile and all tests pass:
```shell
./mvnw clean install
```
## Package the project
```shell
./mvnw clean package -pl :wayang-assembly -Pdistribution
```

## Execute your code
Before executing your code, make sure the required environment variables are set correctly (see [tutorial.md](tutorial.md)):
```shell
cd wayang-assembly/target/
tar -xvf apache-wayang-assembly-0.7.1-SNAPSHOT-incubating-dist.tar.gz
cd wayang-0.7.1-SNAPSHOT
./bin/wayang-submit org.apache.wayang.<main_class> <parameters>
```

guides/develop-with-Wayang.md

+5-5
Original file line numberDiff line numberDiff line change
@@ -23,27 +23,27 @@ This tutorial shows users how to import Wayang in their Java project using the m
 <dependency>
   <groupId>org.apache.wayang</groupId>
   <artifactId>wayang-core</artifactId>
-  <version>0.6.0</version>
+  <version>0.7.1</version>
 </dependency>
 <dependency>
   <groupId>org.apache.wayang</groupId>
   <artifactId>wayang-basic</artifactId>
-  <version>0.6.0</version>
+  <version>0.7.1</version>
 </dependency>
 <dependency>
   <groupId>org.apache.wayang</groupId>
   <artifactId>wayang-java</artifactId>
-  <version>0.6.0</version>
+  <version>0.7.1</version>
 </dependency>
 <dependency>
   <groupId>org.apache.wayang</groupId>
   <artifactId>wayang-spark_2.12</artifactId>
-  <version>0.6.0</version>
+  <version>0.7.1</version>
 </dependency>
 <dependency>
   <groupId>org.apache.wayang</groupId>
   <artifactId>wayang-api-scala-java_2.12</artifactId>
-  <version>0.6.0</version>
+  <version>0.7.1</version>
 </dependency>
 ```

pom.xml

+2-3
Original file line numberDiff line numberDiff line change
@@ -391,7 +391,7 @@

       - Prevents thrid-party snapshot dependencies in projects
     -->
-    <!-- TODO: This is actually automatically done by the maven-release-plugin:prepare goal Therefore if could be removed -->
+    <!-- TODO: This is actually automatically done by the maven-release-plugin:prepare goal Therefore it could be removed -->
     <profile>
       <id>pre-release</id>
       <build>
@@ -1208,8 +1208,6 @@
           </execution>
         </executions>
         <configuration>
-          <!-- Right now this would fail the build as not all files have Apache headers -->
-          <!-- TODO: Enable asap -->
           <useMavenDefaultExcludes>true</useMavenDefaultExcludes>
           <!--
             Make rat output the files with missing licensed directly into the
@@ -1524,6 +1522,7 @@
     <module>wayang-resources</module>
     <module>wayang-benchmark</module>
     <module>wayang-assembly</module>
+    <module>wayang-ml4all</module>
     <!-- <module>wayang-docs</module> -->
   </modules>
 </project>

wayang-assembly/pom.xml

+5
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,11 @@
       <artifactId>wayang-postgres</artifactId>
       <version>0.7.1-SNAPSHOT</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.wayang</groupId>
+      <artifactId>wayang-ml4all</artifactId>
+      <version>0.7.1-SNAPSHOT</version>
+    </dependency>
   </dependencies>

   <build>

wayang-ml4all/README.md

+56
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
# ML4all: scalable ML system for everyone


ML4all is a system that frees users from the burden of machine learning algorithm selection and low-level implementation details.
It uses a new abstraction that is capable of solving most ML tasks and provides a cost-based optimizer on top of the proposed abstraction for choosing the best gradient descent algorithm in a given setting.
Our results show that ML4all is more than two orders of magnitude faster than state-of-the-art systems and can process large datasets that were not possible before.

More details can be found in our dedicated [SIGMOD publication](https://dl.acm.org/citation.cfm?id=3064042) and
in Wayang's core [system paper](https://sigmodrecord.org/publications/sigmodRecord/2309/pdfs/05_Systems_Beedkar.pdf).

## Abstraction
ML4all abstracts most ML algorithms with seven operators:

- (1) `Transform` receives a data point to transform (e.g., normalize it) and outputs a new data point.

- (2) `Stage` initializes all the required global parameters (e.g., centroids for the k-means algorithm).

- (3) `Compute` performs user-defined computations on the input data point and returns a new data point. For example, it can compute the nearest centroid for each input data point.

- (4) `Update` updates the global parameters based on a user-defined formula. For example, it can update the new centroids based on the output computed by the Compute operator.

- (5) `Sample` takes as input the size of the desired sample and the data points to sample from and returns a reduced set of sampled data points.

- (6) `Converge` specifies a function that outputs a convergence dataset required for determining whether the iterations should continue or stop.

- (7) `Loop` specifies the stopping condition on the convergence dataset.

Similar to MapReduce, where users need to implement a map and reduce function, users of ML4all wishing to develop their own algorithm should implement the above interfaces.
The interfaces can be found in `org.apache.wayang.ml4all.abstraction.api`.

Examples for KMeans clustering and stochastic gradient descent can be found in `org.apache.wayang.ml4all.algorithms`.
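
To make the operator roles concrete, here is a minimal plain-Java sketch of a one-dimensional k-means loop in which each static method plays the part of one operator. It does not use the ML4all interfaces: every class and method name below is hypothetical, the initial centroids are arbitrary, and `Sample` is left out because this toy version uses all points in every iteration.

```java
import java.util.Arrays;

// Plain-Java sketch of the operator roles for 1-D k-means.
// None of these names come from ML4all; they only mirror the roles listed above.
class OperatorFlowSketch {

    // Transform: turn a raw record into a data point (here: parse a number).
    static double transform(String raw) {
        return Double.parseDouble(raw.trim());
    }

    // Stage: initialize the global parameters (two arbitrary starting centroids).
    static double[] stage() {
        return new double[] {0.0, 10.0};
    }

    // Compute: index of the nearest centroid for one data point.
    static int compute(double point, double[] centroids) {
        int best = 0;
        for (int i = 1; i < centroids.length; i++) {
            if (Math.abs(point - centroids[i]) < Math.abs(point - centroids[best])) {
                best = i;
            }
        }
        return best;
    }

    // Update: new centroids are the means of the points assigned to them.
    // Sample is omitted: every iteration uses all points.
    static double[] update(double[] points, double[] centroids) {
        double[] sum = new double[centroids.length];
        int[] count = new int[centroids.length];
        for (double p : points) {
            int c = compute(p, centroids);
            sum[c] += p;
            count[c]++;
        }
        double[] next = centroids.clone();
        for (int i = 0; i < next.length; i++) {
            if (count[i] > 0) {
                next[i] = sum[i] / count[i];
            }
        }
        return next;
    }

    // Converge: the value the loop condition looks at (largest centroid movement).
    static double converge(double[] previous, double[] current) {
        double max = 0.0;
        for (int i = 0; i < previous.length; i++) {
            max = Math.max(max, Math.abs(previous[i] - current[i]));
        }
        return max;
    }

    // Loop: the stopping condition on the convergence value.
    static boolean shouldStop(double delta, int iteration) {
        return delta < 1e-9 || iteration >= 100;
    }

    static double[] run(String[] rawPoints) {
        double[] points = Arrays.stream(rawPoints)
                .mapToDouble(OperatorFlowSketch::transform)
                .toArray();
        double[] centroids = stage();
        int iteration = 0;
        double delta;
        do {
            double[] next = update(points, centroids);
            delta = converge(centroids, next);
            centroids = next;
            iteration++;
        } while (!shouldStop(delta, iteration));
        return centroids;
    }

    public static void main(String[] args) {
        double[] result = run(new String[] {"1", "2", "3", "11", "12", "13"});
        System.out.println(Arrays.toString(result));
    }
}
```

In ML4all itself, each of these steps would be a separate operator class that the optimizer can place in an execution plan; the sketch only shows how the roles hand data to one another.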

## Example runs
- KMeans:

```shell
./bin/wayang-submit org.apache.wayang.ml4all.examples.RunKMeans java,spark file:///Users/zoi/Work/WAYANG/forked/incubator-wayang/wayang-ml4all/src/main/resources/input/USCensus1990-sample.input 3 68 0 1
```

- SGD:
```shell
./bin/wayang-submit org.apache.wayang.ml4all.examples.RunSGD spark file:///Users/zoi/Work/WAYANG/forked/incubator-wayang/wayang-ml4all/src/main/resources/input/adult.zeros.input 100827 123 10 0.001
```

wayang-ml4all/pom.xml

+94
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements. See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership. The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied. See the License for the
  specific language governing permissions and limitations
  under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang</artifactId>
    <version>0.7.1-SNAPSHOT</version>
  </parent>

  <artifactId>wayang-ml4all</artifactId>
  <version>0.7.1-SNAPSHOT</version>

  <name>Wayang ML4all</name>
  <description>
    This Wayang module contains an ML abstraction for easily implementing ML algorithms.
  </description>

  <properties>
    <java-module-name>org.apache.wayang.ml4all</java-module-name>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.wayang</groupId>
      <artifactId>wayang-core</artifactId>
      <version>0.7.1-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.wayang</groupId>
      <artifactId>wayang-basic</artifactId>
      <version>0.7.1-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.wayang</groupId>
      <artifactId>wayang-java</artifactId>
      <version>0.7.1-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.wayang</groupId>
      <artifactId>wayang-spark_2.12</artifactId>
      <version>0.7.1-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.wayang</groupId>
      <artifactId>wayang-api-scala-java_2.12</artifactId>
      <version>0.7.1-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>3.3.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>3.0.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.1.2</version>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.17</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.12.12</version>
    </dependency>
  </dependencies>

</project>
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.wayang.ml4all.abstraction.api;

import org.apache.wayang.ml4all.abstraction.plan.ML4allGlobalVars;

/**
 * Created by zoi on 22/1/15.
 */
public abstract class Compute<R, V> extends LogicalOperator {

    /**
     * Performs a computation at the data unit granularity
     *
     * @param input a data unit
     * @param context
     */
    public abstract R process(V input, ML4allGlobalVars context);

    /**
     * Aggregates the output of the process() method to use in a group by
     */
    public abstract R aggregate(R input1, R input2);


}
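
The `process`/`aggregate` pair can be read as a map step followed by a pairwise reduce. The sketch below illustrates that reading with stand-in types, since the API of `ML4allGlobalVars` is not shown in this diff: `SketchContext`, `ComputeSketch`, `GradientCompute`, and the `"weight"` key are all hypothetical, and the driver loop is only an assumption about how an engine might combine partial results.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ML4allGlobalVars (its real API is not shown here).
class SketchContext {
    private final Map<String, Object> vars = new HashMap<>();

    void put(String key, Object value) { vars.put(key, value); }

    Object get(String key) { return vars.get(key); }
}

// Mirrors the two abstract methods of Compute<R, V>, but against the stand-in context.
abstract class ComputeSketch<R, V> {
    abstract R process(V input, SketchContext context);

    abstract R aggregate(R input1, R input2);
}

// Least-squares gradient for the 1-D model y = w * x.
// Each point is {x, y}; each partial result is {gradientSum, pointCount}.
class GradientCompute extends ComputeSketch<double[], double[]> {
    @Override
    double[] process(double[] point, SketchContext context) {
        double w = (Double) context.get("weight");
        double x = point[0];
        double y = point[1];
        return new double[] {(w * x - y) * x, 1.0};
    }

    @Override
    double[] aggregate(double[] a, double[] b) {
        // Element-wise sum: associative and commutative, so partial results
        // can be combined in any order.
        return new double[] {a[0] + b[0], a[1] + b[1]};
    }
}

class ComputeSketchDemo {
    // Assumed driver: apply process() to every element, then fold with aggregate().
    static <R, V> R run(ComputeSketch<R, V> op, List<V> data, SketchContext context) {
        R accumulated = null;
        for (V element : data) {
            R partial = op.process(element, context);
            accumulated = (accumulated == null) ? partial : op.aggregate(accumulated, partial);
        }
        return accumulated;
    }

    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        context.put("weight", 0.0);
        List<double[]> data = Arrays.asList(new double[] {1, 2}, new double[] {2, 4});
        double[] total = run(new GradientCompute(), data, context);
        System.out.println("mean gradient = " + total[0] / total[1]);
    }
}
```

Keeping `aggregate` associative and commutative matters if the engine combines partial results in parallel, which is the usual requirement for reduce-style functions on platforms such as Spark.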
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.wayang.ml4all.abstraction.api;

import org.apache.wayang.ml4all.abstraction.plan.ML4allGlobalVars;

public abstract class LocalStage extends LogicalOperator {

    /* initialize variables and add them in the context */
    public abstract void staging (ML4allGlobalVars context);
}
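
A staging operator only has to seed the shared context before the iterations start. The following sketch shows that idea with a stand-in context, because the methods of `ML4allGlobalVars` are not part of this diff: `StageContextSketch`, `LocalStageSketch`, `ZeroWeightsStage`, and the keys `"weights"` and `"iteration"` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ML4allGlobalVars: a simple named-value store.
class StageContextSketch {
    private final Map<String, Object> vars = new HashMap<>();

    void put(String key, Object value) { vars.put(key, value); }

    Object get(String key) { return vars.get(key); }
}

// Mirrors the single abstract method of LocalStage against the stand-in context.
abstract class LocalStageSketch {
    abstract void staging(StageContextSketch context);
}

// Seeds a zero weight vector and an iteration counter, as a gradient descent
// algorithm might need before its first iteration.
class ZeroWeightsStage extends LocalStageSketch {
    private final int dimension;

    ZeroWeightsStage(int dimension) {
        this.dimension = dimension;
    }

    @Override
    void staging(StageContextSketch context) {
        context.put("weights", new double[dimension]); // Java zero-initializes arrays
        context.put("iteration", 0);
    }
}

class LocalStageSketchDemo {
    public static void main(String[] args) {
        StageContextSketch context = new StageContextSketch();
        new ZeroWeightsStage(3).staging(context);
        double[] weights = (double[]) context.get("weights");
        System.out.println("staged " + weights.length + " weights");
    }
}
```

Later operators would then read and overwrite these entries, which is how the global parameters travel between `Compute` and `Update` in the abstraction described in the module README.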
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.wayang.ml4all.abstraction.api;

import java.io.Serializable;

/**
 * An ML4all logical operator
 */

public abstract class LogicalOperator implements Serializable {
    public void initialise() { }
    public void finalise() { }
}
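
`LogicalOperator` implements `Serializable`, presumably so that operator instances can be shipped to distributed workers; the diff does not state this, nor when `initialise()` and `finalise()` are invoked. The sketch below assumes a once-before, once-after lifecycle and uses hypothetical classes (`OperatorSketch`, `ScaleOperator`, `SerializationRoundTrip`) to show a standard Java serialization round trip in which non-serialized state is rebuilt in `initialise()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Same shape as LogicalOperator: serializable, with two no-op lifecycle hooks.
abstract class OperatorSketch implements Serializable {
    public void initialise() { }

    public void finalise() { }
}

// Hypothetical operator: the factor is serialized, the scratch buffer is not.
class ScaleOperator extends OperatorSketch {
    private static final long serialVersionUID = 1L;

    private final double factor;
    private transient double[] buffer; // null after deserialization until initialise()

    ScaleOperator(double factor) {
        this.factor = factor;
    }

    @Override
    public void initialise() {
        buffer = new double[1];
    }

    @Override
    public void finalise() {
        buffer = null;
    }

    // Must be called between initialise() and finalise().
    double apply(double x) {
        buffer[0] = x * factor;
        return buffer[0];
    }
}

class SerializationRoundTrip {
    // Serializes a value to bytes and reads it back, as shipping it to a worker would.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ScaleOperator copy = roundTrip(new ScaleOperator(2.0));
        copy.initialise();
        System.out.println(copy.apply(3.0));
        copy.finalise();
    }
}
```

The practical consequence for operator authors is that every non-transient field of an operator must itself be serializable, and anything that is not should be created in `initialise()` rather than in the constructor.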
