Commit 38de7d4

Refine introduction and architecture descriptions (#101)
Revised descriptions for clarity and consistency, emphasizing the unifying aspects of Apache Wayang and its cross-platform capabilities.
1 parent 17c6c7d commit 38de7d4

1 file changed

Lines changed: 10 additions & 7 deletions

docs/introduction/about.md

@@ -21,11 +21,17 @@ sidebar_position: 1
2121
limitations under the License.
2222
2323
-->
24-
#### *A unified data processing framework that seamlessly integrates and orchestrates multiple data platforms to deliver unparalleled performance and flexibility.*
24+
#### *A unifying data processing framework that seamlessly integrates and orchestrates multiple data processing systems to deliver performance and flexibility.*
2525

26-
Apache Wayang's three-layer architecture provides a strategic *abstraction* between user applications and underlying data processing platforms, ensuring seamless integration and optimization. The application layer encapsulates application-specific logic, while the core layer acts as an intermediary, translating application logic into a standardized intermediate representation (WayangPlan). This standardized representation is then passed to the platform layer, where it is optimized for execution across a diverse range of data platforms, including but not limited to any database, Spark, Flink, and ML systems. This optimization process ensures that the execution plan (ExecutionPlan) is tailored to the specific strengths and capabilities of each data engine, maximizing performance and efficiency.
26+
Apache Wayang's three-layer architecture provides a strategic *abstraction* between user applications and underlying data processing platforms, ensuring seamless integration among heterogeneous systems. The application layer encapsulates application-specific logic, while the core layer acts as an intermediary, translating application logic into an intermediate representation (Wayang plan). The Wayang plan is then transformed into an execution plan in the platform layer, where each operator is assigned to be run on a specific platform selected from a diverse pool of execution engines, including but not limited to any database, Apache Spark, Apache Flink, and ML systems. This abstraction allows for cross-platform optimization and execution.
2727

28-
Designed with flexibility as a priority, Apache Wayang enables easy *extensibility* to accommodate new operators and data platforms.
28+
One of Wayang’s key innovations is its *cross-platform optimizer*, which automates data system selection and spares users from making complex platform choices.
29+
This optimization process ensures that the resulting execution plan is tailored to the specific strengths and capabilities of each data engine, maximizing performance and efficiency.
30+
31+
Apache Wayang's core strength lies in its *cross-platform task execution*, enabling developers to seamlessly combine the strengths of various processing engines, such as Spark, Flink, and TensorFlow, *in one pipeline*.
32+
Designed with flexibility as a priority, Apache Wayang enables easy *extensibility* to accommodate new operators and data systems.
33+
The platform's extensibility and ease of use make it a compelling choice for data engineers and developers seeking a unifying and versatile data processing solution.
34+
<br/>
2935

3036
### Architecture and Software stack
3137
Apache Wayang's unique architecture, unlike traditional DBMSs, decouples the physical planning and execution layers, empowering developers to express their data processing logic in a platform-agnostic fashion. This separation of concerns allows developers to focus on the algorithmic aspects of their applications without being constrained by the intricacies of specific processing platforms.
@@ -34,11 +40,8 @@ Apache Wayang's unique architecture, unlike traditional DBMSs, decouples the phy
3440
<img width="75%" alt="wayang stack" src="/img/architecture/wayang-stack.png" />
3541
<br/><br/>
3642

37-
At the bottom layers of the software stack, there are the different data storage mediums and the supported data processing platforms. On top of these, Wayang’s core consists of the following main components: the optimizer, the executor, the monitor, and platform-specific drivers. Wayang currently supports two main APIs: the Java one and the Scala one. A Python API is currently under development. Besides using any of the supported languages, users can directly input SQL queries via the SQL library, which transforms them into a Wayang plan. Wayang also comes with an ML library for running ML tasks. Users can directly utilize the provided algorithms or can implement their own algorithm using a simple ML abstraction. To enable support for more programming languages in an efficient way, Wayang will soon come with a Polyglot library.
38-
39-
<br/>
43+
At the bottom layers of the software stack, there are the different data storage mediums and the supported data processing platforms. On top of these, Wayang’s core consists of the following main components: the optimizer, the executor, the monitor, and platform-specific drivers. Wayang currently supports two main APIs: the Java one and the Scala one. A Python API is also supported with limited operator coverage for the moment. Besides using any of the supported languages, users can directly input SQL queries via the SQL library, which transforms them into a Wayang plan. Wayang also comes with an ML library for running ML tasks. Users can directly utilize the provided algorithms or can implement their own algorithm using a simple ML abstraction. To enable support for more programming languages in an efficient way, Wayang will soon come with a Polyglot library.
4044

41-
Apache Wayang's core strength lies in its cross-platform task execution, enabling developers to seamlessly leverage the strengths of various processing engines, such as Hadoop, Spark, and Flink, without sacrificing performance or flexibility. The platform's ease of use further enhances its appeal, making it a compelling choice for data engineers and developers seeking a unified and versatile data processing solution.
4245
<br/>
4346
Below, on the left, you can see a Wayang plan representing the stochastic gradient descent algorithm, which is used in most deep learning tasks. On the right, you can see how the optimizer decided to execute it. Orange nodes are the operators that ran on Spark, and green nodes are the operators executed as a single Java process.
4447
<br/>
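
To make the operator-to-platform assignment described above more concrete, here is a toy sketch of cost-based platform selection for a linear pipeline. It is not Wayang's optimizer or API. The operator names, the cost figures, `MOVE_COST`, and `assign_platforms` are all hypothetical and chosen only for illustration. The sketch assumes each operator has an estimated cost per platform and that switching platforms between consecutive operators incurs a fixed data-movement cost.

```python
from typing import Dict, List, Tuple

# Hypothetical per-operator cost estimates (arbitrary units) for each platform.
# These numbers are invented for illustration and do not come from Wayang.
PIPELINE: List[Tuple[str, Dict[str, float]]] = [
    ("read_text",        {"java": 1.0,  "spark": 4.0}),
    ("sample",           {"java": 1.0,  "spark": 3.0}),
    ("compute_gradient", {"java": 20.0, "spark": 5.0}),
    ("update_weights",   {"java": 1.0,  "spark": 4.0}),
]

# Hypothetical cost of moving intermediate data between two different platforms.
MOVE_COST = 2.0


def assign_platforms(
    pipeline: List[Tuple[str, Dict[str, float]]], move_cost: float
) -> Tuple[float, List[Tuple[str, str]]]:
    """Return (total_cost, [(operator, platform), ...]) with minimal total cost."""
    # best[p] holds the cheapest plan for the operators seen so far
    # whose last operator runs on platform p.
    first_name, first_costs = pipeline[0]
    best = {p: (c, [(first_name, p)]) for p, c in first_costs.items()}

    for name, costs in pipeline[1:]:
        new_best = {}
        for platform, op_cost in costs.items():
            candidates = []
            for prev_platform, (prev_cost, prev_plan) in best.items():
                # Pay the movement cost only when the platform changes.
                switch = 0.0 if prev_platform == platform else move_cost
                candidates.append((prev_cost + switch + op_cost, prev_plan))
            cost, plan = min(candidates, key=lambda c: c[0])
            new_best[platform] = (cost, plan + [(name, platform)])
        best = new_best

    # Pick the cheapest plan over all possible final platforms.
    return min(best.values(), key=lambda v: v[0])


if __name__ == "__main__":
    total, plan = assign_platforms(PIPELINE, MOVE_COST)
    for operator, platform in plan:
        print(f"{operator} -> {platform}")
    print(f"estimated total cost: {total}")
```

With these made-up costs, only the expensive gradient step is placed on Spark, and the cheap steps stay in the single Java process, because moving data to Spark and back costs less than running that step locally. A real cross-platform optimizer has to handle general plan graphs and learned or calibrated cost models, which this sketch deliberately leaves out.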
