add GlassFlow #2615

Boburmirzo · 2024-09-20T16:02:31Z

What is this Python project?

GlassFlow is a serverless, Python-centric real-time data transformation solution for end-to-end data pipelines. With GlassFlow, you do not need Apache Kafka or Flink. Visit the docs page to learn more: https://docs.glassflow.dev/get-started/introduction

Describe features.

You can:

Use GlassFlow out-of-the-box with any existing Python library.
Start GlassFlow without a complex initial setup such as creating clusters.
Skip the headache of managing partitions, shards, and workers' setup.
Define your pipeline as code using the GlassFlow CLI.
Implement your transformation function using the GlassFlow Python SDK.
Run your Python code locally for easy development and debugging.

GlassFlow:

Provides a pure Python and zero infrastructure environment.
Keeps your original data where it is.
Connects live data sources.
Ingests real-time data continuously.
Transforms data in real time.
Simulates your production workloads.
Deploys your pipeline to production within minutes.
Delivers auto-scalable serverless event streaming infrastructure.
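
As a rough illustration of the "transformation function" and "run your Python code locally" points above, here is a minimal sketch of a per-event transformation in plain Python. The `handler(data, log)` name and signature, and the `sensor_id` / `temperature_c` fields, are assumptions made for this example, not taken from this PR; check the GlassFlow docs for the actual contract.

```python
import logging
from typing import Any


def handler(data: dict[str, Any], log: logging.Logger) -> dict[str, Any]:
    """Enrich a single event and return the transformed copy.

    The function name, signature, and field names are assumptions
    for illustration only.
    """
    enriched = dict(data)  # copy so the incoming event is not mutated

    # Hypothetical field: a temperature reading in degrees Celsius.
    celsius = float(enriched["temperature_c"])
    enriched["temperature_f"] = round(celsius * 9 / 5 + 32, 1)
    enriched["is_hot"] = celsius >= 30.0

    log.info("transformed event for sensor %s", enriched.get("sensor_id"))
    return enriched


if __name__ == "__main__":
    # Local run: call the function directly with a sample event.
    logging.basicConfig(level=logging.INFO)
    sample_event = {"sensor_id": "s-1", "temperature_c": 31.5}
    print(handler(sample_event, logging.getLogger("local")))
```

Because the function is ordinary Python with no cluster dependency, it can be unit-tested or stepped through in a debugger before being deployed.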

What's the difference between this Python project and similar ones?

Most real-time data processing tools, including Kafka, are Java-based, while Python has recently become the go-to language for data science and machine learning, especially amid the AI hype, because it has a rich set of libraries for data manipulation and analysis, such as Pandas. To bridge this gap, there are now tools for real-time data processing in Python, such as Python wrapper APIs/libraries for JVM-based tools. However, with all Kafka wrappers, you cannot easily simulate a production environment without a complex initial setup such as creating computing clusters and managing partitions, shards, and workers.

With these wrappers, you also need to implement a custom user-defined function (UDF) to convert a transformation written with, say, the popular Pandas library into Java syntax. This translation overhead can significantly reduce the throughput and responsiveness of real-time applications.

Enumerate comparisons.

Getting a comparable PyFlink-based pipeline into production takes 6-12 months and involves several tools. GlassFlow can get your data pipeline up and running in just 15 minutes with a single tool.

--

Anyone who agrees with this pull request could submit an Approve review to it.

Boburmirzo · 2024-09-20T16:05:28Z

@MatteoGuadrini @Wisma-55 @PythonChicken123 Could you help me to review and approve this PR, please? Thanks!

Boburmirzo · 2024-09-20T16:15:36Z

@Wisma-55 Thanks! Do you know who can merge the PR here?:)

add GlassFlow

20cef27

Wisma-55 approved these changes Sep 20, 2024

Clydeeller72 approved these changes Sep 25, 2024

Al838a3 approved these changes Oct 1, 2024
