README.md (+15 -11)
@@ -17,11 +17,13 @@
 under the License.
 -->

-# datafusion-ray: DataFusion on Ray
+# DataFusion on Ray

-This is a research project to evaluate performing distributed SQL queries from Python, using
+> This is a originally a research project donated from [ray-sql](https://github.com/datafusion-contrib/ray-sql) to evaluate performing distributed SQL queries from Python, using
 [Ray](https://www.ray.io/) and [DataFusion](https://github.com/apache/arrow-datafusion).

+DataFusion Ray is a distributed SQL query engine powered by the Rust implementation of [Apache Arrow](https://arrow.apache.org/), [Apache DataFusion](https://datafusion.apache.org/) and [Ray](https://www.ray.io/).
+
 ## Goals

 - Demonstrate how easily new systems can be built on top of DataFusion. See the [design documentation](./docs/README.md)
@@ -31,7 +33,9 @@ This is a research project to evaluate performing distributed SQL queries from P

 ## Non Goals

-- Build and support a production system.
+- Re-build the cluster scheduling systems like what [Ballista](https://datafusion.apache.org/ballista/) did.
+- Ballista is extremely complex and utilizing Ray feels like it abstracts some of that complexity away.
+- Datafusion Ray is delegating cluster management to Ray.