Prototype expansion of SQL transforms for single-node execution #59

One of the main targets for the Ray Beam Runner is to support SQL (and streaming SQL).

Beam's SQL support is implemented in Java. There are two parts for the execution of SQL transforms in Beam:

- Expansion: Beam expands multi-language transforms through an implementation of the `ExpansionService` interface ([sample of the gRPC implementation](https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L461-L579), which seems way too complicated, to be honest).
- Execution: the expanded stages then have to run in a Java process rather than a Python process (covered at the end of this issue).

My idea:
- Implement a class `RayJavaExpansionService` that receives the expansion request, which can be relatively simple. It must contain:
    - Schema of the input PCollection ([what are schemas](https://beam.apache.org/documentation/programming-guide/#schemas))
    - Identifier of the transform to apply. These identifiers are provided by [SchemaTransformProvider](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.java) implementations (see [a few examples](https://github.com/apache/beam/search?q=SchemaTransformProvider)).
        - **Note**: I will implement a SQL one, `SqlSchemaTransformProvider`, with id `"beam:schematransform:org.apache.beam:sql:v1"`, this week.
    - Parameters for the transform (in this case, just the SQL statement)

The `RayJavaExpansionService` should then return the schema of the resulting PCollection, as well as the expanded graph of operations in protobuf format ([the proto format](https://github.com/ray-project/ray_beam_runner/blob/master/ray_beam_runner/portability/ray_fn_runner.py#L112-L114)).
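
To make the request and response shapes concrete, here is a rough Python sketch of what the runner side might send and receive. Everything in it is an assumption for illustration: the class names `ExpansionRequest`, `ExpansionResponse`, and `EchoExpansionClient`, the `"query"` parameter key, and the modeling of a schema as (field name, type name) pairs are all hypothetical. Only the transform id comes from this issue. The stand-in client performs no real expansion; it just checks the request and echoes the input schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Identifier quoted in this issue for the planned SQL provider.
SQL_TRANSFORM_ID = "beam:schematransform:org.apache.beam:sql:v1"

# Assumption: a schema is simplified to (field name, type name) pairs.
Schema = Tuple[Tuple[str, str], ...]


@dataclass
class ExpansionRequest:
    """Hypothetical request: input schema, transform id, parameters."""
    input_schema: Schema
    transform_id: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class ExpansionResponse:
    """Hypothetical response: output schema plus serialized graph."""
    output_schema: Schema
    expanded_graph: bytes  # would hold the protobuf-encoded operations


class EchoExpansionClient:
    """Stand-in for a call to RayJavaExpansionService; expands nothing."""

    def expand(self, request: ExpansionRequest) -> ExpansionResponse:
        if request.transform_id != SQL_TRANSFORM_ID:
            raise ValueError(f"unknown transform id: {request.transform_id}")
        if "query" not in request.parameters:  # hypothetical parameter key
            raise ValueError("missing 'query' parameter")
        # Placeholder result: echo the input schema, return an empty graph.
        return ExpansionResponse(
            output_schema=request.input_schema,
            expanded_graph=b"",
        )


request = ExpansionRequest(
    input_schema=(("id", "INT64"), ("name", "STRING")),
    transform_id=SQL_TRANSFORM_ID,
    parameters={"query": "SELECT id, name FROM PCOLLECTION"},
)
response = EchoExpansionClient().expand(request)
```

A real implementation would forward the request to the Java process and derive the output schema from the SQL statement; the sketch only pins down which pieces of data cross the boundary.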

- Java dependencies:
    - `org.apache.beam:beam-sdks-java-core`
    - `org.apache.beam:beam-sdks-java-extensions-sql`


The expansion is not enough to execute SQL, but it's the first step. The next step is to recognize Java stages and execute them in a Java process rather than a Python process (basically, a Java [implementation of this code](https://github.com/ray-project/ray_beam_runner/blob/master/ray_beam_runner/portability/execution.py#L622-L634), where we return some kind of `JavaWorkerHandler`).
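
The dispatch described above could look roughly like the sketch below. It is not the code in `execution.py`: the `Stage` class, its `sdk` label, and the `worker_handler_for` function are hypothetical names used only to show the idea of picking a handler per stage, and the handlers themselves are empty placeholders.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Hypothetical stage descriptor; `sdk` is an assumed label."""
    name: str
    sdk: str  # "python" or "java"


class PythonWorkerHandler:
    """Placeholder for the existing Python-process handler."""


class JavaWorkerHandler:
    """Placeholder for the proposed Java-process handler."""


# Assumed mapping from a stage's SDK label to its handler class.
_HANDLERS = {
    "python": PythonWorkerHandler,
    "java": JavaWorkerHandler,
}


def worker_handler_for(stage: Stage):
    """Return a handler instance matching the SDK the stage needs."""
    try:
        handler_cls = _HANDLERS[stage.sdk]
    except KeyError:
        raise ValueError(f"no worker handler for SDK {stage.sdk!r}") from None
    return handler_cls()


sql_handler = worker_handler_for(Stage(name="sql_transform", sdk="java"))
map_handler = worker_handler_for(Stage(name="python_map", sdk="python"))
```

How a stage is actually recognized as a Java stage is left open here; the string label is just the simplest placeholder for that decision.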
