This repository was archived by the owner on Jan 12, 2026. It is now read-only.
One of the main targets for the Ray Beam Runner is to support SQL (and streaming SQL).
Beam's SQL support is implemented in Java. There are two parts for the execution of SQL transforms in Beam:
Expansion: Beam expands multi-language transforms through an ExpansionService interface (sample of the gRPC implementation; this seems way too complicated, to be honest)
My idea:
Implement a class "RayJavaExpansionService" that receives the expansion request, which can be relatively simple. It must contain:
Note: I will implement a SQL one this week: SqlSchemaTransformProvider, with id "beam:schematransform:org.apache.beam:sql:v1".
Parameters for the transform (in this case, just the SQL statement)
The RayJavaExpansionService should then return the schema of the resulting PCollection, as well as the expanded graph of operations in protobuf format (the proto format).
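To make the request and response described above concrete, here is a minimal sketch of how the Python side might model them. Everything in it is an assumption for illustration: `ExpansionRequest`, `ExpansionResponse`, `build_sql_expansion_request`, and the `"query"` parameter key are hypothetical names, not existing Beam or Ray APIs. It also assumes the request carries the transform id alongside the parameters. Only the transform id string comes from the note above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Transform id quoted in the note above.
SQL_TRANSFORM_ID = "beam:schematransform:org.apache.beam:sql:v1"


@dataclass(frozen=True)
class ExpansionRequest:
    """Hypothetical request shape; not an existing Beam or Ray type."""
    transform_id: str  # assumption: the request names the transform to expand
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class ExpansionResponse:
    """Hypothetical response shape; not an existing Beam or Ray type."""
    output_schema: List[Tuple[str, str]]  # (field name, type name) pairs
    expanded_graph: bytes  # expanded graph of operations, serialized protobuf


def build_sql_expansion_request(statement: str) -> ExpansionRequest:
    """Build a request for the SQL transform; "query" is an assumed key."""
    if not statement.strip():
        raise ValueError("SQL statement must not be empty")
    return ExpansionRequest(SQL_TRANSFORM_ID, {"query": statement})
```

For example, `build_sql_expansion_request("SELECT 1")` would produce a request that the service could answer with an `ExpansionResponse`. The point is only that the payload can stay this small for the SQL case.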
Java dependencies:
"org.apache.beam:beam-sdks-java-core"
"org.apache.beam:beam-sdks-java-extensions-sql"
The expansion is not enough to execute SQL, but it's the first step. The next step is to recognize Java stages and execute them in a Java process rather than a Python process (basically, a Java implementation of this code, where we return some kind of JavaWorkerHandler).
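A rough sketch of the stage-recognition step, under stated assumptions: each stage is assumed to carry a language tag, and `Stage`, `PythonWorkerHandler`, `worker_handler_for`, and this `JavaWorkerHandler` are hypothetical stand-ins rather than existing runner classes. It shows the dispatch idea only, not how a Java process would actually be started.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical stand-in for one stage of the expanded pipeline graph."""
    name: str
    sdk_language: str  # assumed tag, e.g. "python" or "java"


class PythonWorkerHandler:
    """Placeholder for the existing Python execution path."""
    process_kind = "python"


class JavaWorkerHandler:
    """Placeholder for a handler that would hand the stage to a Java process."""
    process_kind = "java"


# Assumed registry: language tag -> handler class.
_HANDLERS = {
    "python": PythonWorkerHandler,
    "java": JavaWorkerHandler,
}


def worker_handler_for(stage: Stage):
    """Pick a handler from the stage's language tag."""
    try:
        return _HANDLERS[stage.sdk_language]()
    except KeyError:
        raise ValueError(
            f"no worker handler for language {stage.sdk_language!r}"
        ) from None
```

With this shape, a stage produced by the SQL expansion would be tagged `"java"` and routed to the Java handler, while existing Python stages keep their current path.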