Skip to content
This repository was archived by the owner on Jan 12, 2026. It is now read-only.
This repository was archived by the owner on Jan 12, 2026. It is now read-only.

Prototype expansion of SQL transforms for single-node execution #59

@pabloem

Description

@pabloem

One of the main targets for the Ray Beam Runner is to support SQL (and streaming SQL).

Beam's SQL support is implemented in Java. There are two parts for the execution of SQL transforms in Beam:

  • Expansion: The way Beam implements expansion of multi-language transforms is by implementing an ExpansionService interface (sample of the GRPC implementation - this seems way too complicated to be honest)

My idea:

  • Implement a class "RayJavaExpansionService" - that receives the expansion request that can be a relatively simple thing. It must contain:
    • Schema of the Input PCollection (what are schemas)
    • Identifier of the transform to apply (these ideantifiers are provided by SchemaTransformProvider implementations (see a few examples)
      • Note: I will implement a Sql one: SqlSchemaTransformProvider with id "beam:schematransform:org.apache.beam:sql:v1" this week.
    • Parameters for the transform (in this case, just the SQL statement)

The RayJavaExpansionService should then return the schema of the resulting PCollection, as well as the expanded graph of operations in protobuf format (the proto format).

  • Java dependencies:
    • "org.apache.beam:beam-sdks-java-core"
    • "org.apache.beam:beam-sdks-java-extensions-sql"

The expansion is not enough to execute SQL, but it's the first step. The next step is to recognize Java Stages, and execute them in a Java process rather than a Python process (basically, a Java implementation of this code, where we return some kind of JavaWorkerHandler

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions