
Migrate from Spark's typesystem to Substrait's typesystem (SparkSchema, SparkPrimitive). This will likely require providing a translation layer for serialized types. #325

@okennedy


Goals

Phase 0: Rebase off of Substrait-Scala

  • Rip out Mimir for the time being
  • Migrate to substrait-scala

Phase 1a: ExecutionContext

  • Modify ExecutionContext to support the creation of Substrait-based artifacts. Substrait's standard protobuf-based encoder should work for storage, and we can use a new MIME type to distinguish Substrait-based Datasets.
  • Modify ExecutionContext and Artifact to allow Spark DataFrame methods to work with Substrait-based plans.
  • Modify ExecutionContext to allow DataframeConstructor-based artifacts to be retrieved as Substrait plans.
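
A rough sketch of how a new MIME type could distinguish Substrait-based Datasets from Spark-based ones when an artifact is loaded. Everything here is a hypothetical stand-in: `StoredArtifact`, `DatasetKind`, and both MIME strings are made up for illustration and are not taken from Vizier or Substrait.

```scala
// Hypothetical stand-ins; none of these names come from Vizier or Substrait.
object MimeTypes {
  // Assumed MIME strings, for illustration only.
  val SparkDataset: String     = "dataset/spark"
  val SubstraitDataset: String = "dataset/substrait"
}

// A stored artifact: an id, a MIME type, and the serialized payload
// (for a Substrait-based dataset, the protobuf-encoded plan bytes).
final case class StoredArtifact(id: Long, mimeType: String, payload: Array[Byte])

sealed trait DatasetKind
object DatasetKind {
  case object SparkBacked     extends DatasetKind
  case object SubstraitBacked extends DatasetKind

  // Decide how to decode an artifact purely from its MIME type.
  // Non-dataset artifacts yield None.
  def of(artifact: StoredArtifact): Option[DatasetKind] =
    artifact.mimeType match {
      case MimeTypes.SparkDataset     => Some(SparkBacked)
      case MimeTypes.SubstraitDataset => Some(SubstraitBacked)
      case _                          => None
    }
}
```

Dispatching on the MIME type alone would let existing Spark-based artifacts keep loading unchanged while new artifacts are written in the Substrait encoding.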

Phase 1b: QueryExecutor

  • Add a new QueryExecutor trait / object that accepts a Substrait plan and executes it, producing results in some standard format (Array of Row?)
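
One possible shape for the trait, as a minimal sketch. The real trait would accept a Substrait plan; here `PlanStub` is a toy two-node plan type used only so the example runs without any Substrait dependency, and a row is assumed to be a plain `Vector[Any]`. `InterpreterExecutor` is likewise hypothetical.

```scala
// Toy stand-in for a Substrait plan; NOT the Substrait plan model.
sealed trait PlanStub
final case class ScanStub(rows: Vector[Vector[Any]]) extends PlanStub
final case class LimitStub(input: PlanStub, count: Int) extends PlanStub

// The executor contract: take a plan, return materialized rows.
trait QueryExecutor {
  def execute(plan: PlanStub): Array[Vector[Any]]
}

// A trivial in-memory implementation that interprets the toy plan.
object InterpreterExecutor extends QueryExecutor {
  def execute(plan: PlanStub): Array[Vector[Any]] = eval(plan).toArray

  private def eval(plan: PlanStub): Vector[Vector[Any]] = plan match {
    case ScanStub(rows)          => rows
    case LimitStub(input, count) => eval(input).take(count)
  }
}
```

Keeping the trait down to a single `execute` method would make it easy to swap in Spark, SQLite, or DuckDB backends later without touching callers.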

Phase 2: Migration

  • Rewrite existing Vizier Commands to use Substrait-based ExecutionContext operations
  • Rewrite the Vizier spreadsheet to be based on Substrait
  • Rewrite any remaining Vizier code to replace SparkPrimitive/SparkSchema references with the corresponding Substrait types
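
The translation layer for serialized types mentioned in the title could start as a simple name-to-type lookup. The sketch below assumes the serialized names follow Spark's `DataType.typeName` convention ("integer", "long", and so on); whether Vizier's stored schemas use exactly these strings is an assumption. `SubstraitTypeStub` is a hypothetical stand-in for the real Substrait type classes, and only a handful of simple types are covered.

```scala
import java.util.Locale

// Hypothetical stand-in for Substrait's simple types; `name` uses the
// type names from the Substrait specification.
sealed trait SubstraitTypeStub { def name: String }
object SubstraitTypeStub {
  case object Bool   extends SubstraitTypeStub { val name = "boolean" }
  case object I32    extends SubstraitTypeStub { val name = "i32" }
  case object I64    extends SubstraitTypeStub { val name = "i64" }
  case object Fp32   extends SubstraitTypeStub { val name = "fp32" }
  case object Fp64   extends SubstraitTypeStub { val name = "fp64" }
  case object Str    extends SubstraitTypeStub { val name = "string" }
  case object Binary extends SubstraitTypeStub { val name = "binary" }
}

object TypeTranslation {
  import SubstraitTypeStub._

  // Assumed serialized Spark type names mapped to Substrait types.
  private val bySparkName: Map[String, SubstraitTypeStub] = Map(
    "boolean" -> Bool,
    "integer" -> I32,
    "long"    -> I64,
    "float"   -> Fp32,
    "double"  -> Fp64,
    "string"  -> Str,
    "binary"  -> Binary
  )

  // Left carries an error message for names with no known mapping,
  // e.g. any Vizier-specific or user-defined types.
  def fromSparkName(sparkName: String): Either[String, SubstraitTypeStub] =
    bySparkName
      .get(sparkName.toLowerCase(Locale.ROOT))
      .toRight(s"No Substrait mapping for Spark type '$sparkName'")
}
```

Returning an `Either` rather than throwing leaves room to decide per call site how unmapped types in old serialized artifacts should be handled.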

Phase 3: Extract Logic to Plugins

  • Factor the Spark-specific code out into a plugin
  • Update ExecutionContext, Artifact, and any other code to remove all Spark-specific operations
  • Add a default executor based on SQLite (or DuckDB?)
  • Factor the Mimir-specific code out into a plugin
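
For the plugin split, one option is a small registry that maps executor names to plugins and falls back to a default. This is only a sketch under assumptions: `ExecutorPlugin`, `NamedPlugin`, and `ExecutorRegistry` are hypothetical names, and a real plugin would carry an executor rather than just a name.

```scala
import scala.collection.mutable

// Hypothetical plugin contract; a real plugin would also expose an executor.
trait ExecutorPlugin {
  def name: String
}

// Minimal plugin used only to exercise the registry.
final case class NamedPlugin(name: String) extends ExecutorPlugin

// Registry that resolves a requested executor, falling back to a default.
final class ExecutorRegistry(defaultName: String) {
  private val plugins = mutable.LinkedHashMap.empty[String, ExecutorPlugin]

  // Registering a plugin under an existing name replaces the earlier one.
  def register(plugin: ExecutorPlugin): Unit =
    plugins.update(plugin.name, plugin)

  // Use the requested plugin if it is registered; otherwise the default.
  // Returns None only when neither is registered.
  def resolve(requested: Option[String]): Option[ExecutorPlugin] =
    requested.flatMap(plugins.get).orElse(plugins.get(defaultName))
}
```

With something like this, the core could register the SQLite (or DuckDB) executor as the default, and the Spark and Mimir plugins would register themselves when present.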

Visualizations

[Two images attached to the original issue; not recoverable here]

Metadata

Assignees

No one assigned

Labels

  • enhancement: New feature or request
  • layer-api: An issue involving the vizier API layer
  • layer-scala: An issue involving Scala compatibility code
