This repository was archived by the owner on Sep 2, 2025. It is now read-only.

How to connect to a local spark install #31

@aaronsteers

Description

Is there any way to connect using a locally installed Spark instance, rather than to a remote service via http/thrift?

The code I'm trying to migrate uses the following imports to run SQL-based transforms locally using the Spark/Hive install already on the container:

from pyspark import SparkConf
from pyspark.sql import SparkSession

SPARK_LOG_LEVEL = "WARN"  # e.g. "INFO", "WARN", "ERROR"
conf = SparkConf()  # any extra Spark settings go here

spark = (
    SparkSession.builder.config(conf=conf)
    .master("local")
    .appName("My Spark App")
    .enableHiveSupport()
    .getOrCreate()
)
spark.sparkContext.setLogLevel(SPARK_LOG_LEVEL)
sc = spark.sparkContext

# ...

df = spark.sql(f"CREATE TABLE {my_target_table} AS SELECT * FROM {my_source_table}")

And if this isn't currently supported, is there any chance we could build and/or add the feature? For CI/CD pipelines especially, it seems we'd want to be able to run dbt pipelines even without access to an external cluster.
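
In the meantime, one possible workaround might be to launch Spark's bundled Thrift server on the same machine and point the existing http/thrift connection at localhost. A rough sketch, assuming PyHive as the Thrift client and the default HiveServer2 port of 10000 (I haven't verified this against this project):

# First, start the local Thrift server from the shell:
#   $SPARK_HOME/sbin/start-thriftserver.sh --master local[2]
from pyhive import hive  # PyHive speaks the HiveServer2 Thrift protocol

conn = hive.connect(host="localhost", port=10000)  # 10000 is the default port
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
print(cursor.fetchall())

That still goes through Thrift, though, so a true in-process SparkSession connection would be the cleaner fix for CI.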
