Is there any way to connect using a locally installed Spark instance, rather than to a remote service via HTTP/Thrift?
The code I'm trying to migrate uses the following setup to run SQL-based transforms locally, with Spark/Hive already installed on the container:
```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

SPARK_LOG_LEVEL = "WARN"  # defined elsewhere in our code; shown here for completeness

conf = SparkConf()  # application-specific settings elided
spark = (
    SparkSession.builder.config(conf=conf)
    .master("local")
    .appName("My Spark App")
    .enableHiveSupport()
    .getOrCreate()
)
spark.sparkContext.setLogLevel(SPARK_LOG_LEVEL)
sc = spark.sparkContext
# ... (my_target_table and my_source_table are defined elsewhere)
df = spark.sql(f"CREATE TABLE {my_target_table} AS SELECT * FROM {my_source_table}")
```
If this isn't currently supported, is there any chance we could build it and/or add it as a feature? For CI/CD pipelines especially, it would be valuable to be able to run dbt pipelines without access to an external cluster.
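For context, here is a minimal sketch of what we'd hope the adapter could do internally: execute a model's compiled SQL against an in-process SparkSession instead of a remote Thrift/HTTP connection. The `run_model_sql` helper, the `local[*]` master, and the app name are illustrative assumptions on my part, not anything that exists in the adapter today:

```python
from pyspark.sql import SparkSession

def run_model_sql(sql: str):
    # Hypothetical sketch: reuse (or lazily create) an in-process
    # SparkSession rather than opening a Thrift/HTTP connection.
    # `local[*]` and the app name are illustrative, not adapter settings.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("dbt-local-session")
        .enableHiveSupport()
        .getOrCreate()
    )
    return spark.sql(sql)
```

In CI, something like this would let an entire dbt run execute inside a single container, with no external endpoints to provision or tear down.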