Is there any way to connect using a locally installed Spark instance, rather than to a remote service via HTTP/Thrift?
The code I'm trying to migrate uses the following setup to run SQL-based transforms locally, with Spark/Hive already installed on the container:
```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

SPARK_LOG_LEVEL = "WARN"  # defined elsewhere in our code; shown here for completeness

conf = SparkConf()  # application-specific settings elided
spark = (
    SparkSession.builder.config(conf=conf)
    .master("local")
    .appName("My Spark App")
    .enableHiveSupport()
    .getOrCreate()
)
spark.sparkContext.setLogLevel(SPARK_LOG_LEVEL)
sc = spark.sparkContext
# ... (my_target_table and my_source_table are defined elsewhere)
df = spark.sql(f"CREATE TABLE {my_target_table} AS SELECT * FROM {my_source_table}")
```
If this isn't currently supported, is there any chance we could build it and/or add it as a feature? For CI/CD pipelines especially, it would be valuable to be able to run dbt pipelines without access to an external cluster.
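For context, here is a minimal sketch of what we'd hope the adapter could do internally: execute a model's compiled SQL against an in-process SparkSession instead of a remote Thrift/HTTP connection. The `run_model_sql` helper, the `local[*]` master, and the app name are illustrative assumptions on my part, not anything that exists in the adapter today:

```python
from pyspark.sql import SparkSession

def run_model_sql(sql: str):
    # Hypothetical sketch: reuse (or lazily create) an in-process
    # SparkSession rather than opening a Thrift/HTTP connection.
    # `local[*]` and the app name are illustrative, not adapter settings.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("dbt-local-session")
        .enableHiveSupport()
        .getOrCreate()
    )
    return spark.sql(sql)
```

In CI, something like this would let an entire dbt run execute inside a single container, with no external endpoints to provision or tear down.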