Add support for init_spark from an existing SparkSession? #416

Description

@zhh210

Is it possible to initialize the spark object from an existing SparkSession? The use case is that my work environment needs a specially customized SparkSession that is wrapped up with complicated corporate credentials and setup. Running init_spark() as in the raydp example won't work because it is not aware of them. I can create a SparkSession object using the customized wrapper, but I don't know how to pass it over to raydp.
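
Roughly, the wrapper does something like this (a minimal sketch; the names and config keys are made up for illustration, the real wrapper is more involved):

from pyspark.sql import SparkSession

def get_customized_ss():
    # Illustrative only: the real wrapper injects corporate
    # credentials and security settings that are hard to
    # replicate by hand in raydp.init_spark().
    return (SparkSession.builder
            .appName('corporate-app')
            .config('spark.hadoop.some.corporate.credential', '...')
            .config('spark.some.security.setting', '...')
            .getOrCreate())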

The raydp example using standard Spark:

import ray
import raydp

# connect to ray cluster
ray.init(address='auto')

# create a Spark cluster with specified resource requirements
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB')

# normal data processing with Spark
df = spark.createDataFrame([('look',), ('spark',), ('tutorial',), ('spark',), ('look', ), ('python', )], ['word'])
df.show()
word_count = df.groupBy('word').count()
word_count.show()

# stop the spark cluster
raydp.stop_spark()

Proposed raydp usage with an existing SparkSession:

spark_session = get_customized_ss()
spark = raydp.init_spark(spark_session)
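
In the meantime, a partial workaround might be to replay the wrapper's settings through the configs argument that raydp.init_spark accepts, assuming the customization is expressible as Spark configuration rather than code (get_customized_ss() and the flow below are illustrative, not tested against the corporate setup):

import ray
import raydp

ray.init(address='auto')

# Extract the Spark conf from the wrapper's session and replay it
# through raydp.init_spark's `configs` parameter; raydp still has to
# create the session itself, so the wrapper's session is stopped first.
corporate_session = get_customized_ss()
corporate_conf = dict(corporate_session.sparkContext.getConf().getAll())
corporate_session.stop()

spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB',
                         configs=corporate_conf)

This only helps when the customization lives in the SparkConf, though; anything the wrapper does in code (e.g., credential refresh hooks) wouldn't carry over, which is why first-class support for passing an existing SparkSession would be preferable.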
