Making Spark Connect Rust more thread-safe friendly #13

edmondop · 2024-04-15T13:21:10Z

Description

Enabling send_guard and using Arc makes the crate much friendlier to use in multi-threaded environments, because it makes the session Send and Sync.
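A minimal sketch of the idea, using a hypothetical Session type (not the crate's SparkSession). Its mutable state sits behind std::sync::Mutex rather than a parking_lot lock, so the example needs only the standard library. The assert_send_sync helper only compiles if the type is Send + Sync; wrapping the session in Arc then lets several threads share one instance.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a session; not the crate's SparkSession.
struct Session {
    // Mutable state guarded by a lock so &Session can be shared across threads.
    requests_sent: Mutex<u64>,
}

impl Session {
    fn new() -> Self {
        Session { requests_sent: Mutex::new(0) }
    }

    // Pretend to send a request; here it only bumps a counter.
    fn send(&self) {
        let mut n = self.requests_sent.lock().unwrap();
        *n += 1;
    }

    fn requests_sent(&self) -> u64 {
        *self.requests_sent.lock().unwrap()
    }
}

// Compile-time check: this only type-checks if T is Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}

fn run_threads(n_threads: usize) -> u64 {
    assert_send_sync::<Session>();
    let session = Arc::new(Session::new());
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            // Each thread gets its own Arc pointing at the same Session.
            let s = Arc::clone(&session);
            thread::spawn(move || s.send())
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    session.requests_sent()
}

fn main() {
    println!("{}", run_threads(4));
}
```

If Session held a non-Sync field (for example a RefCell), the assert_send_sync call would fail to compile, which is a cheap way to keep the guarantee from regressing.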

sjrusso8 · 2024-04-16T14:42:17Z

Thanks for creating this PR! I played around with something similar when I was rewriting the client implementation. I just didn't like requiring the user to wrap the SparkSession in Arc to create the session. It felt clunky, but that is probably me overthinking the interface.

Do you know of a way to make the session Send and Sync without requiring the user to create an Arc? I was loosely mirroring DataFusion's SessionContext (https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html), since it shows how users of a similar DataFrame library interact with a session object. The SessionContext does not get wrapped in Arc.
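One possible answer, sketched under assumptions: keep the shared state behind an Arc inside the session type and derive Clone, so users hold the handle by value and cloning it only bumps a reference count. As far as I know this is the general shape DataFusion's SessionContext uses (state behind an Arc<RwLock<...>>), but the types below are hypothetical and not taken from either crate.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical shared state; a real session would hold the client, config, etc.
struct SessionState {
    current_catalog: String,
}

// Public handle. Users hold it by value; the Arc stays an implementation detail.
#[derive(Clone)]
pub struct SessionHandle {
    state: Arc<RwLock<SessionState>>,
}

impl SessionHandle {
    pub fn new(catalog: &str) -> Self {
        SessionHandle {
            state: Arc::new(RwLock::new(SessionState {
                current_catalog: catalog.to_string(),
            })),
        }
    }

    pub fn set_catalog(&self, catalog: &str) {
        self.state.write().unwrap().current_catalog = catalog.to_string();
    }

    pub fn catalog(&self) -> String {
        self.state.read().unwrap().current_catalog.clone()
    }

    // True if both handles point at the same underlying state.
    pub fn same_session(&self, other: &SessionHandle) -> bool {
        Arc::ptr_eq(&self.state, &other.state)
    }
}

fn main() {
    let session = SessionHandle::new("spark_catalog");
    // Cloning copies a pointer and increments a reference count; no deep copy.
    let for_thread = session.clone();
    thread::spawn(move || for_thread.set_catalog("other"))
        .join()
        .unwrap();
    // The change made through the clone is visible through the original handle.
    println!("{}", session.catalog());
}
```

The trade-off is that every method now takes &self and goes through a lock, but callers never see Arc in the signature.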

edmondop · 2024-04-16T19:16:54Z

Will look into it. I think the biggest argument for Arc is that cloning it is cheap, whereas cloning the session itself would be expensive. What do you think? I will look into DataFusion.


Add setCatalog and setDatabase to SparkSession

sjrusso8 · 2024-04-19T14:10:04Z

src/session.rs

@@ -56,6 +56,34 @@ impl SparkSession {
        DataFrame::new(self, LogicalPlanBuilder::from(range_relation))
    }

+    pub fn setCatalog(self: Arc<Self>, catalog: &str) -> DataFrame {


For both setCatalog and setDatabase, these should be implemented on the spark.catalog object as setCurrentCatalog and setCurrentDatabase, since we want to mirror the existing Spark API.

For these actions to take effect on the existing session, the plan has to be submitted to the server via client.execute_and_fetch and receive a successful response. Neither execution plan returns any data from the server. The code might look like this:

```rust
pub async fn setCurrentCatalog(self, catalog: &str) -> Result<(), SparkError> {
    let cat_type = Some(spark::catalog::CatType::SetCurrentCatalog(
        spark::SetCurrentCatalog {
            catalog_name: catalog.to_string(),
        },
    ));

    let rel_type = spark::relation::RelType::Catalog(spark::Catalog { cat_type });

    let plan = LogicalPlanBuilder::plan_root(LogicalPlanBuilder::from(rel_type));

    self.spark_session.client().execute_and_fetch(plan).await
}
```
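Since the two setters differ only in which catalog command they send, they could share one helper. The sketch below is synchronous and self-contained: the CatType enum, MockClient, SparkError and Catalog are hypothetical stand-ins (the real code would use the generated spark:: protobuf types, LogicalPlanBuilder and the async client), and the methods take &mut self only so the mock client can record what was submitted.

```rust
// Hypothetical, heavily simplified stand-ins for the generated protobuf types.
#[derive(Debug, Clone, PartialEq)]
enum CatType {
    SetCurrentCatalog { catalog_name: String },
    SetCurrentDatabase { db_name: String },
}

#[derive(Debug)]
struct SparkError(String);

// Mock client: records each submitted command instead of calling a server.
#[derive(Default)]
struct MockClient {
    submitted: Vec<CatType>,
}

impl MockClient {
    // Stands in for execute_and_fetch; both catalog commands return no data.
    fn execute_and_fetch(&mut self, plan: CatType) -> Result<(), SparkError> {
        self.submitted.push(plan);
        Ok(())
    }
}

// Hypothetical catalog object, reached as spark.catalog in the Spark API.
struct Catalog {
    client: MockClient,
}

#[allow(non_snake_case)] // keep Spark's camelCase method names
impl Catalog {
    // Shared helper: only the command variant differs between the setters.
    fn run_command(&mut self, cat_type: CatType) -> Result<(), SparkError> {
        self.client.execute_and_fetch(cat_type)
    }

    fn setCurrentCatalog(&mut self, catalog: &str) -> Result<(), SparkError> {
        self.run_command(CatType::SetCurrentCatalog {
            catalog_name: catalog.to_string(),
        })
    }

    fn setCurrentDatabase(&mut self, db_name: &str) -> Result<(), SparkError> {
        self.run_command(CatType::SetCurrentDatabase {
            db_name: db_name.to_string(),
        })
    }
}

fn main() {
    let mut catalog = Catalog { client: MockClient::default() };
    catalog.setCurrentCatalog("spark_catalog").unwrap();
    catalog.setCurrentDatabase("default").unwrap();
    println!("{:?}", catalog.client.submitted);
}
```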

edmondop added 5 commits April 14, 2024 22:35

Wrapping into Arc, plus other

6f16899

Reverting arrow change

fe53515

Revert dummy print

1483bcf

Updated examples

d1e656b

Fixing format

cff3e9b

phillipleblanc and others added 2 commits April 17, 2024 18:14

Add setCatalog and setDatabase to SparkSession

ed5c679

Merge pull request #2 from phillipleblanc/phillip/240417-setCatalog

24b8a43

Add setCatalog and setDatabase to SparkSession

sjrusso8 reviewed Apr 19, 2024
