Making Spark Connect Rust more thread-safe friendly #13
Conversation
Steve Russo commented on Apr 16, 2024:

> Thanks for creating this PR! I played around with something similar when I was rewriting the client implementation. I just didn't like having the user wrap the SparkSession with Arc to create the session. It felt clunky, but that is probably me overthinking the interface.
>
> Do you know of a way to allow for friendlier Send and Sync without requiring the user to create the Arc? I was loosely mirroring the SessionContext from DataFusion (https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html) as a model for how users of a similar DataFrame library interact with a session object. The SessionContext does not get wrapped in Arc.

Will look into it. I think the biggest argument for Arc is to have a cheap copy rather than an expensive one. What do you think? I will look into DataFusion.
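One way to get `Send` and `Sync` without asking users to wrap the session themselves (the approach DataFusion's `SessionContext` takes) is to keep the `Arc` inside the session type, so `clone()` is a cheap pointer bump and the type is thread-safe by construction. A minimal sketch with hypothetical internals (the real crate would hold the Spark Connect gRPC client, not just a session id):

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical shared state; in the real crate this would hold the
// Spark Connect channel and session metadata.
struct ClientState {
    session_id: String,
}

// The session holds the Arc internally, so `clone()` is a cheap
// reference-count bump and users never see the Arc.
#[derive(Clone)]
struct SparkSession {
    state: Arc<ClientState>,
}

impl SparkSession {
    fn new(session_id: &str) -> Self {
        SparkSession {
            state: Arc::new(ClientState {
                session_id: session_id.to_string(),
            }),
        }
    }

    fn session_id(&self) -> &str {
        &self.state.session_id
    }
}

fn main() {
    let spark = SparkSession::new("abc-123");

    // A cheap clone moved into another thread: SparkSession is
    // Send + Sync because its only field is an Arc over Sync data.
    let spark2 = spark.clone();
    let handle = thread::spawn(move || spark2.session_id().to_string());

    assert_eq!(handle.join().unwrap(), "abc-123");
    println!("shared session id: {}", spark.session_id());
}
```

With this layout the user-facing API never mentions `Arc`, while clones handed to other threads or tasks still share one underlying client.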
Add setCatalog and setDatabase to SparkSession
```diff
@@ -56,6 +56,34 @@ impl SparkSession {
         DataFrame::new(self, LogicalPlanBuilder::from(range_relation))
     }

+    pub fn setCatalog(self: Arc<Self>, catalog: &str) -> DataFrame {
```
Both setCatalog and setDatabase should be implemented on the spark.catalog object as setCurrentCatalog and setCurrentDatabase, since we want to mirror the existing Spark API.

For these actions to take effect on the existing session, the plan has to be submitted to the server via client.execute_and_fetch and receive a successful response. Both of these execution plans return nothing from the server. The code might look like this:
```rust
pub async fn setCurrentCatalog(self, catalog: &str) -> Result<(), SparkError> {
    let cat_type = Some(spark::catalog::CatType::SetCurrentCatalog(
        spark::SetCurrentCatalog {
            catalog_name: catalog.to_string(),
        },
    ));

    let rel_type = spark::relation::RelType::Catalog(spark::Catalog { cat_type });

    let plan = LogicalPlanBuilder::plan_root(LogicalPlanBuilder::from(rel_type));

    self.spark_session.client().execute_and_fetch(plan).await
}
```
Description

Enabling `send_guard` and using `Arc` makes the crate much friendlier to use in a multi-threaded environment, making the session `Send` and `Sync`.
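A quick way to lock in the `Send` and `Sync` claim is the standard compile-time assertion idiom: a generic function bounded on `Send + Sync` that fails to build if the session ever stops being thread-safe. A sketch with a stand-in session type (the real one holds the Spark Connect client, which this assumes is itself `Send + Sync`):

```rust
use std::sync::Arc;

// Stand-in for the real SparkSession; the actual type holds the
// Spark Connect client, assumed here to be Send + Sync.
struct SparkSession {
    session_id: String,
}

// Compile-time check: this function only accepts T: Send + Sync,
// so these calls fail to build if the bounds are ever lost.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<SparkSession>();
    assert_send_sync::<Arc<SparkSession>>();

    let spark = SparkSession {
        session_id: "abc-123".to_string(),
    };
    println!("Send + Sync session: {}", spark.session_id);
}
```

Dropping these two calls into a unit test guards the thread-safety guarantee against regressions, e.g. if a non-`Sync` field (such as a plain `RefCell`) were ever added to the session.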