Skip to content

GetConnectionId and expose the BindData struct in the scalar UDF executor #457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 28, 2025

Conversation

taniabogatsch
Copy link
Collaborator

Expose GetConnectionId:

// GetConnectionId returns the connection ID of the internal DuckDB connection.
// It expects a *sql.Conn connection.
func GetConnectionId(c *sql.Conn) (uint64, error)

Adds a new struct BindData, which contains data DuckDB, and in the future also clients, can set during scalar function binding. Currently, this only contains the connection ID. Together with GetConnectionId, this enables users to map their scalar functions to the connection that executes the scalar function.

I'm opening this as a draft PR, as there are still a few unresolved points. Also, to make arbitrary binding data available during execution, I need to PR more C API functions to DuckDB main.

Questions:

  • Currently, go-duckdb does not keep a mapping of its open connections. Also, they do not yet contain a state as proposed in Proof of concept for passing query time context to UDF functions #418. Should clients keep such a mapping, and use only the connection ID to retrieve the matching connection? Should we make the context available, in Conn or even BindData?
  • I've extended type ScalarFuncExecutor struct with another field allowing a new type of executor. This should be in line with Go's compatibility rules (https://go.dev/blog/module-compatibility) for structs. But please let me know if I am missing something here.

cc @VGSML @lorenzowritescode

@taniabogatsch taniabogatsch added the feature / enhancement Code improvements or a new feature label May 26, 2025
@VGSML
Copy link
Contributor

VGSML commented May 26, 2025

Thanks a lot for the PR! It looks like it would be simpler to pass context.Context directly to the UDF instead of BindData. However, in that case, we would need to implement logic to store the Go context at the duckdb.Conn (duckdb.Connector) level.
I can propose a solution — could you create a separate branch here where I can propose my changes the top of yours (create PR)?

Also, we need to implement the same logic to retrieve the connection ID from the table UDF bind info.

@VGSML
Copy link
Contributor

VGSML commented May 27, 2025

I have prepared proposal based on your branch taniabogatsch#3

The user data can be passed through context, as a normal go way.

The big limitation is when user make concurrent queries in one connection, the context in UDF will be from the last query

Could you take a look at?

@taniabogatsch
Copy link
Collaborator Author

EDIT: I was writing this before you submitted your PR to my fork - will check that out now (thanks for the input!) :)

It looks like it would be simpler to pass context.Context directly to the UDF instead of BindData.

What I like about having something like BindData is that its a more explicit way to allow users to pass arbitrary bind data to the execution (in the future). Although I guess that's also possible if we make the context available both during binding and execution. 🤔 I agree that the current way BindData looks in this PR should not be the final solution. Maybe it could contain a key-value map, which could then also contain the context?

Here's a bit more information on how I envision an (optional) bind callback in the future, once we have this PR available: duckdb/duckdb#17666. For example, internally, during bind, we can have the input parameters available as Expression.

typedef unique_ptr<FunctionData> (*bind_scalar_function_t)(ClientContext &context, ScalarFunction &bound_function,
                                                           vector<unique_ptr<Expression>> &arguments);

In the future, we can expose these in the C API, and make it possible to, e.g., have access to a constant expression value, or a return type. With that knowledge, the client can then store additional information (necessary for execution) in the BindData.

Also, we need to implement the same logic to retrieve the connection ID from the table UDF bind info.

That's a good point, I haven't checked if we already have the necessary C API functions in place for that.

@VGSML
Copy link
Contributor

VGSML commented May 27, 2025

My last comment was at the same time with yours))

@taniabogatsch
Copy link
Collaborator Author

I just had a quick look at your PR, it's pretty neat! I'll reiterate on it later today, and merge it + update this draft, after I've played around with your changes a bit.

Copy link

@lorenzowritescode lorenzowritescode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @taniabogatsch!

@taniabogatsch
Copy link
Collaborator Author

taniabogatsch commented May 27, 2025

Thanks both of you for your thorough review feedback! 💪 I've implemented all suggestions, and left a comment w.r.t. the bindInfo. Still struggling with the linter, but once the tests pass this is ready to merge from my side.

EDIT: I just saw the comments - will not merge this yet.

@taniabogatsch taniabogatsch marked this pull request as ready for review May 27, 2025 12:35
@VGSML
Copy link
Contributor

VGSML commented May 27, 2025

all good, you can merge it

@lorenzowritescode
Copy link

feel free to merge when you're ready :)

@VGSML
Copy link
Contributor

VGSML commented May 27, 2025 via email

@taniabogatsch taniabogatsch merged commit 15b7e3c into marcboeker:main May 28, 2025
31 checks passed
@taniabogatsch taniabogatsch deleted the bind-info branch May 28, 2025 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature / enhancement Code improvements or a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants