Skip to content

fix: Don't nullify release callback for schemas#6

Merged
peasee merged 6 commits into
duckdb-1.3.2from
peasee-patch-2
Oct 21, 2025
Merged

fix: Don't nullify release callback for schemas#6
peasee merged 6 commits into
duckdb-1.3.2from
peasee-patch-2

Conversation

@peasee
Copy link
Copy Markdown

@peasee peasee commented Oct 21, 2025

🗣 Description

  • Ensures that the release function of arrow schemas are not nullified when returned from the FactoryGetSchema CAPI factory. This ensures that, when bound to an ArrowSchemaWrapper, the schema is released when destructed. Previously, the schema would never be released because the release function was overridden.
    • I am not sure why it was implemented this way originally. Perhaps some workaround to make duckdb_arrow_array_scan work. The Arrow C Stream Specification defines that schema lifetimes can be independent from the stream, and can be released individually, so I am not sure what the original comment was referring to.
    • duckdb-rs could never release the schema itself even if it wanted too, because it never had a handle to the schema to begin with. The schema was only ever retrieved by duckdb code, through the FactoryGetSchema function from the stream.
  • Releases the schema from PrivateData to ensure the schema is released when the PrivateData is deleted as it stores the direct arrow schema. Destroying a PrivateData without releasing the schema would result in a leak.
  • Adds a naive check for null pointers when checking children schema of already freed schemas.

This PR breaks the functionality of duckdb_arrow_array_scan:

  • duckdb_arrow_array_scan does some clobbering of the release functions and lifecycles, and it results in the arrow schema being freed when the ArrowSchemaWrapper is destroyed. The GetSchema function that this stream implements does not create a new instance of an arrow schema, but it instead returns a pointer to an existing arrow schema which the user supplies as a function input.
  • Because it does not create a new instance of the schema on each GetSchema call, when the ArrowSchemaWrapper is destroyed it results in the underlying shared schema being released. The next GetSchema call will then return the same pointer to the released schema, whose memory doesn't exist anymore.
  • The impact of this is that duckdb_arrow_array_scan now segfaults, captured by the failing tests. We'll merge this as-is though, as datafusion-table-providers does not use duckdb_arrow_array_scan. A correct fix would be updating the GetSchema factory for duckdb_arrow_array_scan to somehow return a clone of the original schema, but persist a release function so these clones get released.

@peasee peasee self-assigned this Oct 21, 2025
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a formatting change

@peasee peasee requested a review from phillipleblanc October 21, 2025 06:02
@peasee peasee merged commit c6320d3 into duckdb-1.3.2 Oct 21, 2025
28 of 38 checks passed
mach-kernel pushed a commit that referenced this pull request Oct 28, 2025
fix: Don't nullify release callback for schemas
mach-kernel pushed a commit that referenced this pull request Oct 28, 2025
fix: Don't nullify release callback for schemas
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants