
Refactor IcebergPartitionTransform #141

Open
sfc-gh-abozkurt wants to merge 3 commits into main from aykut/refactor-transform

Conversation

@sfc-gh-abozkurt (Collaborator) commented Jan 9, 2026

Separate ParsedIcebergPartitionTransform (parser step) and IcebergPartitionTransform (analyzer step).
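Below is a minimal sketch of what a parser/analyzer split like this could look like. It assumes the parser step only records the raw text of a transform such as bucket[3], and the analyzer step resolves that text into a typed, validated form. The two struct names come from the description. Every field, the TransformKind enum and AnalyzePartitionTransform() are hypothetical and do not reflect the PR's actual layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parser step: hypothetical fields, holds only what was written in the DDL. */
typedef struct ParsedIcebergPartitionTransform
{
    const char *columnName;     /* e.g. "id" */
    const char *transformText;  /* e.g. "bucket[3]" */
} ParsedIcebergPartitionTransform;

typedef enum TransformKind
{
    TRANSFORM_IDENTITY,
    TRANSFORM_BUCKET
} TransformKind;

/* Analyzer step: hypothetical fields, resolved against the table schema. */
typedef struct IcebergPartitionTransform
{
    int           sourceFieldId;    /* looked up from columnName */
    TransformKind kind;
    int           bucketCount;      /* only set for TRANSFORM_BUCKET */
} IcebergPartitionTransform;

/*
 * Hypothetical analyzer. The schema lookup is replaced by the sourceFieldId
 * argument so the sketch stays self-contained. Returns false for transform
 * text it does not recognize.
 */
static bool
AnalyzePartitionTransform(const ParsedIcebergPartitionTransform *parsed,
                          int sourceFieldId,
                          IcebergPartitionTransform *out)
{
    int         buckets = 0;
    int         consumed = -1;

    assert(parsed != NULL && out != NULL);
    out->sourceFieldId = sourceFieldId;
    out->bucketCount = 0;

    if (strcmp(parsed->transformText, "identity") == 0)
    {
        out->kind = TRANSFORM_IDENTITY;
        return true;
    }

    /* %n records how much input was consumed, so "bucket[3" or "bucket[3]x" fail */
    if (sscanf(parsed->transformText, "bucket[%d]%n", &buckets, &consumed) == 1 &&
        consumed == (int) strlen(parsed->transformText) && buckets > 0)
    {
        out->kind = TRANSFORM_BUCKET;
        out->bucketCount = buckets;
        return true;
    }

    return false;
}
```

The benefit of the split is that the parsed struct never needs a schema, while the analyzed struct never carries unvalidated text.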

```diff
 if (minText != NULL && maxText != NULL)
 {
-    *names = lappend(*names, colName);
+    *names = lappend(*names, pstrdup(colName));
```
Collaborator (Author):

Unrelated warning fix.

Collaborator:

colName comes from TextDatumGetCString, which already returns a freshly palloc'd copy (in effect a pstrdup), so it seems confusing to do another pstrdup.

Can't we cast away the const instead to avoid the warning?
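Below is a standalone sketch of the reviewer's alternative. It assumes colName is declared const char * but already points to a freshly allocated copy that the caller owns. NameList, name_list_append() and copy_cstring() are hypothetical stand-ins for List, lappend() and TextDatumGetCString(), so the example compiles without PostgreSQL headers. Inside the extension, the same cast could be written with PostgreSQL's unconstify(char *, colName) macro from c.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's List: cells hold non-const void *. */
typedef struct NameList
{
    void   *cells[8];
    int     length;
} NameList;

/* Mirrors lappend()'s void * parameter, which is what triggers the
 * discarded-qualifiers warning when a const char * is passed implicitly. */
static void
name_list_append(NameList *list, void *datum)
{
    assert(list->length < 8);
    list->cells[list->length++] = datum;
}

/* Stand-in for TextDatumGetCString(): returns a fresh heap copy
 * owned by the caller. */
static char *
copy_cstring(const char *src)
{
    size_t  len = strlen(src) + 1;
    char   *copy = malloc(len);

    if (copy == NULL)
        abort();            /* palloc would raise an error instead */
    memcpy(copy, src, len);
    return copy;
}

/* The string is already our own copy, so append it directly and cast away
 * the const that exists only in the declaration: one allocation, no warning. */
static void
append_column_name(NameList *names, const char *colName)
{
    name_list_append(names, (char *) colName);
}
```

The cast keeps a single allocation but relies on the caller knowing the string is not really read-only. The extra pstrdup is always safe but allocates twice.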

@sfc-gh-okalaci (Collaborator) left a comment:

Thanks, I think this looks good in principle, but we should perhaps defer merging this until after the tag; let's not add any optional changes during the release testing period.


```c
/* transform name, e.g. bucket[3] */
const char *transformName;
IcebergPartitionSpecField *specField;
```
Collaborator:

Hm, aren't we now introducing some edge cases when specField == NULL?

Can we use IcebergPartitionSpecField specField; (embedded by value) instead?
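Below is a small sketch of why embedding the struct by value removes the NULL edge case. Every type and function here is a hypothetical, trimmed-down stand-in, not the PR's code. Embedding by value only copies the top level of the struct: pointer members such as name are still shared with whatever they were copied from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down mirror of a partition spec field. */
typedef struct SpecFieldSketch
{
    int         field_id;
    const char *name;
    const char *transform;
} SpecFieldSketch;

/* Pointer member: every reader must consider specField == NULL. */
typedef struct TransformWithPointer
{
    const SpecFieldSketch *specField;
} TransformWithPointer;

/* Embedded member: the field always exists, so there is no NULL state. */
typedef struct TransformWithValue
{
    SpecFieldSketch specField;
} TransformWithValue;

static int
partition_field_id_ptr(const TransformWithPointer *t)
{
    /* the extra edge case the pointer introduces */
    return t->specField != NULL ? t->specField->field_id : -1;
}

static int
partition_field_id_val(const TransformWithValue *t)
{
    assert(t != NULL);
    return t->specField.field_id;   /* no NULL check needed */
}
```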

@sfc-gh-okalaci (Collaborator) commented:

@sfc-gh-abozkurt, can you please rebase?

IcebergPartitionTransform can embed an IcebergPartitionSpecField and drop
some of its individual fields, which makes it less verbose.

Signed-off-by: Aykut Bozkurt <aykut.bozkurt@snowflake.com>
@sfc-gh-okalaci (Collaborator) left a comment:

I still think that switching from DataFileSchemaField *sourceField; to DataFileSchemaField sourceField; (and the same for the other struct) is safer. We are passing these structs around, and it would be hard to track memory ownership.

So, let's make the copying safer.

Does that make sense to you as well?

But then we should do that properly.

```diff
-transform->partitionFieldId = specField->field_id;
-transform->partitionFieldName = pstrdup(specField->name);
-transform->transformName = pstrdup(specField->transform);
+transform->specField = *specField;
```
Collaborator:

With the change from IcebergPartitionSpecField *specField to IcebergPartitionSpecField specField (sorry, I suggested that), this is now a shallow copy.

IcebergPartitionSpecField has a source_ids pointer, so after this assignment both the original and the copy point to the same underlying array. Ownership becomes unclear, and we could end up freeing it in multiple places.

We should perhaps add a DeepCopyIcebergPartitionSpecField() helper and use it here (and anywhere else we do this kind of assignment).
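Below is a minimal sketch of the suggested helper. It assumes the spec field stores source_ids as a heap array with a separate length (source_ids_length is an assumed name; the real length field is not shown here) and that name and transform are heap strings. SpecField, DeepCopySpecField() and FreeSpecField() are hypothetical. The sketch uses malloc/free so it compiles standalone; inside the extension, the same shape would use palloc and pstrdup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of IcebergPartitionSpecField. */
typedef struct SpecField
{
    int     source_id;
    int    *source_ids;
    size_t  source_ids_length;  /* assumed: length of source_ids */
    int     field_id;
    char   *name;
    char   *transform;
} SpecField;

static void *
xmalloc(size_t size)
{
    void   *p = malloc(size);

    if (p == NULL)
        abort();                /* palloc would raise an error instead */
    return p;
}

static char *
dup_string(const char *s)
{
    size_t  len;

    if (s == NULL)
        return NULL;
    len = strlen(s) + 1;
    return memcpy(xmalloc(len), s, len);
}

/* Hypothetical DeepCopyIcebergPartitionSpecField(): the result owns its own
 * array and strings, so freeing either copy cannot affect the other. */
static SpecField
DeepCopySpecField(const SpecField *src)
{
    SpecField   copy = *src;    /* scalars copied; pointers replaced below */

    assert(src != NULL);
    copy.source_ids = NULL;
    if (src->source_ids_length > 0)
    {
        size_t  bytes = src->source_ids_length * sizeof(int);

        copy.source_ids = xmalloc(bytes);
        memcpy(copy.source_ids, src->source_ids, bytes);
    }
    copy.name = dup_string(src->name);
    copy.transform = dup_string(src->transform);
    return copy;
}

static void
FreeSpecField(SpecField *f)
{
    free(f->source_ids);
    free(f->name);
    free(f->transform);
    f->source_ids = NULL;
    f->name = NULL;
    f->transform = NULL;
}
```

Used at the assignment site, this would read along the lines of transform->specField = DeepCopySpecField(specField);, after which the transform owns everything it points to.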

```diff
-transform->sourceField = GetDataFileSchemaFieldById(schema, specField->source_id);
+DataFileSchemaField *sourceField = GetDataFileSchemaFieldById(schema, specField->source_id);
+
+transform->sourceField = *sourceField;
```
Collaborator:

Same shallow-copy issue here: DataFileSchemaField has several pointer members (name, type, etc.), so after this assignment both structs point to the same underlying strings and structs.

We could add a DeepCopyDataFileSchemaField() helper (there's already DeepCopyField() in field.h that we could leverage).
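Below is a sketch of a deep copy that also handles a nested pointer member. It assumes a schema field with a heap name and a heap-allocated type struct that itself owns a string. Every type and function here is hypothetical. DeepCopyFieldTypeSketch() marks the spot where the existing DeepCopyField() from field.h might be reused, but that function's real signature is not shown here and is not reproduced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical nested type; the real field type is not shown in the PR. */
typedef struct FieldTypeSketch
{
    char   *typeName;       /* e.g. "long" */
    int     precision;
} FieldTypeSketch;

/* Hypothetical mirror of DataFileSchemaField with two pointer members. */
typedef struct SchemaFieldSketch
{
    int              id;
    char            *name;
    FieldTypeSketch *type;
} SchemaFieldSketch;

static void *
xmalloc(size_t size)
{
    void   *p = malloc(size);

    if (p == NULL)
        abort();            /* palloc would raise an error instead */
    return p;
}

static char *
xstrdup(const char *s)
{
    size_t  len = strlen(s) + 1;

    return memcpy(xmalloc(len), s, len);
}

/* Placeholder for the nested deep copy (where DeepCopyField() might fit). */
static FieldTypeSketch *
DeepCopyFieldTypeSketch(const FieldTypeSketch *src)
{
    FieldTypeSketch *copy = xmalloc(sizeof *copy);

    copy->typeName = src->typeName ? xstrdup(src->typeName) : NULL;
    copy->precision = src->precision;
    return copy;
}

/* Hypothetical DeepCopyDataFileSchemaField(): copies every level, so the
 * transform's sourceField no longer aliases the schema's strings. */
static SchemaFieldSketch
DeepCopySchemaFieldSketch(const SchemaFieldSketch *src)
{
    SchemaFieldSketch copy;

    assert(src != NULL);
    copy.id = src->id;
    copy.name = src->name ? xstrdup(src->name) : NULL;
    copy.type = src->type ? DeepCopyFieldTypeSketch(src->type) : NULL;
    return copy;
}

static void
FreeSchemaFieldSketch(SchemaFieldSketch *f)
{
    if (f->type != NULL)
        free(f->type->typeName);
    free(f->type);
    free(f->name);
    f->type = NULL;
    f->name = NULL;
}
```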

@sfc-gh-abozkurt force-pushed the aykut/refactor-transform branch from 937c534 to 8f41a5b on January 26, 2026 13:02
Signed-off-by: Aykut Bozkurt <aykut.bozkurt@snowflake.com>
@sfc-gh-dachristensen (Collaborator) commented:

@sfc-gh-abozkurt @sfc-gh-okalaci, is this worth rebasing and getting into the next release?

@sfc-gh-abozkurt (Collaborator, Author) commented:

> @sfc-gh-abozkurt @sfc-gh-okalaci is this something that is worth getting rebased and into the next release?

No, but it might be good to get #173 in.
