Refactor IcebergPartitionTransform#141
Conversation
| if (minText != NULL && maxText != NULL) | ||
| { | ||
| *names = lappend(*names, colName); | ||
| *names = lappend(*names, pstrdup(colName)); |
There was a problem hiding this comment.
unrelated warning fix
There was a problem hiding this comment.
colname comes from TextDatumGetCString, which does pstrdup, so seems confusing to do another pstrdup.
Can't we cast to non-const for avoiding the warning?
374d30a to
a080db8
Compare
sfc-gh-okalaci
left a comment
There was a problem hiding this comment.
Thanks, I think this looks good in principle, but we should perhaps defer merging this after the tag, let's not add any optional changes during the release testing period.
| if (minText != NULL && maxText != NULL) | ||
| { | ||
| *names = lappend(*names, colName); | ||
| *names = lappend(*names, pstrdup(colName)); |
There was a problem hiding this comment.
colname comes from TextDatumGetCString, which does pstrdup, so seems confusing to do another pstrdup.
Can't we cast to non-const for avoiding the warning?
|
|
||
| /* transform name, e.g. bucket[3] */ | ||
| const char *transformName; | ||
| IcebergPartitionSpecField *specField; |
There was a problem hiding this comment.
hm, now we are introducing some edge cases when specField == NULL?
Can we use IcebergPartitionSpecField specField; instead?
|
@sfc-gh-abozkurt can you please rebase? |
IcebergPartitionTransform can embed a IcebergPartitionSpecField by removing some individual fields, which makes it less verbose. Signed-off-by: Aykut Bozkurt <aykut.bozkurt@snowflake.com>
Signed-off-by: Aykut Bozkurt <aykut.bozkurt@snowflake.com>
0cbf801 to
a905847
Compare
sfc-gh-okalaci
left a comment
There was a problem hiding this comment.
I still think that switching from DataFileSchemaField *sourceField; to DataFileSchemaField sourceField; (same for other struct) is safer. We are passing these structs around, and it'd be hard to track the memory ownership.
So, let's make the copying safer.
Does that make sense to you as well?
But then we should do that properly.
| transform->partitionFieldId = specField->field_id; | ||
| transform->partitionFieldName = pstrdup(specField->name); | ||
| transform->transformName = pstrdup(specField->transform); | ||
| transform->specField = *specField; |
There was a problem hiding this comment.
with the change from IcebergPartitionSpecField *specField to IcebergPartitionSpecField specField -- sorry I suggested that -- this is now a shallow copy.
IcebergPartitionSpecField has a source_ids pointer, so after this assignment both the original and the copy point to the same underlying array. Ownership becomes unclear and we could end up freeing it in multiple places.
We should perhaps add a DeepCopyIcebergPartitionSpecField() helper and use it here (and anywhere else we do this kind of assignment).
| transform->sourceField = GetDataFileSchemaFieldById(schema, specField->source_id); | ||
| DataFileSchemaField *sourceField = GetDataFileSchemaFieldById(schema, specField->source_id); | ||
|
|
||
| transform->sourceField = *sourceField; |
There was a problem hiding this comment.
Same shallow copy issue here - DataFileSchemaField has several pointer members (name, type, etc.). After this assignment, both point to the same underlying strings/structs.
We could add a DeepCopyDataFileSchemaField() helper (there's already DeepCopyField() in field.h we could leverage).
937c534 to
8f41a5b
Compare
Signed-off-by: Aykut Bozkurt <aykut.bozkurt@snowflake.com>
8f41a5b to
d7db991
Compare
|
@sfc-gh-abozkurt @sfc-gh-okalaci is this something that is worth getting rebased and into the next release? |
no but it might be good to get #173 in. |
Separate ParsedIcebergPartitionTransform (parser step) and IcebergPartitionTransform (analyzer step).