Version: [email protected] (also tested and noticed the issue in 17.0.0)
I have a use case with an Arrow file that contains two columns with the same name but different types. In this example there are two "id" columns: one is an Int64 and the other is a Utf8 string.
When I load this Arrow file using tableFromIPC, the table schema marks both columns as Utf8 strings.
Simple base64-encoded Arrow file used in this example:
/////7ABAAAQAAAAAAAKAA4ABgANAAgACgAAAAAABAAQAAAAAAEKAAwAAAAIAAQACgAAAAgAAAAIAAAAAAAAAAIAAADAAAAABAAAAFr///8UAAAAiAAAAIwAAAAAAAAFiAAAAAIAAABAAAAABAAAABT///8IAAAAFAAAAAgAAAAic3RyaW5nIgAAAAAXAAAAU3Bhcms6RGF0YVR5cGU6SnNvblR5cGUATP///wgAAAAQAAAABgAAAFNUUklORwAAFgAAAFNwYXJrOkRhdGFUeXBlOlNxbE5hbWUAAAAAAAAEAAQABAAAAAIAAABpZAAAAAASABgAFAAAABMADAAAAAgABAASAAAAFAAAAIwAAACUAAAAAAAAApgAAAACAAAARAAAAAQAAADM////CAAAABAAAAAGAAAAImxvbmciAAAXAAAAU3Bhcms6RGF0YVR5cGU6SnNvblR5cGUACAAMAAgABAAIAAAACAAAABAAAAAGAAAAQklHSU5UAAAWAAAAU3Bhcms6RGF0YVR5cGU6U3FsTmFtZQAAAAAAAAgADAAIAAcACAAAAAAAAAFAAAAAAgAAAGlkAAD/////yAAAABQAAAAAAAAADAAWAAYABQAIAAwADAAAAAADBAAYAAAAKAAAAAAAAAAAAAoAGAAMAAQACAAKAAAAbAAAABAAAAABAAAAAAAAAAAAAAAFAAAAAAAAAAAAAAABAAAAAAAAAAgAAAAAAAAACAAAAAAAAAAQAAAAAAAAAAEAAAAAAAAAGAAAAAAAAAAIAAAAAAAAACAAAAAAAAAAAgAAAAAAAAAAAAAAAgAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAIAAAAxMAAAAAAAAP////8AAAAA
Below are the details of the Table object returned by tableFromIPC. Notice that in the schema both `id` columns are marked as Utf8, while the column details show their actual, differing types.
```
=== Arrow Table Information ===
Rows: 1
Columns: 2
Schema: Schema<{ 0: id: Utf8, 1: id: Utf8 }>

=== Table Contents ===
[
  {"id": 0, "id": "10"}
]

=== Column Details ===
Column 0: undefined (Int64)
Column 1: undefined (Utf8)
```
Code to generate the above result:

```js
import { tableFromIPC } from 'apache-arrow';

// Helper to decode the base64 payload shown above into bytes (Node.js).
const base64ToUint8Array = (b64) => new Uint8Array(Buffer.from(b64, 'base64'));

// base64String holds the payload shown above.
const arrowBuffer = base64ToUint8Array(base64String);
// Parse the Arrow IPC data into a table
const table = tableFromIPC(arrowBuffer);

console.log('\n=== Arrow Table Information ===');
console.log(`Rows: ${table.numRows}`);
console.log(`Columns: ${table.numCols}`);
console.log(`Schema: ${table.schema}`);

console.log('\n=== Table Contents ===');
console.log(table.toString());

console.log('\n=== Column Details ===');
for (let i = 0; i < table.numCols; i++) {
  const column = table.getChildAt(i);
  console.log(`Column ${i}: ${column.name} (${column.type})`);
}
```

I have tested the same Arrow file with pyarrow, and it shows the expected result:
Schema:

```
id: int64 not null
  -- field metadata --
  Spark:DataType:SqlName: 'BIGINT'
  Spark:DataType:JsonType: '"long"'
id: string not null
  -- field metadata --
  Spark:DataType:SqlName: 'STRING'
  Spark:DataType:JsonType: '"string"'
```

Batch 0:

```
pyarrow.RecordBatch
id: int64 not null
id: string not null
----
id: [0]
id: ["10"]
```