Describe the bug
When running `DataFrame.Collect()`, a long value in the dataset (even with a proper schema set to `LongType`) can be collected as an `int`.
To Reproduce
This reproduces the issue:
    long value = 1;
    string fieldName = "SomeLongValue";
    var df = _spark.CreateDataFrame(
        new[]
        {
            new GenericRow(new object[] { value }),
        },
        new StructType(new[] { new StructField(fieldName, new LongType(), false) })
    );
    var collected = df.Collect().First().Get(fieldName);
    Assert.Equal(value, collected); // this fails since `value` is a long whereas `collected` is an int
On the other hand, this test would run fine:
    long value = long.MaxValue;
    string fieldName = "SomeLongValue";
    var df = _spark.CreateDataFrame(
        new[]
        {
            new GenericRow(new object[] { value }),
        },
        new StructType(new[] { new StructField(fieldName, new LongType(), false) })
    );
    var collected = df.Collect().First().Get(fieldName);
    Assert.Equal(value, collected); // this works
It might come from unpickling or from the Spark -> Spark-Python layer; I have not been able to tell which.
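To show why the first assertion fails even though both values are numerically 1, here is a minimal standalone sketch that does not use Spark at all. It assumes the comparison ends up going through object equality on boxed values. The class name `BoxedNumericComparison` and the variable `collectedAsInt` are hypothetical and only stand in for what `Collect()` returned. The `Convert.ToInt64` call is a possible caller-side normalization, not a fix for the underlying behavior.

```csharp
using System;

public static class BoxedNumericComparison
{
    public static void Main()
    {
        // Boxed long, like `value` in the tests above.
        object expected = 1L;

        // Boxed int, standing in (as an assumption) for the value
        // returned by Collect() when the long fits in 32 bits.
        object collectedAsInt = 1;

        // Int64.Equals(object) is false when the argument is not a boxed long,
        // even if the numeric values match.
        Console.WriteLine(expected.Equals(collectedAsInt)); // False

        // Hypothetical caller-side normalization: widen whatever integral
        // type came back before comparing.
        long normalized = Convert.ToInt64(collectedAsInt);
        Console.WriteLine(expected.Equals(normalized)); // True
    }
}
```

With `long.MaxValue` the value cannot be represented as an `int`, which would be consistent with the second test passing.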
Expected behavior
When the schema specifies that a column is of `LongType`, the collected value should be a `long`, regardless of its magnitude.
Desktop (please complete the following information):
- OS: Windows 11
- Version: 2.1.1