[BUG]: When collected, long values are cast to int #1155

Open
@aesteve

Description

Describe the bug
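When running `DataFrame.Collect()`, a `long` value in the dataset can be collected as an `int`, even when the schema declares the column as `LongType`. In the repro below, the value `1` comes back as an `int`, while `long.MaxValue` comes back as a `long`.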

To Reproduce

This reproduces the issue:

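```csharp
using System.Linq;                 // First()
using Microsoft.Spark.Sql;         // GenericRow
using Microsoft.Spark.Sql.Types;   // StructType, StructField, LongType
using Xunit;                       // Assert

long value = 1;
string fieldName = "SomeLongValue";
// _spark is the SparkSession used by the test
var df = _spark.CreateDataFrame(
    new[]
    {
        new GenericRow(new object[] { value }),
    },
    new StructType(new[] { new StructField(fieldName, new LongType(), false) })
);
var collected = df.Collect().First().Get(fieldName);
Assert.Equal(value, collected); // fails: `value` is a long, but `collected` is a boxed int
```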

By contrast, this test passes:

```csharp
long value = long.MaxValue;
string fieldName = "SomeLongValue";
var df = _spark.CreateDataFrame(
    new[]
    {
        new GenericRow(new object[] { value }),
    },
    new StructType(new[] { new StructField(fieldName, new LongType(), false) })
);
var collected = df.Collect().First().Get(fieldName);
Assert.Equal(value, collected); // this passes
```
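
Why the first assertion fails and the second passes can be illustrated without Spark. This is a minimal sketch, assuming that `Assert.Equal(value, collected)` with a `long` and an `object` ends up comparing the two values with `object.Equals`: `Int64.Equals(object)` returns `false` for anything that is not a boxed `long`, so a boxed `int` holding `1` never matches. `BoxedEqualityDemo` and `SameBoxedValue` are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical demo class (not part of Microsoft.Spark or xUnit).
// Models the two repro cases: Row.Get returns the value as `object`,
// so the assertion compares a boxed long against whatever box came back.
public static class BoxedEqualityDemo
{
    // True only if `actual` is a boxed long with the same value:
    // Int64.Equals(object) returns false for a boxed Int32, even if it holds 1.
    public static bool SameBoxedValue(long expected, object actual) =>
        object.Equals(expected, actual);
}
```

For example, `SameBoxedValue(1L, (object)1)` returns `false`, while `SameBoxedValue(long.MaxValue, (object)long.MaxValue)` returns `true`, matching the two repro cases.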

I'm not sure where the conversion happens; it might be during unpickling, or somewhere in the Spark → Spark-Python path.

Expected behavior

When the schema declares a column as `LongType`, the collected value should be a `long`.
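
Until the collected type matches the schema, one possible caller-side workaround is to normalize the value before comparing or using it. This is a sketch under the assumption that a `LongType` column only ever comes back boxed as `int` or `long`, as in the repro above; `CollectedValues.AsInt64` is a hypothetical helper, not part of Microsoft.Spark.

```csharp
using System;

// Hypothetical helper (not part of Microsoft.Spark): normalizes a value
// collected from a LongType column to a long, whether it was boxed as int or long.
public static class CollectedValues
{
    public static long AsInt64(object collected) => collected switch
    {
        long l => l,
        int i => i, // widening int -> long is lossless
        null => throw new ArgumentNullException(nameof(collected)),
        // Anything else is unexpected for a LongType column, so fail loudly.
        _ => throw new InvalidCastException(
            $"Expected a boxed Int32 or Int64, got {collected.GetType()}.")
    };
}
```

Usage: `long collected = CollectedValues.AsInt64(df.Collect().First().Get(fieldName));`. A pattern match is used instead of a plain `(long)` cast because unboxing a boxed `int` directly to `long` throws `InvalidCastException`.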

Desktop:

  • OS: Windows 11
  • Version: 2.1.1

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests