Skip to content

[BUG]:When returning a StringDataFrameColumn from the apply method, a java.lang.IllegalArgumentException is thrown #1231

@wzl-bxg

Description

@wzl-bxg

Describe the bug
When returning a StringDataFrameColumn from the apply method, a java.lang.IllegalArgumentException is thrown.

To Reproduce

Steps to reproduce the behavior:
1.Implement an apply method that returns a StringDataFrameColumn.
2. Execute the code that triggers the apply method.
3. Observe that a java.lang.IllegalArgumentException is thrown at runtime

Minimal reproducible example:

  • Spark version: 3.3.4
  • Microsoft.Spark version: 2.3.0
  • Microsoft.Spark.Worker: Microsoft.Spark.Worker.net8.0.win-x64-2.3.0

using Microsoft.Data.Analysis;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using FxDataFrame = Microsoft.Data.Analysis.DataFrame;

namespace MySparkApp
{
class GroupApply
{
static void Main(string[] args)
{
SparkSession spark =
SparkSession
.Builder()
.AppName("group_apply_sample")
.GetOrCreate();

        var df = spark.CreateDataFrame(new[]
        {
            new GenericRow(new object[] { 10, 2, 5.0, "this is description one" }),
            new GenericRow(new object[] { 150, 10, 1.0, "this is description two" }),
            new GenericRow(new object[] { 150, 3, 1.0 , "this is description three"}),
        },
        new StructType(new[]
        {
            new StructField("ProductId", new IntegerType()),
            new StructField("Qty", new IntegerType()),
            new StructField("Price", new DoubleType()),
            new StructField("Description", new StringType())
        }));

        var returnSchema = new StructType(new[]
        {
            new StructField("Description", new StringType())
        });

        var result = df
            .GroupBy("ProductId")
            .Apply(returnSchema, dataFrame =>
            {
                var descriptions = new List<string>();

                foreach (var row in dataFrame.Rows)
                {
                    int productId = (int)row["ProductId"];
                    string description = (string)row["Description"];

                    if (productId > 10)
                    {
                        descriptions.Add(description + productId);
                    }
                }

                var descriptionColumn = new StringDataFrameColumn("Description", descriptions);

                var resultFrame = new FxDataFrame(descriptionColumn);

                return resultFrame;
            });

        result.Show();

        spark.Stop();
    }
}

}

Expected behavior
The apply method should successfully return a StringDataFrameColumn without throwing an exception.

Screenshots
Image

Desktop (please complete the following information):

  • OS: [Windows 10]
  • Spark Version [3.3.4]
  • Microsoft.Spark Version [2.3.0]
  • Microsoft.Spark.Worker [Microsoft.Spark.Worker.net8.0.win-x64-2.3.0]

Additional context
This issue only occurs when the return type of the apply method is StringDataFrameColumn. Returning Int32DataFrameColumn and DoubleDataFrameColumn types will work as expected

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions