-
Notifications
You must be signed in to change notification settings - Fork 331
Description
Describe the bug
When returning a StringDataFrameColumn from the apply method, a java.lang.IllegalArgumentException is thrown.
To Reproduce
Steps to reproduce the behavior:
1.Implement an apply method that returns a StringDataFrameColumn.
2. Execute the code that triggers the apply method.
3. Observe that a java.lang.IllegalArgumentException is thrown at runtime
Minimal reproducible example:
- Spark version: 3.3.4
- Microsoft.Spark version: 2.3.0
- Microsoft.Spark.Worker: Microsoft.Spark.Worker.net8.0.win-x64-2.3.0
using Microsoft.Data.Analysis;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using FxDataFrame = Microsoft.Data.Analysis.DataFrame;
namespace MySparkApp
{
class GroupApply
{
static void Main(string[] args)
{
SparkSession spark =
SparkSession
.Builder()
.AppName("group_apply_sample")
.GetOrCreate();
var df = spark.CreateDataFrame(new[]
{
new GenericRow(new object[] { 10, 2, 5.0, "this is description one" }),
new GenericRow(new object[] { 150, 10, 1.0, "this is description two" }),
new GenericRow(new object[] { 150, 3, 1.0 , "this is description three"}),
},
new StructType(new[]
{
new StructField("ProductId", new IntegerType()),
new StructField("Qty", new IntegerType()),
new StructField("Price", new DoubleType()),
new StructField("Description", new StringType())
}));
var returnSchema = new StructType(new[]
{
new StructField("Description", new StringType())
});
var result = df
.GroupBy("ProductId")
.Apply(returnSchema, dataFrame =>
{
var descriptions = new List<string>();
foreach (var row in dataFrame.Rows)
{
int productId = (int)row["ProductId"];
string description = (string)row["Description"];
if (productId > 10)
{
descriptions.Add(description + productId);
}
}
var descriptionColumn = new StringDataFrameColumn("Description", descriptions);
var resultFrame = new FxDataFrame(descriptionColumn);
return resultFrame;
});
result.Show();
spark.Stop();
}
}
}
Expected behavior
The apply method should successfully return a StringDataFrameColumn without throwing an exception.
Desktop (please complete the following information):
- OS: [Windows 10]
- Spark Version [3.3.4]
- Microsoft.Spark Version [2.3.0]
- Microsoft.Spark.Worker [Microsoft.Spark.Worker.net8.0.win-x64-2.3.0]
Additional context
This issue only occurs when the return type of the apply method is StringDataFrameColumn. Returning Int32DataFrameColumn and DoubleDataFrameColumn types will work as expected
