
fill_unspecified_columns_with_nulls doesn't work for nested struct fields #580

@jromerosk

Description

The function transform_to_schema has a flag, fill_unspecified_columns_with_nulls, that populates the resulting DataSet with nulls for columns that are present neither in the data nor in the transformations. However, this doesn't work for fields inside structs.

Example:

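from pyspark.sql import SparkSession
from pyspark.sql import types as T
import typedspark as TS

spark = SparkSession.builder.getOrCreate()

# Struct with two string fields; the input data below only provides f1.
class TestStructType(TS.Schema):
    f1: TS.Column[T.StringType]
    f2: TS.Column[T.StringType]

class TestSchema(TS.Schema):
    a: TS.Column[TS.StructType[TestStructType]]

# Column `a` is a struct that lacks the f2 field.
df = spark.createDataFrame([({"f1": "a"},)], "struct<a:struct<f1:string>>")
ds = TS.transform_to_schema(
    df,
    TestSchema,
    fill_unspecified_columns_with_nulls=True,
)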

Expected behaviour:
The resulting DataSet should contain the nested field a.f2, populated with nulls.

Actual behaviour:
The call raises the following error:
TypeError: Schema TestSchema.a contains the following columns not present in data: {'f2'}
