
Bug: tables written to Databricks are unusable if any columns contain only NA_character_ #149

@wurli

This is a weird one which took a while to get to the root cause of!

Reproducible like this:

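library(sparklyr)
library(tibble)

# method = "databricks_connect" is provided by the pysparklyr package
sc <- spark_connect(method = "databricks_connect", serverless = TRUE, version = "16.1")

# A single column containing only NA_character_
df <- tibble(x = NA_character_)
tbl_name <- "x.y.z"
spark_df <- sparklyr::copy_to(sc, df, tbl_name)
spark_write_table(spark_df, tbl_name, "overwrite")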

Strangely, the data shows up in the Databricks GUI with Type 'void':

[Image: screenshot of the table in the Databricks GUI]

But when I try to view the Sample Data I get an error, and I'm also unable to read the table through the usual PySpark methods:

[Image: screenshot of the error]

I don't think this issue manifests with other data types, e.g. a column of plain old NA (logical).
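In case it helps anyone check whether a local data frame would hit this before writing it, here is a minimal sketch in base R. The helper name `all_na_chr_cols()` is made up for illustration and is not part of sparklyr or pysparklyr. It only inspects the local data, so it says nothing about what Spark does with the column afterwards.

```r
# Hypothetical helper (not part of sparklyr or pysparklyr): returns the names
# of character columns in a local data frame whose values are all NA.
all_na_chr_cols <- function(df) {
  is_affected <- vapply(
    df,
    function(col) is.character(col) && all(is.na(col)),
    logical(1)
  )
  names(df)[is_affected]
}

df <- data.frame(
  x = NA_character_,  # character, all NA: the case from the reprex
  y = "a",            # character with a real value
  z = NA,             # logical NA
  stringsAsFactors = FALSE
)

all_na_chr_cols(df)
#> [1] "x"
```

Note that a zero-row character column is also flagged, since `all()` of an empty vector is `TRUE`.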
