
saveAsTable doesn't create a table after updating to Spark 3.1.1 #349

Open
@kimtox

Description

Hello,
I have a problem after updating Spark from 2.4.3 to 3.1.1. Previously I used the following code to save Parquet files and create the corresponding table, and everything worked fine in tests:

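import org.apache.spark.sql.SaveMode

// Write Parquet files to `location` and register them as the table `tableFqn`
df.write
    .mode(SaveMode.ErrorIfExists)
    .format("parquet")
    .option("path", location)
    .saveAsTable(tableFqn)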

But after moving to Spark 3.1.1, the last line stopped creating the corresponding table (in tests, at least): spark.table(tableFqn) returns an empty DataFrame.
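One way to narrow this down is to check each step separately. Note that `spark.table` on a table that does not exist throws an `AnalysisException` rather than returning an empty DataFrame, so an empty result suggests a catalog entry exists but resolves to no data. The sketch below assumes a `SparkSession` named `spark` and the same `tableFqn` and `location` values as in the snippet above; `SaveAsTableReport` and its `check` method are hypothetical names used only for illustration.

```scala
import scala.util.Try
import org.apache.spark.sql.SparkSession

// Hypothetical result type: what was written to disk vs. what the catalog exposes
final case class SaveAsTableReport(fileRows: Long, tableRegistered: Boolean, tableRows: Long)

object SaveAsTableReport {
  def check(spark: SparkSession, tableFqn: String, location: String): SaveAsTableReport = {
    // Rows in the Parquet files under `location`; 0 if nothing readable was written
    val fileRows = Try(spark.read.parquet(location).count()).getOrElse(0L)
    // Does the session catalog know the table at all?
    val registered = spark.catalog.tableExists(tableFqn)
    // Rows visible through the catalog entry; spark.table would throw if the table were missing
    val tableRows = if (registered) spark.table(tableFqn).count() else 0L
    SaveAsTableReport(fileRows, registered, tableRows)
  }
}
```

If `fileRows` is 5 but `tableRows` is 0, the Parquet write itself succeeded and the problem is more likely in how the table is registered in the metastore.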

I also started getting new warnings, which I suspect are the root cause of the problem:

2022-02-24 18:37:29.736 [WARN] SparkContext - Using an existing SparkContext; some configuration may not take effect. <ScalaTest-run>
2022-02-24 18:37:33.148 [WARN] HiveConf - HiveConf of name hive.stats.jdbc.timeout does not exist <ScalaTest-run>
2022-02-24 18:37:33.148 [WARN] HiveConf - HiveConf of name hive.stats.retries.wait does not exist <ScalaTest-run>
2022-02-24 18:37:36.709 [WARN] ObjectStore - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0 <ScalaTest-run>
2022-02-24 18:37:36.709 [WARN] ObjectStore - setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected] <ScalaTest-run>
2022-02-24 18:37:36.725 [WARN] ObjectStore - Failed to get database default, returning NoSuchObjectException <ScalaTest-run>
2022-02-24 18:37:37.158 [WARN] ObjectStore - Failed to get database enriched_db, returning NoSuchObjectException <ScalaTest-run>
2022-02-24 18:37:37.174 [WARN] ObjectStore - Failed to get database enriched_db, returning NoSuchObjectException <ScalaTest-run>
2022-02-24 18:37:37.213 [WARN] ObjectStore - Failed to get database global_temp, returning NoSuchObjectException <ScalaTest-run>
2022-02-24 18:37:37.217 [WARN] ObjectStore - Failed to get database enriched_db, returning NoSuchObjectException <ScalaTest-run>

Full log output (continuing after the warnings above):

2022-02-24 18:37:38.694 [WARN] package - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'. <ScalaTest-run-running-KimtoxTest>
2022-02-24 18:37:43.584 [WARN] ProcfsMetricsGetter - Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped <driver-heartbeater>
2022-02-24 18:37:44.381 [WARN] SessionState - METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. <ScalaTest-run-running-KimtoxTest>
2022-02-24 18:37:44.448 [WARN] HiveConf - HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist <ScalaTest-run-running-KimtoxTest>
2022-02-24 18:37:44.448 [WARN] HiveConf - HiveConf of name hive.stats.jdbc.timeout does not exist <ScalaTest-run-running-KimtoxTest>
2022-02-24 18:37:44.448 [WARN] HiveConf - HiveConf of name hive.stats.retries.wait does not exist <ScalaTest-run-running-KimtoxTest>
-chgrp: '<MY_COMPANY_NAME>\<MY_USERNAME>' does not match expected pattern for group
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
command [genericOptions] [commandOptions]

Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...

2022-02-24 18:37:46.942 [WARN] ApacheUtils - NoSuchMethodException was thrown when disabling normalizeUri. This indicates you are using an old version (< 4.5.8) of Apache http client. It is recommended to use http client version >= 4.5.9 to avoid the breaking change introduced in apache client 4.5.7 and the latency in exception handling. See https://github.com/aws/aws-sdk-java/issues/1919 for more information <ScalaTest-run-running-FactActualsToEnrichedIntegrationTest>

.....

[]: Expected 5 values but got 0
java.lang.AssertionError: []: Expected 5 values but got 0
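One detail in the log that may be worth checking: `hadoop fs -chgrp` receives the Windows account name `<MY_COMPANY_NAME>\<MY_USERNAME>` as a group name and rejects it. The sketch below assumes Hadoop accepts only letters, digits, and `-_./@` in group names (an approximation, not code copied from Hadoop), which is enough to show why a `DOMAIN\user` name containing a backslash fails. `GroupNameCheck` is a hypothetical name, and the log alone does not establish that this failure causes the empty table.

```scala
import java.util.regex.Pattern

// ASSUMPTION: approximates the group-name validation of `hadoop fs -chgrp`;
// the character class is illustrative, not taken from the Hadoop sources.
object GroupNameCheck {
  private val allowedChars = "[-_./@a-zA-Z0-9]"
  // The whole string must be one or more allowed characters, optionally padded by whitespace
  private val groupPattern = Pattern.compile("^\\s*(" + allowedChars + "+)\\s*$")

  def isValidGroup(group: String): Boolean = groupPattern.matcher(group).matches()
}
```

With this check, `GroupNameCheck.isValidGroup("hadoop")` is true, while `GroupNameCheck.isValidGroup("MYCOMPANY\\myuser")` is false because of the backslash.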

Does anybody have any ideas about this behavior? Versions:
Scala - 2.12.15
Spark - 3.1.1
spark-testing-base_2.12 - 3.1.1_1.1.1
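For anyone trying to reproduce this, a minimal `build.sbt` sketch with the versions listed above. Adding `spark-hive` is an assumption based on the Hive metastore (`ObjectStore`, `HiveConf`) warnings in the log, and the `Test` scope for spark-testing-base is also assumed.

```scala
// build.sbt sketch: versions from this issue; scopes and spark-hive are assumptions
scalaVersion := "2.12.15"

val sparkVersion = "3.1.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  // Assumed: Hive support is enabled, given the metastore warnings in the log
  "org.apache.spark" %% "spark-hive" % sparkVersion,
  "com.holdenkarau" %% "spark-testing-base" % "3.1.1_1.1.1" % Test
)
```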
