Description
Code of Conduct
- I agree to follow this project's Code of Conduct
Search before asking
- I have searched in the issues and found no similar issues.
Describe the bug
Hi,
Using Kyuubi 1.10.1 with Spark 3.5.2, there appears to be a regression compared to Kyuubi with Spark 3.4.4. I have a view with a row filter, and when a query joins two subqueries of that view I get the error shown in the engine log. I was able to reproduce this minimally using the source tag v1.10.1, doing a default build of Kyuubi, with Ranger running in Docker.
To reproduce the error, you first have to create tables from these zipped parquet files:
Here is the SQL to create the tables:
create table if not exists Album
USING org.apache.spark.sql.parquet
OPTIONS (
path ("/tmp/chinook/alb.parquet")
);
create table if not exists Artist
USING org.apache.spark.sql.parquet
OPTIONS (
path ("/tmp/chinook/art.parquet")
);
create table if not exists Track
USING org.apache.spark.sql.parquet
OPTIONS (
path ("/tmp/chinook/trk.parquet")
);
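As a quick sanity check (assuming the zipped parquet files were extracted to /tmp/chinook, matching the paths above), each table should return a non-zero row count:
-- sanity check only, not part of the original repro; assumes the parquet files sit under /tmp/chinook
select count(*) from Album;
select count(*) from Artist;
select count(*) from Track;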
Then you create a view on top of these tables:
CREATE VIEW myview
as
SELECT
`E95676`.`ArtistId` `ArtistId`
, `E95676`.`Name` `ArtistName`
, `E95675`.`AlbumId` `AlbumId`
, `E95675`.`Title` `AlbumTitle`
, `E95685`.`TrackId` `TrackId`
, `E95685`.`Name` `TrackName`
FROM
`Album` `E95675`
LEFT OUTER JOIN
`Artist` `E95676`
ON
`E95675`.`ArtistId` = `E95676`.`ArtistId`
LEFT OUTER JOIN
`Track` `E95685`
ON
`E95685`.`AlbumId` = `E95675`.`AlbumId`
Then a row filter needs to be added in Ranger for the myview view.
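The exact policy values are not shown here; as a sketch inferred from the Filter node in the engine log below, the policy targets database default, table myview, with a row-filter condition equivalent to:
-- assumed row-filter condition, inferred from "Filter (albumid#328L = cast(117 as bigint))" in the engine log
AlbumId = 117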
The query that causes the error is this:
SELECT T0.C1, T1.F1
FROM (
select a.TrackName C1 from myview a
) T0
LEFT OUTER JOIN (
select b.TrackName F1 from myview b
) T1 ON T0.C1 = T1.F1
Strangely, changing the case of a single character in the second subquery (Myview instead of myview) makes the query work:
SELECT T0.C1, T1.F1
FROM (
select a.TrackName C1 from myview a
) T0
LEFT OUTER JOIN (
select b.TrackName F1 from Myview b
) T1 ON T0.C1 = T1.F1
Unfortunately, I do not have control over how these queries are generated, so changing the casing is not a workable fix for me.
I tested in our k8s environment against Spark 3.4.4 and the issue does not occur. I have not yet tested against a local build for Spark 3.4; I will provide those details once the build completes.
Affects Version(s)
1.10.1
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
org.apache.spark.sql.AnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] Resolved attribute(s) "TrackName" missing from "ArtistId", "ArtistName", "AlbumId", "AlbumTitle", "TrackId", "TrackName" in operator !Project [TrackName#331 AS F1#319]. Attribute(s) with the same name appear in the operation: "TrackName".
Please check if the right attribute(s) are used.; line 1 pos 88;
Project [C1#318, F1#319]
+- Join LeftOuter, (C1#318 = F1#319)
:- SubqueryAlias T0
: +- Project [TrackName#331 AS C1#318]
: +- SubqueryAlias a
: +- SubqueryAlias spark_catalog.default.myview
: +- Filter (albumid#328L = cast(117 as bigint))
: +- RowFilterMarker
: +- PermanentViewMarker
: +- View (`spark_catalog`.`default`.`myview`, [ArtistId#326L,ArtistName#327,AlbumId#328L,AlbumTitle#329,TrackId#330L,TrackName#331])
: +- Project [cast(ArtistId#320L as bigint) AS ArtistId#326L, cast(ArtistName#321 as string) AS ArtistName#327, cast(AlbumId#322L as bigint) AS AlbumId#328L, cast(AlbumTitle#323 as string) AS AlbumTitle#329, cast(TrackId#324L as bigint) AS TrackId#330L, cast(TrackName#325 as string) AS TrackName#331]
: +- Project [ArtistId#91L AS ArtistId#320L, Name#92 AS ArtistName#321, AlbumId#88L AS AlbumId#322L, Title#89 AS AlbumTitle#323, TrackId#93L AS TrackId#324L, Name#94 AS TrackName#325]
: +- Join LeftOuter, (AlbumId#95L = AlbumId#88L)
: :- Join LeftOuter, (ArtistId#90L = ArtistId#91L)
: : :- SubqueryAlias E95675
: : : +- SubqueryAlias spark_catalog.default.album
: : : +- Relation spark_catalog.default.album[AlbumId#88L,Title#89,ArtistId#90L] parquet
: : +- SubqueryAlias E95676
: : +- SubqueryAlias spark_catalog.default.artist
: : +- Relation spark_catalog.default.artist[ArtistId#91L,Name#92] parquet
: +- SubqueryAlias E95685
: +- SubqueryAlias spark_catalog.default.track
: +- Relation spark_catalog.default.track[TrackId#93L,Name#94,AlbumId#95L,MediaTypeId#96L,GenreId#97L,Composer#98,Milliseconds#99L,Bytes#100L,UnitPrice#101] parquet
+- SubqueryAlias T1
+- !Project [TrackName#331 AS F1#319]
+- SubqueryAlias b
+- SubqueryAlias spark_catalog.default.myview
+- Filter (albumid#348L = cast(117 as bigint))
+- RowFilterMarker
+- PermanentViewMarker
+- Project [cast(ArtistId#326L as bigint) AS ArtistId#346L, cast(ArtistName#327 as string) AS ArtistName#347, cast(AlbumId#328L as bigint) AS AlbumId#348L, cast(AlbumTitle#329 as string) AS AlbumTitle#349, cast(TrackId#330L as bigint) AS TrackId#350L, cast(TrackName#331 as string) AS TrackName#351]
+- View (`spark_catalog`.`default`.`myview`, [ArtistId#326L,ArtistName#327,AlbumId#328L,AlbumTitle#329,TrackId#330L,TrackName#331])
+- Project [cast(ArtistId#320L as bigint) AS ArtistId#326L, cast(ArtistName#321 as string) AS ArtistName#327, cast(AlbumId#322L as bigint) AS AlbumId#328L, cast(AlbumTitle#323 as string) AS AlbumTitle#329, cast(TrackId#324L as bigint) AS TrackId#330L, cast(TrackName#325 as string) AS TrackName#331]
+- Project [ArtistId#335L AS ArtistId#320L, Name#336 AS ArtistName#321, AlbumId#332L AS AlbumId#322L, Title#333 AS AlbumTitle#323, TrackId#337L AS TrackId#324L, Name#338 AS TrackName#325]
+- Join LeftOuter, (AlbumId#339L = AlbumId#332L)
:- Join LeftOuter, (ArtistId#334L = ArtistId#335L)
: :- SubqueryAlias E95675
: : +- SubqueryAlias spark_catalog.default.album
: : +- Relation spark_catalog.default.album[AlbumId#332L,Title#333,ArtistId#334L] parquet
: +- SubqueryAlias E95676
: +- SubqueryAlias spark_catalog.default.artist
: +- Relation spark_catalog.default.artist[ArtistId#335L,Name#336] parquet
+- SubqueryAlias E95685
+- SubqueryAlias spark_catalog.default.track
+- Relation spark_catalog.default.track[TrackId#337L,Name#338,AlbumId#339L,MediaTypeId#340L,GenreId#341L,Composer#342,Milliseconds#343L,Bytes#344L,UnitPrice#345] parquet
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:711)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:215)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:215)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:197)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:202)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:193)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:171)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:202)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:225)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:222)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:90)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:174)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:158)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:85)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:113)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Kyuubi Server Configurations
## Kyuubi Configurations
#
# kyuubi.authentication NONE
#
kyuubi.frontend.bind.host 0.0.0.0
# kyuubi.frontend.protocols THRIFT_BINARY,REST
# kyuubi.frontend.thrift.binary.bind.port 10009
# kyuubi.frontend.rest.bind.port 10099
#
# kyuubi.engine.type SPARK_SQL
# kyuubi.engine.share.level USER
# kyuubi.session.engine.initialize.timeout PT3M
#
kyuubi.ha.addresses localhost:2181
# kyuubi.ha.namespace kyuubi
#
# Details in https://kyuubi.readthedocs.io/en/master/configuration/settings.html
spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
spark.executor.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/apiguardian-api-1.1.2.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/gethostname4j-1.0.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-annotations-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-core-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-databind-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-base-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jersey-bundle-1.19.4.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-platform-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar
# spark.executor.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar
spark.driver.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/apiguardian-api-1.1.2.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/gethostname4j-1.0.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-annotations-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-core-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-databind-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-base-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jersey-bundle-1.19.4.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-platform-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar
# spark.driver.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar
Kyuubi Engine Configurations
## ranger-spark-security.xml
<configuration>
<property>
<name>ranger.plugin.spark.policy.rest.url</name>
<value>http://localhost:6080</value>
</property>
<property>
<name>ranger.plugin.spark.service.name</name>
<value>spark</value>
</property>
<property>
<name>ranger.plugin.spark.policy.cache.dir</name>
<value>/tmp/policycache</value>
</property>
<property>
<name>ranger.plugin.spark.policy.pollIntervalMs</name>
<value>1000</value>
</property>
<property>
<name>ranger.plugin.spark.policy.source.impl</name>
<value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
</property>
<property>
<name>ranger.plugin.spark.enable.implicit.userstore.enricher</name>
<value>true</value>
<description>Enable UserStoreEnricher for fetching user and group attributes if using macros or scripts in row-filters since Ranger 2.3</description>
</property>
<property>
<name>ranger.plugin.hive.policy.cache.dir</name>
<value>/tmp/policycache</value>
<description>As Authz plugin reuses hive service def, a policy cache path is required for caching UserStore and Tags for "hive" service def, while "ranger.plugin.spark.policy.cache.dir config" is the path for caching policies in service. </description>
</property>
</configuration>
Additional context
No response
Are you willing to submit PR?
- Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- No. I cannot submit a PR at this time.