[Bug] AuthZ RowFilter causes org.apache.spark.sql.AnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] in spark 3.5 #6889

Open
@lanklaas

Description

Code of Conduct

  • I agree to follow this project's Code of Conduct.

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

Hi,

Using Kyuubi 1.10.1 with Spark 3.5.2, there appears to be a regression compared to Kyuubi with Spark 3.4.4. I have a view with a Ranger row filter, and when a query references the view in two subqueries of itself and joins them, I get the error shown in the engine log below.

I was able to reproduce this minimally using the source tag v1.10.1, a default build of Kyuubi, and Ranger running in Docker.

To reproduce the error, first create tables from these zipped parquet files:

test-data.zip
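
If the zip is unavailable, equivalent stand-in data can be synthesized from the schemas visible in the Relation nodes of the analyzed plan below (a sketch: the values are arbitrary since the failure happens at analysis time, and UnitPrice's exact type is not visible in the plan, so DOUBLE is assumed):

-- Hypothetical stand-in rows matching the column names and types shown in
-- the plan below; any values work because the query fails during analysis.
INSERT OVERWRITE DIRECTORY '/tmp/chinook/alb.parquet' USING parquet
SELECT * FROM VALUES (117L, 'Some Album', 1L) AS t(AlbumId, Title, ArtistId);

INSERT OVERWRITE DIRECTORY '/tmp/chinook/art.parquet' USING parquet
SELECT * FROM VALUES (1L, 'Some Artist') AS t(ArtistId, Name);

INSERT OVERWRITE DIRECTORY '/tmp/chinook/trk.parquet' USING parquet
SELECT * FROM VALUES
  (1L, 'Some Track', 117L, 1L, 1L, 'Someone', 240000L, 1024L, CAST(0.99 AS DOUBLE))
  AS t(TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, UnitPrice);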

Here is the SQL to create the tables:

create table if not exists Album
  USING org.apache.spark.sql.parquet
  OPTIONS (
  path ("/tmp/chinook/alb.parquet")
  );

create table if not exists Artist
  USING org.apache.spark.sql.parquet
  OPTIONS (
  path ("/tmp/chinook/art.parquet")
  );

create table if not exists Track
  USING org.apache.spark.sql.parquet
  OPTIONS (
  path ("/tmp/chinook/trk.parquet")
  );

Then create a view on top of these tables:

CREATE VIEW myview
as
SELECT
    `E95676`.`ArtistId` `ArtistId`
,   `E95676`.`Name`     `ArtistName`
,   `E95675`.`AlbumId`  `AlbumId`
,   `E95675`.`Title`    `AlbumTitle`
,   `E95685`.`TrackId`  `TrackId`
,   `E95685`.`Name`     `TrackName`
FROM
    `Album` `E95675`
LEFT OUTER JOIN
    `Artist`    `E95676`
ON
    `E95675`.`ArtistId` =   `E95676`.`ArtistId`
LEFT OUTER JOIN
    `Track` `E95685`
ON
    `E95685`.`AlbumId`  =   `E95675`.`AlbumId`
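
Before attaching the row filter, it is worth sanity-checking that the tables and the view resolve (results depend on your data):

-- Quick sanity checks; no policy involved yet.
SELECT COUNT(*) FROM Album;
SELECT COUNT(*) FROM Artist;
SELECT COUNT(*) FROM Track;
SELECT * FROM myview LIMIT 5;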

Then add a row filter to Ranger like so:

[Screenshot: Ranger row-filter policy on default.myview; per the Filter node in the analyzed plan below, the filter expression is AlbumId = 117]
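
Conceptually, the plugin makes every reference to the view behave as if the filter were inlined (a sketch of the effect, not the plugin's literal rewrite):

-- Effective shape of the filtered view, inferred from the
-- Filter (albumid = 117) node in the analyzed plan below.
SELECT * FROM myview WHERE AlbumId = 117;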

The query that causes the error is this:

SELECT T0.C1, T1.F1
FROM (
  select a.TrackName C1 from myview a
) T0
LEFT OUTER JOIN (
  select b.TrackName F1 from myview b
) T1 ON T0.C1 = T1.F1
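
Since the stack trace below fails inside assertAnalyzed, the error occurs at analysis time and no data needs to be scanned; wrapping the statement in EXPLAIN EXTENDED is a cheap way to iterate on it, as the same analysis error is reported without executing anything:

-- The failure happens during analysis, so EXPLAIN alone surfaces it:
EXPLAIN EXTENDED
SELECT T0.C1, T1.F1
FROM (select a.TrackName C1 from myview a) T0
LEFT OUTER JOIN (select b.TrackName F1 from myview b) T1 ON T0.C1 = T1.F1;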

The strange thing is that changing the case of a single character in the second subquery makes the query work:

SELECT T0.C1, T1.F1
FROM (
  select a.TrackName C1 from myview a
) T0
LEFT OUTER JOIN (
  select b.TrackName F1 from Myview b
) T1 ON T0.C1 = T1.F1
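
Not verified here, but a related variant that may be worth testing while triaging is qualifying the second reference instead of changing its case, since any change to the second subquery's view reference text might likewise avoid the failure:

-- Untested variant: qualify the view name rather than changing its case.
SELECT T0.C1, T1.F1
FROM (select a.TrackName C1 from myview a) T0
LEFT OUTER JOIN (select b.TrackName F1 from default.myview b) T1 ON T0.C1 = T1.F1;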

Unfortunately, I do not control the query text, so this workaround is not available to me.

I tested in our k8s environment against Spark 3.4.4 and the issue does not occur there. I have not yet tested against a local build for Spark 3.4; I will provide those details once the build completes.

Affects Version(s)

1.10.1

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

org.apache.spark.sql.AnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] Resolved attribute(s) "TrackName" missing from "ArtistId", "ArtistName", "AlbumId", "AlbumTitle", "TrackId", "TrackName" in operator !Project [TrackName#331 AS F1#319]. Attribute(s) with the same name appear in the operation: "TrackName".
Please check if the right attribute(s) are used.; line 1 pos 88;
Project [C1#318, F1#319]
+- Join LeftOuter, (C1#318 = F1#319)
   :- SubqueryAlias T0
   :  +- Project [TrackName#331 AS C1#318]
   :     +- SubqueryAlias a
   :        +- SubqueryAlias spark_catalog.default.myview
   :           +- Filter (albumid#328L = cast(117 as bigint))
   :              +- RowFilterMarker
   :                 +- PermanentViewMarker
   :                       +- View (`spark_catalog`.`default`.`myview`, [ArtistId#326L,ArtistName#327,AlbumId#328L,AlbumTitle#329,TrackId#330L,TrackName#331])
   :                          +- Project [cast(ArtistId#320L as bigint) AS ArtistId#326L, cast(ArtistName#321 as string) AS ArtistName#327, cast(AlbumId#322L as bigint) AS AlbumId#328L, cast(AlbumTitle#323 as string) AS AlbumTitle#329, cast(TrackId#324L as bigint) AS TrackId#330L, cast(TrackName#325 as string) AS TrackName#331]
   :                             +- Project [ArtistId#91L AS ArtistId#320L, Name#92 AS ArtistName#321, AlbumId#88L AS AlbumId#322L, Title#89 AS AlbumTitle#323, TrackId#93L AS TrackId#324L, Name#94 AS TrackName#325]
   :                                +- Join LeftOuter, (AlbumId#95L = AlbumId#88L)
   :                                   :- Join LeftOuter, (ArtistId#90L = ArtistId#91L)
   :                                   :  :- SubqueryAlias E95675
   :                                   :  :  +- SubqueryAlias spark_catalog.default.album
   :                                   :  :     +- Relation spark_catalog.default.album[AlbumId#88L,Title#89,ArtistId#90L] parquet
   :                                   :  +- SubqueryAlias E95676
   :                                   :     +- SubqueryAlias spark_catalog.default.artist
   :                                   :        +- Relation spark_catalog.default.artist[ArtistId#91L,Name#92] parquet
   :                                   +- SubqueryAlias E95685
   :                                      +- SubqueryAlias spark_catalog.default.track
   :                                         +- Relation spark_catalog.default.track[TrackId#93L,Name#94,AlbumId#95L,MediaTypeId#96L,GenreId#97L,Composer#98,Milliseconds#99L,Bytes#100L,UnitPrice#101] parquet
   +- SubqueryAlias T1
      +- !Project [TrackName#331 AS F1#319]
         +- SubqueryAlias b
            +- SubqueryAlias spark_catalog.default.myview
               +- Filter (albumid#348L = cast(117 as bigint))
                  +- RowFilterMarker
                     +- PermanentViewMarker
                           +- Project [cast(ArtistId#326L as bigint) AS ArtistId#346L, cast(ArtistName#327 as string) AS ArtistName#347, cast(AlbumId#328L as bigint) AS AlbumId#348L, cast(AlbumTitle#329 as string) AS AlbumTitle#349, cast(TrackId#330L as bigint) AS TrackId#350L, cast(TrackName#331 as string) AS TrackName#351]
                              +- View (`spark_catalog`.`default`.`myview`, [ArtistId#326L,ArtistName#327,AlbumId#328L,AlbumTitle#329,TrackId#330L,TrackName#331])
                                 +- Project [cast(ArtistId#320L as bigint) AS ArtistId#326L, cast(ArtistName#321 as string) AS ArtistName#327, cast(AlbumId#322L as bigint) AS AlbumId#328L, cast(AlbumTitle#323 as string) AS AlbumTitle#329, cast(TrackId#324L as bigint) AS TrackId#330L, cast(TrackName#325 as string) AS TrackName#331]
                                    +- Project [ArtistId#335L AS ArtistId#320L, Name#336 AS ArtistName#321, AlbumId#332L AS AlbumId#322L, Title#333 AS AlbumTitle#323, TrackId#337L AS TrackId#324L, Name#338 AS TrackName#325]
                                       +- Join LeftOuter, (AlbumId#339L = AlbumId#332L)
                                          :- Join LeftOuter, (ArtistId#334L = ArtistId#335L)
                                          :  :- SubqueryAlias E95675
                                          :  :  +- SubqueryAlias spark_catalog.default.album
                                          :  :     +- Relation spark_catalog.default.album[AlbumId#332L,Title#333,ArtistId#334L] parquet
                                          :  +- SubqueryAlias E95676
                                          :     +- SubqueryAlias spark_catalog.default.artist
                                          :        +- Relation spark_catalog.default.artist[ArtistId#335L,Name#336] parquet
                                          +- SubqueryAlias E95685
                                             +- SubqueryAlias spark_catalog.default.track
                                                +- Relation spark_catalog.default.track[TrackId#337L,Name#338,AlbumId#339L,MediaTypeId#340L,GenreId#341L,Composer#342,Milliseconds#343L,Bytes#344L,UnitPrice#345] parquet

        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:711)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:215)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:215)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:197)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:202)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:193)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:171)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:202)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:225)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:222)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
        at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:90)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:174)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
        at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:158)
        at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:85)
        at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:113)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

Kyuubi Server Configurations

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

## Kyuubi Configurations

#
# kyuubi.authentication                    NONE
#
kyuubi.frontend.bind.host                0.0.0.0
# kyuubi.frontend.protocols                THRIFT_BINARY,REST
# kyuubi.frontend.thrift.binary.bind.port  10009
# kyuubi.frontend.rest.bind.port           10099
#
# kyuubi.engine.type                       SPARK_SQL
# kyuubi.engine.share.level                USER
# kyuubi.session.engine.initialize.timeout PT3M
#
kyuubi.ha.addresses                      localhost:2181
# kyuubi.ha.namespace                      kyuubi
#

# Details in https://kyuubi.readthedocs.io/en/master/configuration/settings.html
spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
spark.executor.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/apiguardian-api-1.1.2.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/gethostname4j-1.0.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-annotations-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-core-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-databind-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-base-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jersey-bundle-1.19.4.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-platform-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar
# spark.executor.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar
spark.driver.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/apiguardian-api-1.1.2.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/gethostname4j-1.0.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-annotations-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-core-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-databind-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-base-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jersey-bundle-1.19.4.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-platform-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar
# spark.driver.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar

Kyuubi Engine Configurations

## ranger-spark-security.xml


<configuration>
  <property>
    <name>ranger.plugin.spark.policy.rest.url</name>
    <value>http://localhost:6080</value>
  </property>
  <property>
    <name>ranger.plugin.spark.service.name</name>
    <value>spark</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.cache.dir</name>
    <value>/tmp/policycache</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.pollIntervalMs</name>
    <value>1000</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.source.impl</name>
    <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
  </property>
  <property>
    <name>ranger.plugin.spark.enable.implicit.userstore.enricher</name>
    <value>true</value>
    <description>Enable UserStoreEnricher for fetching user and group attributes if using macros or scripts in row-filters since Ranger 2.3</description>
  </property>
  <property>
    <name>ranger.plugin.hive.policy.cache.dir</name>
    <value>/tmp/policycache</value>
    <description>As Authz plugin reuses hive service def, a policy cache path is required for caching UserStore and Tags for &quot;hive&quot; service def, while &quot;ranger.plugin.spark.policy.cache.dir config&quot; is the path for caching policies in service. </description>
  </property>
</configuration>

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
