
[Bug] Kyuubi SparkSQLEngine hang in stopping EngineServiceDiscovery #7309

@ruanwenjun


Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

We use Kyuubi to submit Spark SQL on k8s.
After the SQL finishes, the SparkSQLEngine pod is not deleted.
I found the following in the engine log:

26/01/07 06:40:14 INFO SparkSQLSessionManager: Session stopped due to shared level is Connection.
26/01/07 06:40:14 INFO SparkSQLEngine: Service: [SparkTBinaryFrontend] is stopping.
26/01/07 06:40:14 INFO SparkTBinaryFrontendService: Service: [EngineServiceDiscovery] is stopping.

It seems that EngineServiceDiscovery never finishes stopping.

Below is the thread dump. It shows the thread waiting in a ZooKeeper delete call made inside Curator's RetryLoop, so Curator seems to be retrying:

"SparkSQLSessionManager-timeout-checker: Thread-209" #209 daemon prio=5 os_prio=0 cpu=87.24ms elapsed=1089556.23s tid=0x00007f8a4a824000 nid=0x154 in Object.wait() [0x00007f8661926000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.kyuubi.shaded.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1407)
        - locked <0x00000007637f5248> (a org.apache.kyuubi.shaded.zookeeper.ClientCnxn$Packet)
        at org.apache.kyuubi.shaded.zookeeper.ZooKeeper.delete(ZooKeeper.java:880)
        at org.apache.kyuubi.shaded.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:250)
        at org.apache.kyuubi.shaded.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:244)
        at org.apache.kyuubi.shaded.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
        at org.apache.kyuubi.shaded.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:241)
        at org.apache.kyuubi.shaded.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:225)
        at org.apache.kyuubi.shaded.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35)
        at org.apache.kyuubi.shaded.curator.framework.recipes.nodes.PersistentNode.deleteNode(PersistentNode.java:347)
        at org.apache.kyuubi.shaded.curator.framework.recipes.nodes.PersistentNode.close(PersistentNode.java:291)
        at org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient.deregisterService(ZookeeperDiscoveryClient.scala:247)
        at org.apache.kyuubi.ha.client.EngineServiceDiscovery.stop(EngineServiceDiscovery.scala:35)
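
To illustrate the kind of hang shown in the dump, here is a minimal sketch of how a blocking cleanup step could be bounded by a timeout so that shutdown can continue. This is not Kyuubi's code and not a proposed patch: the class `BoundedStop` and the method `runWithTimeout` are hypothetical names, and the `Runnable` merely stands in for a call such as `deregisterService`. It uses only `java.util.concurrent` from the JDK. Whether interrupting the worker would actually unblock the ZooKeeper client in this case is an assumption that would need to be verified.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, for illustration only.
public class BoundedStop {

    /**
     * Runs a possibly blocking cleanup step on a daemon thread and waits
     * at most timeoutMs for it. Returns true if the step completed in time,
     * false if it timed out, failed, or the caller was interrupted.
     */
    public static boolean runWithTimeout(Runnable cleanup, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-stop");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(cleanup);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; it may or may not react
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the cleanup step itself threw
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a cleanup call that never returns on its own.
        Runnable hangs = () -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("quick step finished: " + runWithTimeout(() -> { }, 1_000));
        System.out.println("hanging step finished: " + runWithTimeout(hangs, 200));
    }
}
```

The caller gets `false` when the step does not finish in time and can then log the failure and proceed with the rest of the shutdown instead of waiting indefinitely.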

thread.txt

Affects Version(s)

1.9.1

Kyuubi Server Log Output

Kyuubi Engine Log Output

26/01/07 06:40:14 INFO SparkSQLSessionManager: Session stopped due to shared level is Connection.
26/01/07 06:40:14 INFO SparkSQLEngine: Service: [SparkTBinaryFrontend] is stopping.
26/01/07 06:40:14 INFO SparkTBinaryFrontendService: Service: [EngineServiceDiscovery] is stopping.

Kyuubi Server Configurations

Kyuubi Engine Configurations

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
