[Bug] Intermittent OperationHandle 404 when fetching results under concurrent load #7363

@BohanZhang0222

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

While running a concurrency stress test on Kyuubi, I encountered intermittent 404 errors when fetching operation results.

The error appears to occur when the result row set is requested for an operation that has already transitioned to CLOSED_STATE, even though the client never explicitly closed the operation.

Test Setup
Goal

Determine the maximum number of concurrent operations that 4 Kyuubi nodes can sustain under the same environment configuration.

Load Testing Method

Using Locust to simulate client behavior. Each simulated client repeats the following workflow continuously:

  • Create operation
  • Poll operation status
  • Fetch operation result
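The workflow above can be sketched as a single client cycle. This is a minimal sketch, not the actual Locust task used in the test: `OperationClient`, `InvalidHandle`, and `run_cycle` are hypothetical names, and the client methods stand in for whatever calls create an operation, read its state, and fetch its rows. Recording the states seen before the fetch may help correlate a failed fetch with the server log.

```python
import time
from typing import Any, Callable, Dict, List, Protocol


class InvalidHandle(Exception):
    """Hypothetical error: the server no longer recognizes the handle."""


class OperationClient(Protocol):
    """Hypothetical client interface; this is not a Kyuubi API."""

    def create_operation(self, statement: str) -> str: ...
    def get_state(self, handle: str) -> str: ...
    def fetch_result(self, handle: str) -> Any: ...


# State names as they appear in the server log; the operation is still
# in progress while it reports one of these.
IN_PROGRESS = {"PENDING_STATE", "RUNNING_STATE"}


def run_cycle(
    client: OperationClient,
    statement: str,
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Create an operation, poll until it leaves the in-progress states,
    then fetch the result. Returns a record of what the client observed."""
    handle = client.create_operation(statement)
    states: List[str] = []
    while True:
        state = client.get_state(handle)
        if not states or states[-1] != state:
            states.append(state)  # keep only state changes
        if state not in IN_PROGRESS:
            break
        sleep(poll_interval)

    record: Dict[str, Any] = {
        "handle": handle,
        "states": states,
        "fetched": False,
        "error": None,
    }
    try:
        client.fetch_result(handle)
        record["fetched"] = True
    except InvalidHandle as exc:
        # The failure reported in this issue would surface here.
        record["error"] = str(exc)
    return record
```

Calling `run_cycle` in a loop from each simulated user reproduces the "repeat continuously" step; the returned record shows which state the client last saw before a fetch failed.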

Test Parameters

Concurrency: 60

Duration: 10 minutes continuous execution

Observed Behavior

Intermittent failures when fetching results:

Error getting result row set for operation handle <operation_id>
Example error:

org.apache.kyuubi.KyuubiSQLException: Invalid OperationHandle [a0f5cbc3-08a4-4a54-8e38-e944174fa940]
This corresponds to a 404-like scenario where the operation handle is no longer valid.

Affects Version(s)

1.9.3

Kyuubi Server Log Output

From Kyuubi server logs, the lifecycle of a failed operation is as follows:
2026-03-20 17:36:33.524 INFO  OperationLog - Creating operation log file ...

2026-03-20 17:42:53.342 INFO  ExecuteStatement - PENDING_STATE -> RUNNING_STATE

2026-03-20 17:42:54.572 INFO  ExecuteStatement - RUNNING_STATE -> FINISHED_STATE

2026-03-20 17:44:42.814 INFO  ExecuteStatement - Processing zhou's query[a0f5cbc3-08a4-4a54-8e38-e944174fa940]: FINISHED_STATE -> CLOSED_STATE, time taken: 109.472 seconds

2026-03-20 17:44:42.814 ERROR ApiUtils - Error getting result row set for operation handle a0f5cbc3-08a4-4a54-8e38-e944174fa940
org.apache.kyuubi.KyuubiSQLException: Invalid OperationHandle [a0f5cbc3-08a4-4a54-8e38-e944174fa940]
        at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:69)
        at org.apache.kyuubi.operation.OperationManager.getOperation(OperationManager.scala:106)
        at org.apache.kyuubi.server.BackendServiceMetric.$anonfun$fetchResults$1(BackendServiceMetric.scala:211)
        at org.apache.kyuubi.metrics.MetricsSystem$.timerTracing(MetricsSystem.scala:112)
        at org.apache.kyuubi.server.BackendServiceMetric.fetchResults(BackendServiceMetric.scala:186)
        at org.apache.kyuubi.server.BackendServiceMetric.fetchResults$(BackendServiceMetric.scala:181)
        at org.apache.kyuubi.server.KyuubiServer$$anon$1.fetchResults(KyuubiServer.scala:179)
        at org.apache.kyuubi.server.api.v1.OperationsResource.getNextRowSet(OperationsResource.scala:186)
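
To make the timing in the log lines above easier to read, the gaps between transitions can be computed directly from the timestamps. This is a small sketch using only the four timestamps quoted above; the helper names `parse` and `gaps` are illustrative. It shows that the reported "time taken: 109.472 seconds" matches the interval from entering RUNNING_STATE to CLOSED_STATE, and that the operation was closed roughly 108 seconds after it finished. It does not by itself identify what closed the operation.

```python
from datetime import datetime

# Timestamps copied from the server log lines above.
EVENTS = [
    ("operation log created", "2026-03-20 17:36:33.524"),
    ("PENDING_STATE -> RUNNING_STATE", "2026-03-20 17:42:53.342"),
    ("RUNNING_STATE -> FINISHED_STATE", "2026-03-20 17:42:54.572"),
    ("FINISHED_STATE -> CLOSED_STATE", "2026-03-20 17:44:42.814"),
]


def parse(ts: str) -> datetime:
    """Parse a log timestamp with millisecond precision."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def gaps(events):
    """Return (from_label, to_label, seconds) for each consecutive pair."""
    times = [(label, parse(ts)) for label, ts in events]
    return [
        (a[0], b[0], round((b[1] - a[1]).total_seconds(), 3))
        for a, b in zip(times, times[1:])
    ]


if __name__ == "__main__":
    for start, end, seconds in gaps(EVENTS):
        print(f"{start}  ->  {end}: {seconds} s")
    running_to_closed = (parse(EVENTS[3][1]) - parse(EVENTS[1][1])).total_seconds()
    print(f"RUNNING to CLOSED: {round(running_to_closed, 3)} s")
```

The consecutive gaps come out to about 379.8 s (log file created to RUNNING), 1.2 s (RUNNING to FINISHED), and 108.2 s (FINISHED to CLOSED).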

Additionally, the server log usually shows a ZooKeeper session being closed shortly before the error occurs. I am not sure whether this is related to the issue.

The Kyuubi server logs also contain a large number of entries showing new connections to ZooKeeper being established frequently.

Kyuubi Engine Log Output

Kyuubi Server Configurations

Kyuubi Engine Configurations

Additional context

Not all operations are affected.


Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
