Skip to content

[BUG] spring-cloud-azure-starter-servicebus-jms - connection to Azure Service Bus lost for 10 minutes every 24h due to Expired Token (problem with token refresh?) #39099

Open
@jarekhr

Description

@jarekhr

Describe the bug
Every exactly 24h our JMS client disconnects from Azure Service Bus with ExpiredToken error message. Error messages are then reported in logs for 10 minutes. After 10 minutes app reconnects (apparently somehow library manages to get refreshed access token) and the connectivity works fine for the next 24h. Problem cycle takes 24h and 10minutes, each time connection is active for 24h, then disconnection for 10min, then connection restored, and so on.

Exception or Stack Trace
WARN messages reported by qpid

org.apache.qpid.jms.provider.amqp.builders.AmqpResourceBuilder WARN Open of resource:(JmsConsumerInfo: { ID:2a15d2b5-1aa4-4d4e-957e-e557e58c4d5a:1:1:79, destination = dps.prd.pub.v1.some_name }) failed: ExpiredToken: The token is expired. TrackingId:fe7d14cf-9e04-4b19-aae1-716910d66950_G0S2, SystemTracker:NoSystemTracker, Timestamp:2024-03-06T12:53:40 TrackingId:d3514c71af494f87b2e567be34538bd3_G0S2, SystemTracker:gateway5, Timestamp:2024-03-06T12:53:40 [condition = com.microsoft:auth-failed]

After 10 minutes we observed also the stack trace, logged under WARN from thread QpidJMS Connection Executor: ID:2a15d2b5-1aa4-4d4e-957e-e557e58c4d5a:1:

org.apache.qpid.jms.provider.exceptions.ProviderConnectionRemotelyClosedException: The connection was closed by container 'd3514c71af494f87b2e567be34538bd3_G0S2' because it did not have any active links in the past 300000 milliseconds. TrackingId:d3514c71af494f87b2e567be34538bd3_G0S2, SystemTracker:gateway5, Timestamp:2024-03-06T12:53:43 [condition = amqp:connection:forced]
at org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToConnectionClosedException(AmqpSupport.java:128)
at org.apache.qpid.jms.provider.amqp.AmqpConnection.processRemoteClose(AmqpConnection.java:148)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.processUpdates(AmqpProvider.java:968)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.onData(AmqpProvider.java:878)
at org.apache.qpid.jms.transports.netty.NettyTcpTransport$NettyTcpTransportHandler.channelRead0(NettyTcpTransport.java:548)
at org.apache.qpid.jms.transports.netty.NettyTcpTransport$NettyTcpTransportHandler.channelRead0(NettyTcpTransport.java:541)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:509)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)

To Reproduce
Steps to reproduce the behavior:

  1. listen on Azure Service Bus topic (with named subscription) using spring-cloud-azure-starter-servicebus-jms for over 24h. App is configured exactly as explained in documentation
  2. observe connection is lost exactly after 24h of listening. Seems that access token is expired
  3. observe same WARN errors logged by spring-cloud-azure-starter-servicebus-jms library, complaining about expired token
  4. observe connection is restored after 10 mins of printing errors (prev. point)
  5. Same will happen after next 24h (token gets expired again)

Code Snippet
Nothing, connection related - listening on topic as per documentation.

Expected behavior
I expect connection to Azure Service Bus to be operational at all times (no outages for 10 minutes every 24h). Access tokens used underneath to implement the passwordless connection should be refreshed automatically and ahead of token expiration - to ensure there's no disruption on the client app side.

Screenshots
If applicable, add screenshots to help explain your problem.

Setup (please complete the following information):

  • OS: Azure Spring Apps / Linux
  • IDE: n/a
  • Library/Libraries: com.azure.spring:spring-cloud-azure-starter-servicebus-jms, we use version 5.8.0
  • Java version: Java 17 from Azure Spring Apps
  • App Server/Environment: Apache Tomcat from Spring Boot 3.2
  • Frameworks: Spring Boot 3.2.2

Additional context
We're using spring-cloud-azure-starter-servicebus-jms to listen on Azure Service Bus topic, using JMS APIs. Our app is a Spring Boot server app. We use "passwordless" connections (AAD access tokens used underneath), app is configured exactly as explained in documentation. Aps is running on Azure Spring Apps - we don't have control over the system clock (so clocks misalignment can be ruled out).
This seems to be a problem with AAD token refresh - it does not happen on time or is not applied correctly?

Metadata

Metadata

Assignees

Labels

ClientThis issue points to a problem in the data-plane of the library.azure-springAll azure-spring related issuesazure-spring-jmsIssues related to Spring JMS.bugThis issue requires a change to an existing behavior in the product in order to be resolved.customer-reportedIssues that are reported by GitHub users external to the Azure organization.needs-team-attentionWorkflow: This issue needs attention from Azure service team or SDK team

Type

No type

Projects

Status

Todo

Relationships

None yet

Development

No branches or pull requests

Issue actions