JetStream API timeouts on MQTT streams #6191

Open
@slice-arpitkhatri

Observed behavior

We are using MQTT in a single-node NATS deployment. JetStream API requests fail in sudden bursts, causing connection issues, subscription failures, and message publishing failures. This happens multiple times per day. A clean restart resolves the issue. CPU and memory metrics show no anomalies during these incidents.

System Details

Instance Details:

CPU: 32 cores
Memory: 128GB
Disk Storage: 50GB

Utilization:

CPU: 2 cores
Memory: 1GB
Disk: 150MB

Number of MQTT connections: 3,000

Number of MQTT subscriptions: 6,000 (QoS 1)

Messages produced: ~30 RPS across all topics

A single NATS queue group subscription is used to consume MQTT-published messages on one topic.

Associated Logs:

  • mid: 102204 - "cae2bc80-7142-11ec-b9b8-33dad110a235" - Unable to persist session "cae2bc80-7142-11ec-b9b8-33dad110a235" (seq=70876): Timeout after 4.000022403s. Request type "SP" on "$MQTT.sess.RT45Zasv" (reply="$MQTT.JSA.S1Nunr6R.SP.RT45Zasv.1iHZZPsxA2EXBvLS043jtn").

  • mid: 116735 - "KkuRAJeYH02G8HqecxCiAW" - Unable to add JetStream consumer for subscription on "abcd.user.8a7d3311-4040-40b8-955d-834ce54b8c15": Error - Timeout after 4.000826922s. Request type "CC" on "$JS.API.CONSUMER.DURABLE.CREATE.$MQTT_msgs.51r4DC1W_KkuRAJeYH02G8HqecxLU1k" (reply="$MQTT.JSA.S1Nunr6R.CC.1iHZZPsxA2EXBvLS043jic").

  • mid: 84480647 - "mqttjs_a1346563" - Read loop processing time: 5.011585369s.
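To see which JetStream API request types time out most often during an incident, the log lines above can be tallied with a small script. The following is a minimal sketch, assuming the server log keeps the `Timeout after <seconds>s. Request type "<type>" on "<subject>"` wording shown in the samples; the names `TIMEOUT_RE`, `parse_timeouts`, and `count_by_type` are hypothetical helpers, not part of NATS.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the sample log lines in this issue.
TIMEOUT_RE = re.compile(
    r'Timeout after (?P<secs>\d+(?:\.\d+)?)s\. '
    r'Request type "(?P<rtype>[A-Z]+)" on "(?P<subject>[^"]+)"'
)

# Sample lines copied from the report above.
SAMPLE_LINES = [
    'mid: 102204 - "cae2bc80-7142-11ec-b9b8-33dad110a235" - Unable to persist session '
    '"cae2bc80-7142-11ec-b9b8-33dad110a235" (seq=70876): Timeout after 4.000022403s. '
    'Request type "SP" on "$MQTT.sess.RT45Zasv" '
    '(reply="$MQTT.JSA.S1Nunr6R.SP.RT45Zasv.1iHZZPsxA2EXBvLS043jtn").',
    'mid: 116735 - "KkuRAJeYH02G8HqecxCiAW" - Unable to add JetStream consumer for '
    'subscription on "abcd.user.8a7d3311-4040-40b8-955d-834ce54b8c15": Error - '
    'Timeout after 4.000826922s. Request type "CC" on '
    '"$JS.API.CONSUMER.DURABLE.CREATE.$MQTT_msgs.51r4DC1W_KkuRAJeYH02G8HqecxLU1k" '
    '(reply="$MQTT.JSA.S1Nunr6R.CC.1iHZZPsxA2EXBvLS043jic").',
    'mid: 84480647 - "mqttjs_a1346563" - Read loop processing time: 5.011585369s.',
]


def parse_timeouts(lines):
    """Return (request_type, seconds, subject) for each line reporting a timeout."""
    results = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            results.append(
                (match.group("rtype"), float(match.group("secs")), match.group("subject"))
            )
    return results


def count_by_type(lines):
    """Count timeouts per request type (e.g. "SP", "CC")."""
    return Counter(rtype for rtype, _, _ in parse_timeouts(lines))


if __name__ == "__main__":
    for rtype, count in count_by_type(SAMPLE_LINES).most_common():
        print(rtype, count)
```

Lines without a timeout, such as the read loop warning, are skipped. In the samples, `SP` appears with the session persistence failure and `CC` with the consumer creation failure, so a per-type count over a full incident window would show whether one kind of request dominates.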

Another observation: CPU usage never exceeded 2 cores, even though 32 cores are allocated. Could this indicate a resource bottleneck?

Expected behavior

No connection, subscription, or publish failures.

Server and client version

NATS Server version 2.10.22

Host environment

Kubernetes v1.25

Steps to reproduce

No response

Labels: defect, stale
