Unexpectedly extremely many logs during eventbus restart #3497

Open
@ognjen-it

Describe the bug
After restarting JetStream (EventBus), Argo Events (Sensor) starts producing an excessive number of errors—over 200,000 errors in a few minutes. This significantly impacts system performance and stability. The issue is consistently reproducible, and logs indicate a flood of reconnection or message processing errors.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy JetStream EventBus
  2. Deploy a Sensor that connects to that EventBus
  3. Update the EventBus with any change, such as tolerations or affinity (the EventBus will then roll out its pods one by one)
  4. Check the Sensor logs, or check the log volume in Grafana with Loki :)

Expected behavior
Argo Events should gracefully handle JetStream restarts without producing an overwhelming number of errors. It should retry connections in a controlled manner rather than flooding logs and potentially overloading the system.

Screenshots
Log volume of one sensor:
Image

Example of log:

{"level":"error","ts":1740569807.8508778,"logger":"argo-events.sensor","caller":"sensor/trigger_conn.go:202","msg":"failed to fetch messages for subscription &{mu:{state:0 sema:0} sid:2 Subject:_INBOX.vg8Q0Diy7UIus5nuVGjAcr.* Queue: jsi:0xc0015a2000 delivered:183 max:0 conn:0xc001501508 mcb:<nil> mch:<nil> errCh:<nil> closed:true sc:false connClosed:true draining:false status:0 statListeners:map[] permissionsErr:<nil> typ:1 pHead:<nil> pTail:<nil> pCond:<nil> pDone:<nil> pMsgs:0 pBytes:0 pMsgsMax:2 pBytesMax:0 pMsgsLimit:65536 pBytesLimit:67108864 dropped:0}, nats: invalid subscription\nnats: subscription closed, previousErr=nats: invalid subscription\nnats: subscription closed, previousErrTime=2025-02-26 11:36:47.850854739 +0000 UTC m=+435273.783129224","sensorName":"my-sensor-namegsm","triggerName":"my-sensor-namegsm","sensorName":"my-sensor-namegsm","stacktrace":"github.com/argoproj/argo-events/pkg/eventbus/jetstream/sensor.(*JetstreamTriggerConn).pullSubscribe\n\t/home/runner/work/argo-events/argo-events/pkg/eventbus/jetstream/sensor/trigger_conn.go:202"}

Environment (please complete the following information):

  • Kubernetes: v1.27.7
  • Argo: 3.6.4
  • Argo Events: 1.9.5
  • JetStream version: 2.10.10

Additional context
Honestly, I wouldn't even have noticed if I didn't have a few dozen sensors and a Loki that had to collect several million logs in those few minutes.


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.


Labels: bug
