This repository has been archived by the owner on Jan 6, 2023. It is now read-only.
Aeron Reliability #890
Open
Description
We're having trouble with Aeron exceptions in the Onyx client. Most are client conductor timeouts, though we occasionally see other Aeron-related exceptions. These exceptions kill the job one to four times per day (it's a long-running job).
We haven't been able to make these exceptions go away. GC does not appear to be the cause, and we don't see CPU usage spikes (our systems run on GKE). Increasing CPU limits doesn't seem to help. The threads simply seem not to be woken up in time to perform their checks.
I'm pretty much stuck at this point, trying various fixes while drawing up backup plans that don't involve Onyx. Any pointers in the right direction would be appreciated.
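One way to test the hypothesis that threads are not being woken up in time is a small wake-up latency probe run alongside the job. The sketch below is hypothetical (`WakeupLatencyProbe` and `maxOversleepNs` are not part of Onyx or Aeron): it repeatedly sleeps for a fixed interval and records how much later than requested the thread actually resumed. If the worst-case delay it reports approaches the configured Aeron timeouts, scheduling stalls rather than the application itself would be the likely culprit.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stall probe (not part of Onyx or Aeron): sleeps for a fixed
 * interval in a loop and tracks the worst observed oversleep.
 */
public final class WakeupLatencyProbe {

    /** Returns the worst observed oversleep in nanoseconds (never negative). */
    public static long maxOversleepNs(long intervalMs, int iterations) throws InterruptedException {
        final long intervalNs = TimeUnit.MILLISECONDS.toNanos(intervalMs);
        long worst = 0;
        for (int i = 0; i < iterations; i++) {
            final long start = System.nanoTime();
            Thread.sleep(intervalMs);
            // Time actually spent asleep minus the time we asked for.
            final long oversleep = (System.nanoTime() - start) - intervalNs;
            worst = Math.max(worst, oversleep);
        }
        return worst;
    }

    public static void main(String[] args) throws InterruptedException {
        // Each window is 10,000 sleeps of 1 ms (at least ~10 s); report its worst delay.
        while (true) {
            long worstMs = TimeUnit.NANOSECONDS.toMillis(maxOversleepNs(1, 10_000));
            System.out.println("worst wake-up delay in window: " + worstMs + " ms");
        }
    }
}
```

The measurement is only meaningful if the probe runs in the same pod, under the same CPU limits, as the job itself.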