Remove gossip and mingling from Celery workers #2251
base: master
Note: In draft mode currently -- just doing some robust testing with
@waxlamp @mvandenburgh @jjnesbitt @satra @yarikoptic -- Trying to replicate locally with You can see below that without any messages being sent to the I'd still like to be able to stress test how many messages are produced/consumed when some of our more intensive tasks are invoked. I'm curious whether local testing is less useful for finding the root cause, and whether a staging environment deployment might be best. It is certainly possible that gossip/mingling and something in how our current infrastructure sends tasks are both ballooning message consumption.
Moving to ready for review per results observed by @sandyhider on EMBER team: aplbrain@ab0188b
Thanks Aaron. @aaronkanzer @sandyhider Do we know why this issue is arising with the LINC and EMBER deployments but not the DANDI deployment? @mvandenburgh How many messages are being consumed at baseline for the DANDI deployment?
On Ember, our service plan on CloudAMQP is called Elegant Ermine. It allows 20 million messages a month. Last month we hit that limit around day 26. The higher service plans don't have a fixed monthly limit but a maximum throughput per minute. After we had issues trying to change our service plan to a larger size, the Heroku Support team suggested that making these changes to mingle and gossip would let us stay at the current service plan level. We are just under 3 million messages now, and it took us almost a week to make the changes on production this month; the count was at roughly 2.3 million at that time. You are not having the problem on DANDI because your CloudAMQP plan doesn't have a hard limit. The support person also told us that by default mingle and gossip do not actually do anything when they discover an issue, so those messages are just wasted traffic in the current system.
Thank you, @sandyhider.
This PR stems from behavior observed in @sandyhider's work on EMBER Archive (a clone of DANDI Archive), whose CloudAMQP instance quickly hits capacity.
Similar behavior has been observed in LINC (Cc @kabilar).
From Sandy's correspondence with CloudAMQP support staff, a suggestion was received to disable gossip and mingling.
Why can we not see current gossip and mingling?
Currently, our `Procfile` has `--loglevel` at `INFO` -- gossip and mingling are only expressed at the `DEBUG` level.

Do we need mingling in DANDI Archive?
No -- `mingling` is meant to handle Celery workers that have tasks defined in differing code bases. Our Celery tasks are all defined in the same static code.

Do we need gossip in DANDI Archive?
No -- `gossip` enables Celery workers to broadcast and listen to events (used for things like remote control, Celery events, and dashboards like Flower). Our Celery setup is primarily producer/consumer within a single deployment, so there's no need for this cross-worker coordination.

This PR includes the appropriate flags to disable Celery functionality that does not seem integral, and should prevent bloat of our message broker
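
For reference, a minimal sketch of what a worker invocation with these features disabled could look like. `--without-gossip` and `--without-mingle` are standard Celery worker options; the app module `myproject.celery` is a hypothetical placeholder and is not taken from this PR's `Procfile`.

```shell
# Assumption: "myproject.celery" is a placeholder; substitute the real Celery app path.
# --without-gossip: do not subscribe to other workers' events.
# --without-mingle: do not synchronize with other workers at startup.
celery --app myproject.celery worker \
  --loglevel INFO \
  --without-gossip \
  --without-mingle
```

In a `Procfile`, the same command would sit on a single line after the process type (for example `worker:`).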
Cc @yarikoptic @satra