-
Notifications
You must be signed in to change notification settings - Fork 2.9k
nsqd: add topology region/zone aware message consumption #1301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I think this is ready for a first round of review @mreiferson @ploxiln (along w/ it's pair nsqio/go-nsq#312 Some items to decide on -
Once this is squared away and we are happy with it i'll follow up with documentation PR's and exposing region/zone in nsqadmin. |
ploxiln
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some ideas, for what I think are minor simplifications that leave the overall design the same. Overall looks good to me.
apps/nsqd/options.go
Outdated
| flagSet.Duration("http-client-connect-timeout", opts.HTTPClientConnectTimeout, "timeout for HTTP connect") | ||
| flagSet.Duration("http-client-request-timeout", opts.HTTPClientRequestTimeout, "timeout for HTTP request") | ||
| flagSet.String("topology-region", opts.TopologyRegion, "A region represents a larger domain, made up of one or more zones") | ||
| flagSet.String("topology-zone", opts.TopologyZone, "A zone represents a logical failure domain") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably mention something about "for preferring closer consumer"
f390edc to
96adf24
Compare
91517a4 to
fa8d2b4
Compare
2b90b29 to
42d28f5
Compare
88161cb to
af1edde
Compare
af1edde to
4631064
Compare
e93d2b5 to
b84cfe2
Compare
8f51ed1 to
00fcfa9
Compare
|
@jehiah @ploxiln @mreiferson This PR is now officially ready for review - in conjunction with nsqio/go-nsq#312 and nsqio/nsqio.github.io#89. We have written up an experience report based on our observations running these changes for 2 months in our production environment - it can be found here. It also lays out the changes to nsqadmin. Please let me know if you have any other questions! Thanks in advance! |
|
@jehiah hello! what's the plan here for actually landing this? Are y'all waiting on me to review everything 😏? |
|
@mreiferson sorry for the radio silence on this - I think we are ready to give more attention to this in Q4 so if you have any comments on this PR (or the related go-nsq changes) in the next few weeks please share them otherwise the topology changes are ready to land and we'll do that next month. I think the general plan is to still land this behind the feature flag and get them in a 1.4 release and consider removing the FF in 1.5. We have been happy with this running at Bitly for a while now, but i'm keeping an eye on kubernetes/enhancements#4747 as it will make adopting in a K8s environment easier. |
051b99f to
641bae3
Compare
641bae3 to
1cd6297
Compare
|
I tried this in our deployment and it enabling the experiment didn't have the intended effect. I didn't see any effect on cross-zone traffic after enabling it, messages were sent to all zones. I changed the However, I observed that the message routing probably interfered with how concurrency is managed, as our consumer pods scaled down to almost zero after a few minutes. Since the nsq client divides the concurrency (RDY) between the nsqd instances, it seems to me that 2/3 of the in-flight messages that would be cross-zone traffic otherwise are not used for sending messages. So perhaps the go-nsq client should divide the RDY requests only by the number of nsqd instances in the current zone? Resp. divide MaxInFlight for each zone separately? What value of |
This updates nsqd to have a
--topology-regionand--topology-zoneconfig argument and prefer sending messages to same-zone and same-region consumers as proposed in #1300