Allow configuring per-mount-point per-queue-type disk alarms #14815
the-mikedavis wants to merge 17 commits into rabbitmq:main
        false ->
            {noreply, State1}
    end;
handle_cast({channel_published_to_queue_type, _ChPid, QT},
This feature might need a feature flag. For direct connections, if old client code is used with a newer server, it would error after publishing since it isn't expecting this cast. I think that is unlikely to happen in practice, but the mixed-version test suite will probably run into it.
What would the config setup be for having a main disk that contains quorum and classic queues and a secondary disk that contains streams? Would we specify the same mount point for quorum and classic, with each defining queue_types as quorum and classic respectively? Would that result in a common alarm for both, or two alarms looking at the same thing?
Ah yeah, in that scenario you could have a config like so:

disk_free_limits.streaming.mount_point = /mnt/data/streams
disk_free_limits.streaming.absolute = 2GB
disk_free_limits.streaming.queue_types = stream
disk_free_limits.messaging.mount_point = /mnt/data/queues
disk_free_limits.messaging.absolute = 2GB
disk_free_limits.messaging.queue_types = classic,quorum

And if
Ah I see, thanks! So if I understand correctly the name here

Taking that thought further, what would the process of adding that "bob" disk alarm to an existing broker look like? If node A thinks the "bob" alarm exists but node B doesn't, can issues come from the disagreement? When node A restarts with the new configuration, do all nodes now know of the "bob" alarm, or just node A until all other nodes also restart?

Also, after I have added my new "bob" disk alarm, what ways do I have as a user to monitor it: to see whether it is currently alarming, what value it is configured to, and how close it is getting to the alarm point? For MQ's use case currently we are getting this information from the

Lastly, for the RabbitMQ console we currently have a column named "Disk space" that displays the information for what is, as of now, the only disk alarm. When I add this "bob" disk alarm, would we want to dynamically add a new column to that table named something like "Disk space (bob)"? In that case, would we also support defining the ordering of those columns? For example, if we consider it most relevant to display the disk alarm of quorum queues on the left, then streams, then classic, and at the end the disk alarm for non-queue storage, would we be able to define that order manually in some console config?
The

Yeah, before this change is finished we should really expose the configured disks in the API (and maybe Prometheus as well?) and the UI. For the ordering part, maybe we can use the order in which they are listed in the config file? There's an ETS table holding the last-known available bytes and the configured limit, which can be queried cheaply for the metrics. And we should extend the CLI commands so that the limits can be updated dynamically.
This is the same kind of toggle as the Raft data directory and stream data directory but for controlling classic queue data location.
This is not a functional change, just a refactor to eliminate dicts and use maps instead. This cleans up some helper functions like dict_append/3, and we can use map comprehensions in some places to avoid intermediary lists.
Previously we set `start_disksup` to `false` to avoid OTP's automatic monitoring of disk space. `disksup`'s gen_server starts a port (which runs `df` on Unix) which measures disk usage and sets an alarm through OTP's `alarm_handler` when usage exceeds the configured `disk_almost_full_threshold`. We can set this threshold to 1.0 to effectively turn off disksup's monitoring (i.e. the alarm will never be set). By enabling disksup we have access to `get_disk_data/0` and `get_disk_info/0,1` which can be used to replace the copied versions in `rabbit_disk_monitor`.
`disksup` now exposes the calculation for available disk space for a given path using the same `df` mechanism on Unix. We can use this directly and drop the custom code which reimplements that.
Co-authored-by: Sunny Katkuri <skatkur@amazon.com>
This introduces a new variant of `rabbit_alarm:resource_alarm_source()`:
`{disk, QueueType}`, which triggers when the mount configured for the queue
type(s) falls under its limit of available space.
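As an illustration only (Python rather than the Erlang used in the branch), the mapping from configured mounts to `{disk, QueueType}` alarm sources could be sketched as below. The `Mount` record and both function names are hypothetical and are not taken from `rabbit_alarm` or `rabbit_disk_monitor`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """Hypothetical record for one configured mount."""
    name: str
    available: int          # last-known free bytes on the mount
    limit: int              # configured low watermark in bytes
    queue_types: frozenset  # queue types stored on this mount


def alarmed_queue_types(mounts):
    """Collect the queue types whose mount has fallen under its limit."""
    alarmed = set()
    for mount in mounts:
        if mount.available < mount.limit:
            alarmed |= mount.queue_types
    return alarmed


def alarm_sources(mounts):
    """Express each alarmed queue type as a ("disk", queue_type) pair."""
    return {("disk", qt) for qt in alarmed_queue_types(mounts)}
```

With one mount for streams under its limit and one mount for classic and quorum queues above its limit, only `("disk", "stream")` is returned.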
This covers both network and direct connections for 0-9-1. We store a set of the queue types which have been published into on both a channel and connection level since blocking is done on the connection level but only the channel knows what queue types have been published. Then when the published queue types or the set of alarms changes, the connection evaluates whether it is affected by the alarm. If not it may publish but once a channel publishes to an alarmed queue type the connection then blocks until the channel exits or the alarm clears.
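The bookkeeping described above can be sketched in a few lines of Python. This is a simplified model, not the Erlang implementation: the `Connection` class and its method names are hypothetical, and it only captures the rule that a connection is blocked while any of its live channels has published to a queue type that is currently alarmed.

```python
class Connection:
    """Simplified model of per-connection blocking (hypothetical names)."""

    def __init__(self):
        self.published = {}   # channel id -> set of queue types published to
        self.alarmed = set()  # queue types currently under a disk alarm

    def channel_published(self, channel_id, queue_type):
        # A channel reports the queue type it has just published into.
        self.published.setdefault(channel_id, set()).add(queue_type)

    def channel_closed(self, channel_id):
        # When a channel exits, its published queue types no longer count.
        self.published.pop(channel_id, None)

    def set_alarms(self, alarmed_queue_types):
        self.alarmed = set(alarmed_queue_types)

    @property
    def blocked(self):
        # Re-evaluated whenever it is read, so it reflects both the
        # latest publishes and the latest set of alarms.
        seen = set().union(*self.published.values()) if self.published else set()
        return bool(seen & self.alarmed)
```

A connection that has only published to classic queues stays unblocked during a stream alarm; it becomes blocked once one of its channels publishes to a stream, and unblocks when that channel exits or the alarm clears.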
This adds two gauge metrics which are emitted per configured mount, one for available bytes and the other for the low watermark. The label `"disk=<name>"` is attached to both gauges to distinguish which mount the gauge applies to.
This adds the configured mounts, if there are any, to the API JSON response for the `/api/nodes` and `/api/node/<node>` endpoints and to the overview and node-detail UI pages. Time series data is not collected for these metrics - that should be scraped from the Prometheus endpoint instead.
The polling interval (min, max and fast-rate) should be tuned for the hardware in use. For example, high-end machines with high network bandwidth should set the fast-rate higher so that disk space is checked more often, since with stronger resources the disk could fill up more rapidly than the default 250MB/sec predicts.
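To see why a higher fast-rate leads to more frequent checks, consider a simplified Python model. It assumes the next check is scheduled for roughly the time it would take to consume the remaining headroom at the fast rate, clamped between the min and max intervals. The function name, the exact formula and the 100 ms and 10 s bounds are assumptions made for illustration; only the 250MB/sec default comes from the text above.

```python
def next_check_interval_ms(available, limit,
                           fast_rate=250_000_000,  # bytes/sec, default from the text
                           min_ms=100,             # assumed lower bound
                           max_ms=10_000):         # assumed upper bound
    """Estimate how long to wait before the next disk check (simplified)."""
    headroom = max(available - limit, 0)      # bytes left before the alarm
    ideal_ms = headroom / (fast_rate / 1000)  # time to fill at the fast rate
    return int(max(min_ms, min(max_ms, ideal_ms)))
```

With 250 MB of headroom, this model waits 1000 ms at the default fast-rate but only 250 ms when the fast-rate is raised to 1 GB/sec, so a faster machine is polled more often as it approaches the limit.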
With this change you can say:
rabbitmqctl set_disk_limit mount Streaming 2GiB
This applies the limit only to the "Streaming" mount.
This includes regular MQTT and MQTT-over-WebSockets.
When disk information is unavailable, the 'available' field in mount records is set to 'NaN'. Due to Erlang term ordering, 'NaN' < Limit evaluates to true, which triggers alarms for those mounts. This is correct fail-safe behavior - when we cannot determine available disk space, we block publishing to prevent potential disk exhaustion. Add comments to alarmed_mounts/1 and alarmed_queue_types/1 explaining this intentional behavior.
Looks like we'll have to hold off on using
This is an extension of the free disk space alarm which allows configuring additional mount points to monitor and which queue type(s) to block when they are near full. For example with a config like so:
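A sketch of such a config, assuming the `disk_free_limits.<name>.*` keys that appear in the review conversation. The mount name `streaming` is a placeholder; the path, limit and queue type are taken from the sentence that follows.

```ini
# Placeholder mount name "streaming"; key pattern assumed from the conversation.
disk_free_limits.streaming.mount_point = /data/stream-data
disk_free_limits.streaming.absolute = 2GB
disk_free_limits.streaming.queue_types = stream
```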
Publishers to streams would be blocked once the free space of `/data/stream-data` falls under 2GB. Publishers to classic or quorum queues could continue though.

The motivation for this feature is that you may want to use separate disks for different queue types. For example, for higher throughput you may want to use volume(s) with better throughput and/or IOPS for streaming but use standard disks for queue data. Also, alarms are currently fairly aggressive, blocking all publishing. Ideally you should be able to continue using queues when the space you have allocated for streams fills up, or vice versa.
This is a different approach than #14086. Instead of measuring disk usage under a directory like `du(1)`, `rabbit_disk_monitor` is updated to measure free space of all mounts at once with `disksup:get_disk_info/0`. Under the hood this performs the same `df(1)` check as `rabbit_disk_monitor` had been doing previously. Measuring mount-point free space is much cheaper than measuring a directory's disk footprint. Monitoring mount points is also quite flexible: you can use multiple disks on one mount point with RAID-0 striping or split up a single disk with partitions.

This is a draft: it needs tests, and currently only AMQP 0-9-1 is updated to perform selective blocking. All other protocols currently block for any alarm.
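The cost difference between the two measurements mentioned above can be illustrated in Python (this is not how `disksup` is implemented). `shutil.disk_usage` asks the filesystem for the free space of the mount containing a path in a single call, much like `df(1)`, whereas a `du(1)`-style footprint has to visit every file under the directory. Both helper names are hypothetical.

```python
import os
import shutil


def mount_free_bytes(path):
    """Free space of the filesystem containing `path` (df-like, one call)."""
    return shutil.disk_usage(path).free


def directory_footprint_bytes(path):
    """Sum of file sizes under `path` (du-like, walks the whole tree)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files may disappear while the tree is being walked.
                pass
    return total
```

The first function costs the same regardless of how much data is stored, while the second grows with the number of files, which is why polling mount points frequently is practical.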
Some of the commits in this branch are refactors that could be cherry-picked out. #14814 is pretty trivial, and the refactors to use maps instead of `dict` in `rabbit_alarm` and to use `disksup` instead of the custom `df` code in `rabbit_disk_alarm` are not strictly related to the feature here.

Discussed in #14590