@@ -39,8 +39,56 @@ New Functionality
3939 This entire feature can be disabled by loading the new
4040 ``policy/protocols/conn/disable-unknown-ip-proto-support.zeek`` policy script.
4141
42+ - Broker's message I/O buffering now operates on per-peering granularity at the
43+ sender (it was previously global) and provides configurable overflow handling
44+ when a fast sender overwhelms a slow receiver, via the following new constants
45+ in the ``Broker`` module:
46+
47+ const peer_buffer_size = 2048 &redef;
48+ const peer_overflow_policy = "disconnect" &redef;
49+ const web_socket_buffer_size = 512 &redef;
50+ const web_socket_overflow_policy = "disconnect" &redef;
51+
52+ When a send buffer overflows (i.e., it is full when a node tries to transmit
53+ another message), the sender may unpeer the slow receiver (policy
54+ ``disconnect``, the default), drop the newest message in the buffer
55+ (``drop_newest``), or drop the oldest (``drop_oldest``). Buffer sizes are
56+ measured in number of messages, not bytes. Note that "sender" and "receiver"
57+ here are independent of the direction in which Zeek originally established the
58+ peering. After a disconnect, Zeek automatically tries to re-establish the
59+ peering with the slow node, in case it recovers.
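
As a minimal sketch, a site-local script could adjust these constants with ``redef``. The values below are illustrative assumptions, not recommendations; only the constant names and policy strings come from the text above:

```zeek
# Illustrative values only; tune them to your deployment.

# Allow twice as many buffered messages per peering as the default (2048).
redef Broker::peer_buffer_size = 4096;

# Keep slow peers connected and discard the oldest buffered message instead.
redef Broker::peer_overflow_policy = "drop_oldest";

# WebSocket clients have their own buffer size and policy.
redef Broker::web_socket_overflow_policy = "drop_newest";
```

With a drop policy, the sender keeps the peering but loses messages on overflow, so the ``disconnect`` default may still be preferable where silent message loss is unacceptable.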
60+
61+ Zeek reports such disconnects in two ways:
62+
63+ * A cluster.log entry indicates for the sending node that a slow peered node
64+ has been removed. Here, node ``worker01`` has removed its peer ``proxy01``:
65+
66+ 1733468802.626622 worker01 removed due to backpressure overflow: 127.0.0.1:42204/tcp (proxy01)
67+
68+ * A labeled counter metric ``zeek_broker_backpressure_disconnects_total`` in
69+ the telemetry framework tracks the number of times such disconnects have
70+ occurred between respective nodes. For example, the following indicates
71+ the same disconnect as above:
72+
73+ zeek_broker_backpressure_disconnects_total{endpoint="worker01",peer="proxy01"} 1
74+
75+ To implement custom handling of a backpressure-induced disconnect, add a
76+ ``Broker::peer_removed`` event handler, as follows:
77+
78+ event Broker::peer_removed(endpoint: Broker::EndpointInfo, msg: string)
79+ {
80+ if ( "caf::sec::backpressure_overflow" !in msg )
81+ return;
82+
83+ # The local node has disconnected the given endpoint,
84+ # add your logic here.
85+ }
86+
87+ These new policies fix a problem in which misbehaving nodes could trigger
88+ cascading "lockups" of nodes, each ceasing to transmit any messages.
89+
4290- Zeek now includes a PostgreSQL protocol analyzer. This analyzer is enabled
43- by default. The analyzer's events and its ``postgresql.log`` should be
91+ by default. The analyzer's events and its ``postgresql.log`` should
4492 considered preliminary and experimental until the arrival of Zeek's next
4593 long-term-stable release (8.0).
4694