enable dbus-broker reexecute by lclgo · Pull Request #310 · bus1/dbus-broker

lclgo · 2023-01-05T03:58:37Z

This add a new feature to dbus-broker. With this patch, we can reexecute dbus-broker by running dbus-broker-launch --reexec.

The main idea behind this commit is:

Serialize: when broker gets the Reexecute message, it will serialize the main information (socket fd, id, uid, pid, match rule, request name, sasl) to the anonymous file,
Reexecute: launcher and broker reexecute.
AddListener: when launcher has reexecuted successfully, it will send AddListener to broker.
Recover: broker recovers all peers when processing AddListener message, and it uses the socket fd saved in anonymous file to create new peers, and recover id/uid/pid/...
Ready: launcher will broadcast Ready signal, when the dbus-broker-launch --reexec gets this signal, the reexecuting process is over.

I'll split this commit to several small commits after your first review. Please take a look, thank a lot.

src/broker/broker.c

+        if (mem_fd < 0)
+                return error_fold(mem_fd);
+
+        (void) serialize_basic(f, "max_ids", "%d", broker->bus.peers.ids);


src/bus/peer.c

+
+        /* Recover: sasl */
+        peer_recover_sasl(peer, sasl);
+        if (r < 0)


teg · 2023-01-05T10:07:56Z

Thank you for working on this! It is our most requested feature and will surely be greatly appreciated by many.

Some questions before I look at the code in detail (I am sure I'll think of more things in the coming days):

How do you version the serialized data, and what kinds of upgrade paths do you think we should support? My suggestion: save a major.minor version for the serialized state and increment the minor number when we make a change that can not be rolled back from and change the major number when we make changes that we cant roll forward to (nor back from). Then fail to reexec if we find an incompatible version number.
How do you think we should deal with in-flight messages? If I understand correctly they are now currently dropped? I think we can't be lossy, so need to come up with something. Ideally we would serialize everything, but we could also possibly consider some sort of drain operation.

teg · 2023-01-05T10:50:25Z

Regarding the draining, it might be worth while to introduce a Freeze and Thaw call and only allow reexec when the broker is frozen. I think this might simplify the proposed reexec logic as an added bonus (as we would still allow calls on the launcher API while frozen).

Freeze would finish any partial dispatch and stop further dispatching. We still need to serialize messages that are queued but not dispatched though. An open question is what to do about misbehaving peers who have stopped reading/worrying from/to their sockets.

lclgo · 2023-01-06T01:37:08Z

How do you version the serialized data, and what kinds of upgrade paths do you think we should support? My suggestion: save a major.minor version for the serialized state and increment the minor number when we make a change that can not be rolled back from and change the major number when we make changes that we cant roll forward to (nor back from). Then fail to reexec if we find an incompatible version number.

I didn't think about the version. Currently, I only serialize the peer's fd/id/uid/pid/requested name/match rule/sasl and broker's max_id to know which id we should distribute to the newly connected peer after reexecuting. The serialized string is like this:

max_ids=1234
peer_str=12;5600;91750;996;org.freedesktop.PolicyKit1;[{type='signal',sender='org.freedesktop.DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',path='/org/freedesktop/DBus'}];[{1}{1}{0}]

12: the socket fd of this peer
5600: the PID of this peer
91750: the peer ID :1.91750
996: the UID of this peer
org.freedesktop.PolicyKit1: the requested name
[{type='signal'...]: the match rule
[{1}{1}{0}]: the sasl state

These should be enough to recover a peer. But yes, we may have some big changes in the future version, major.minor version sounds a good idea. After rpm -Uvh dbus-broker.aarch64, dbus-broker-launch --reexec can send its version to the running old broker, then broker compares the received version with its own, if the two versions are incompatible, the broker refuses to reexecute and replies an error.

How do you think we should deal with in-flight messages? If I understand correctly they are now currently dropped? I think we can't be lossy, so need to come up with something. Ideally we would serialize everything, but we could also possibly consider some sort of drain operation.

I don't close the peer's fd during reexecuting, and write all messages out before reexecuting. After reexecuting, I reepoll these fd and process it. I tested this by the following testcase:

Create a server which can accept two 'x' numbers, and reply the multiply result as 'x';
Create a server which sends method_call to the server in a busy loop, and check the reply;
Reexecuting dbus-broker in a busy loop background for 1000 times;

No message was lost.

Currently, this commit doesn't support create a new connection during reexecuting.

bluca · 2023-01-15T13:30:22Z

This is really cool - one suggestion, maybe use systemd's FD store? That should allow to avoid needing to manually handle things, and a restart should "just" work?

lclgo · 2023-01-16T02:07:36Z

This is really cool - one suggestion, maybe use systemd's FD store? That should allow to avoid needing to manually handle things, and a restart should "just" work?

Thanks for your suggestion, I'll look into it.

dvdhrm

As @teg already mentioned, we appreciate all efforts to implement re-execution of dbus-broker. I don't mind the approach taken by this PR, but I thinks it lacks crucial required elements.

In particular, I would suggest implementing re-execution of the launcher in a separate PR. This can be implemented without requiring dbus-broker to re-execute and can be merged separately.

For dbus-broker, I think this requires serializing the socket/connection object. This is a rather delicate matter but we should be able to verify the behavior in the unit-tests. I don't think your approach works, since it relies on flushing all queues and thus clients can block a re-execution by intentionally filling their receive-queue.

Anyway, thanks a lot for the PR! I think we can definitely take this as a base to move forward!

dvdhrm · 2023-02-07T13:13:59Z

src/broker/broker.c


+        Peer *peeri;
+        c_rbtree_for_each_entry(peeri, &broker->bus.peers.peer_tree, registry_node) {
+                socket_dispatch_write(&peeri->connection.socket);


This does not reliably flush the entire queue. There are several scenarios where this will not block and instead retain data in the user-space queue. In those cases, we would lose messages.

Yes, I toke it for granted that all messages we have gotten will be sent out by socket_dispatch_write successfully. But if there are not many messages flying between dbus and peers, I think it's probobly fine.

By the way, do you think if we should support new client to establish connection during reexecuting? I haved checked systemd, systemctl start/stop doesn't work when systemctl daemon-reexec is running. Supporting that seems very complicated IMHO.

enable dbus-broker reexecute

bb0ece9

github-advanced-security bot found potential problems Jan 5, 2023

View reviewed changes

dvdhrm requested changes Feb 7, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enable dbus-broker reexecute#310

enable dbus-broker reexecute#310
lclgo wants to merge 1 commit intobus1:mainfrom
lclgo:main

lclgo commented Jan 5, 2023

Uh oh!

Check failure

Check warning

teg commented Jan 5, 2023

Uh oh!

teg commented Jan 5, 2023

Uh oh!

lclgo commented Jan 6, 2023

Uh oh!

bluca commented Jan 15, 2023

Uh oh!

lclgo commented Jan 16, 2023

Uh oh!

dvdhrm left a comment

Uh oh!

dvdhrm Feb 7, 2023

Uh oh!

lclgo Feb 9, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

lclgo commented Jan 5, 2023

Uh oh!

Check failure

Check warning

teg commented Jan 5, 2023

Uh oh!

teg commented Jan 5, 2023

Uh oh!

lclgo commented Jan 6, 2023

Uh oh!

bluca commented Jan 15, 2023

Uh oh!

lclgo commented Jan 16, 2023

Uh oh!

dvdhrm left a comment

Choose a reason for hiding this comment

Uh oh!

dvdhrm Feb 7, 2023

Choose a reason for hiding this comment

Uh oh!

lclgo Feb 9, 2023

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants