-
Notifications
You must be signed in to change notification settings - Fork 138
privsep: enforce message boundaries with MSG_EOR on our messages #533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The nature of the SOCK_SEQPACKET, that privsep modules uses, is stream. See: https://pubs.opengroup.org/onlinepubs/9799919799/functions/V2_chap02.html#tag_16_10_06 To guarantee that a reader will never read two messages in one read operation, the writer shall put end of record markers. The problem exposed itself in FreeBSD 15.0 that started to follow the specification better than before. Other SOCK_SEQPACKET usage considerations: a) as long as our reader provides a receive buffer that would fit the largest message our writer would ever send, we are good with regards to not a reading a partial message b) as long as our writer always write full messages with one write, we don't need use of MSG_WAITALL in reader. Fixes NetworkConfiguration#530
WalkthroughReplaced direct read/write/writev I/O with socket-oriented send/sendmsg/recvmsg using local struct msghdr and m.msg_iovlen for iovec tracking; added MSG_EOR for sends and MSG_WAITALL for receives; adjusted loops, bounds checks, truncation handling, and introduced log fd accessors. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Child as Privsep child (src/privsep.c)
participant Root as Privsep root (src/privsep-root.c)
participant Sock as AF_UNIX socket
participant Peer as Receiver
rect rgba(230,245,255,0.6)
Note over Child,Root: build local msghdr with msg_iov (header,[payload...]) and msg_iovlen
Child->>Sock: sendmsg(fd, &m, MSG_EOR)
Root->>Sock: sendmsg(fd, &m, MSG_EOR)
Sock-->>Peer: deliver framed message
Peer-->>Sock: recvmsg(fd, &m, MSG_WAITALL)
end
alt error
Sock-->>Child: errno / -1
Sock-->>Root: errno / -1
Note over Child,Root: existing retry/error handling runs
end
sequenceDiagram
autonumber
participant Logger as log writer (src/logerr.c)
participant Sock as AF_UNIX socket
participant Reader as log reader (src/logerr.c)
rect rgba(245,250,230,0.6)
Note over Logger: build iov: [pri][pid][message] into msghdr
Logger->>Sock: sendmsg(fd, &m, MSG_EOR)
Sock-->>Reader: framed log message arrives
Reader->>Sock: recvmsg(fd, &m, MSG_WAITALL)
Note over Reader: extract pri, set ctx->log_pid, NUL-terminate payload, consume message
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/privsep.c (2)
924-933: Fix off-by-one iovec bounds check to prevent OOB memory writeThe capacity check allows one extra iteration and can write past iov[6] when msg->msg_iovlen is too large.
Apply this diff:
- for (i = 0; i < (int)msg->msg_iovlen; i++) { - if ((size_t)(iovlen + i) > __arraycount(iov)) { + for (i = 0; i < (int)msg->msg_iovlen; i++) { + if ((size_t)(iovlen + i) >= __arraycount(iov)) { errno = ENOBUFS; return -1; } iovp++; iovp->iov_base = msg->msg_iov[i].iov_base; iovp->iov_len = msg->msg_iov[i].iov_len; }
888-898: Avoid relying on assert for cmsg padding; prevent potential out-of-bounds readIf NDEBUG is set, the assert is compiled out and a large cmsg_padlen could cause the kernel to read past padding. Use a wider padding buffer and a runtime check.
Apply this diff:
- long padding[1] = { 0 }; + uint8_t padding[sizeof(struct cmsghdr)] = { 0 };And replace the assert below (Lines 913-918) as follows:
- assert(cmsg_padlen <= sizeof(padding)); + if (cmsg_padlen > sizeof(padding)) { + errno = ENOBUFS; + return -1; + }
🧹 Nitpick comments (1)
src/privsep.c (1)
937-939: Optionally suppress SIGPIPE on broken peerConsider adding MSG_NOSIGNAL (where available) to avoid SIGPIPE on closed sockets.
Apply this diff:
- len = sendmsg(fd, - &(struct msghdr){ .msg_iov = iov, .msg_iovlen = iovlen }, MSG_EOR); + { + int sflags = MSG_EOR; +#ifdef MSG_NOSIGNAL + sflags |= MSG_NOSIGNAL; +#endif + len = sendmsg(fd, + &(struct msghdr){ .msg_iov = iov, .msg_iovlen = iovlen }, sflags); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/privsep.c(2 hunks)
🔇 Additional comments (1)
src/privsep.c (1)
1068-1070: Good: forwarding path now preserves message boundariesUsing sendmsg + MSG_EOR here aligns the recv/send bridge with the protocol guarantees.
|
Process can't start: |
|
Looks like we need to get on the same page, so that I also reproduce what you have. What exact version of FreeBSD are you using and what version of dhcpcd? To be fair, I did not test on dhcpcd/master, I used the version that is in the FreeBSD ports. |
|
I just retried dhcpcd master (with my patch) on FreeBSD CURRENT, I can't reproduce the problem you got. |
|
Can you please share ktrace dump of your failed start with me at [email protected]? |
|
@glebius thanks for looking into this FreeBSD-specific issue. If you can spot any other pending bug specifically affecting FreeBSD, you're welcome to chip in. Btw, are you the one who subscribed CodeRabbit to this bug? |
|
@perkelix Precisely speaking, this is not really a FreeBSD-specific one. It is a historical bug that FreeBSD SOCK_SEQPACKET would create implicit message boundaries on writes. I heard Linux also has this bug and there are applications that rely on it. This behavior truly contradicts the definition of the SOCK_SEQPACKET and makes it almost the same as SOCK_DGRAM. To make things worse, the Single UNIX Specification while very clearly describes SOCK_SEQPACKET in the main page on sockets, has documentation bugs in recv() functions. This will be corrected in the next version of SUS. Here is a test fuzzer of SOCK_SEQPACKET that helps to understand the right expectations from this type of socket: P.S. No, I did not subscribe CodeRabbit, I guess this is the dhcpcd project automation, |
|
Disregard breakage, that was my dirty hands. Patch works. |
I just read all the man pages around this and I have to ask are you sure? Because there is no clear indication anywhere that you must use MSG_EOR.
So if we can use read, then we can use write which implies implicit message boundaries. If this is a problem, then we could more easily change from SOCK_SEQPACKET to SOCK_STREAM and not bother. |
|
I'm working on the socket(2) manual page right now. It has quite a lot of outdated information and documentation of SOCK_SEQPACKET is just the worst. There is no must in use of MSG_EOR in general. But if you want message boundaries, you need it. You could change from SOCK_SEQPACKET to SOCK_DGRAM, cause your use is much closer to datagram, than to stream. Your expectation is that read size == write size, which is datagram thing, not stream. But you'd better stay with SOCK_SEQPACKET, it is superior. Just use it correctly. |
Apply same fix to privsep-root error return messaging.
|
OK, so can you explain why I need this? Is it because somehow we are tripping on a boundary we shouldn't? EDIT: Sorry, better question - why the existing code is breaking? And if this breaks and is a common pattern is it expected for other up-streams to make similar changes? |
|
I'm not sure your refactoring is correct. With my original change I browsed around the code and made sure that the sendmsg(MSG_EOR) are done only on the file descriptors that are SOCK_SEQPACKET. Putting this flag on other kinds of descriptors doesn't make sense. FreeBSD most likely will just ignore it, so it is not a critical mistake. Not sure about other OSes. I'd better use sendmsg(MSG_EOR) only on those code paths that work with SOCK_SEQPACKET only. |
The opposite - were not observing boundary that should have been observed.
One process was quickly sending two separate messages, the other process read them with one read syscall, effectively behaving as if the second one was lost. But it was not, it was read together with the first message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/dhcpcd.c(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/dhcpcd.c (1)
src/if.c (1)
xsocketpair(994-1043)
All privsep processes IPC are SOCK_SEQPACKET, my extra change is on the return path from a child process (BPF, ARP, etc) back to the root process. PS_ROOT_FD: https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/privsep.h#L124
OK, I understand now, thanks. I would have thought that MSG_EOR would be implicit for all forms of write where the flag cannot be set and explicit where it can be set. So, writev would be implicit MSG_EOR and sendmsg would explicit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/dhcpcd.c(4 hunks)src/logerr.c(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/dhcpcd.c
|
Yep, looks like the logging one also needs MSG_EOR, otherwise logged messages may glue together. |
src/logerr.c
Outdated
| .msg_iovlen = sizeof(iov) / sizeof(iov[0]) | ||
| }; | ||
|
|
||
| len = (int)recvmsg(fd, &msg, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to add MSG_WAITALL flag here. That will guarantee that read operation will wait either provided buffer is filled or a message boundary is observed. That will guarantee against truncated messages.
P.S. With the existing code I doubt truncation can happen, as you always write log messages in a single write operation.
P.P.S. I don't know how if on Linux MSG_WAITALL will stop reads on message boundaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I'll be testing on other OS's before this gets merged as it touches critical paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I'll be testing on other OS's before this gets merged as it touches critical paths.
I have just tested on Linux, MSG_WAITALL on SOCK_SEQPACKET works as expected! This was the test:
char buf[100];
int s[2];
assert(socketpair(AF_UNIX, SOCK_SEQPACKET, 0, s) == 0);
assert(send(s[0], buf, 10, MSG_EOR) == 10);
assert(send(s[0], buf, 10, MSG_EOR) == 10);
assert(recv(s[1], buf, sizeof(buf), MSG_WAITALL) == 10);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/dhcpcd.c (1)
2715-2724: Consider simplifying to usesend()instead ofsendmsg()for consistency.While the implementation is correct, using
sendmsg()withiovecfor sending a single integer adds unnecessary complexity. A simplesend()would be more consistent with the other changes in this file.Apply this diff to simplify:
- struct iovec iov[] = { - { .iov_base = &i, .iov_len = sizeof(i) } - }; - struct msghdr msg = { - .msg_iov = iov, - .msg_iovlen = __arraycount(iov), - }; - if (sendmsg(ctx.fork_fd, &msg, MSG_EOR) == -1) - logerr("%s: sendmsg", __func__); + if (send(ctx.fork_fd, &i, sizeof(i), MSG_EOR) == -1) + logerr("%s: send", __func__);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/dhcpcd.c(3 hunks)
🔇 Additional comments (2)
src/dhcpcd.c (2)
396-397: LGTM! Correct use of MSG_EOR for message boundary enforcement.The change from
write()tosend(..., MSG_EOR)properly enforces message boundaries on the SOCK_SEQPACKET socket, aligning with the PR's objective to fix FreeBSD 15.0 compatibility issues.
1452-1453: LGTM! Consistent MSG_EOR usage in signal handler.The conversion to
send(..., MSG_EOR)maintains proper message boundaries for the signal forwarding mechanism, and the error message is correctly updated to reflect the new system call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/privsep.c (1)
888-918: Pad buffer may be too small for worst-case CMSG alignment (possible OOB read).
long padding[1]assumes max pad ≤ sizeof(long). On some ABIs alignment forstruct cmsghdrcan exceed that, risking reading past the local buffer whencmsg_padlen > sizeof(long). Use a buffer sized to the worst-case control-header alignment.Apply:
- long padding[1] = { 0 }; + uint8_t padding[CMSG_SPACE(0)] = { 0 }; @@ - assert(cmsg_padlen <= sizeof(padding)); + assert(cmsg_padlen <= sizeof(padding));This keeps the existing assertion valid while guaranteeing enough zero bytes for any required pad.
🧹 Nitpick comments (1)
src/privsep.c (1)
1095-1095: DropMSG_WAITALLonrecvmsgforSOCK_SEQPACKET.
MSG_WAITALLis redundant onSOCK_SEQPACKETsockets since end-of-record (EOR) implicitly terminates the read. Remove it—or at minimum add a comment explaining that EOR ends the wait—in both locations:
src/privsep.c:1095src/logerr.c:415
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/privsep.c(5 hunks)
🔇 Additional comments (3)
src/privsep.c (3)
924-934: Bounds check on payload iovecs looks correct now.
m.msg_iovlen + i >= __arraycount(iov)guards before theiovp++, preventing overflow intoiov[7]. This addresses the prior off-by-one risk.
937-937: Using sendmsg(..., MSG_EOR) to enforce record boundaries: LGTM.This correctly prevents message coalescing on SOCK_SEQPACKET.
1032-1070: ps_sendcmdmsg: switched to sendmsg + MSG_EOR: LGTM.The local msghdr init and single-call send with EOR preserves framing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/logerr.c (1)
220-238: Good clamp; add MSG_NOSIGNAL to prevent SIGPIPE on closed peerThe truncation guard around vsnprintf() looks correct now. To avoid a process‑terminating SIGPIPE if the logging peer is gone, send with MSG_NOSIGNAL when available (or set SO_NOSIGPIPE at socket creation). Also fine to keep iovec; no functional issue there.
Apply this minimal change:
- len = (int)sendmsg(ctx->log_fd, &msg, MSG_EOR); + int sflags = MSG_EOR; +#ifdef MSG_NOSIGNAL + sflags |= MSG_NOSIGNAL; +#endif + len = (int)sendmsg(ctx->log_fd, &msg, sflags);Optional: consider using
ssize_tforlenin this block to match send/recv APIs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/logerr.c(3 hunks)
🔇 Additional comments (5)
src/logerr.c (5)
29-31: Headers for sendmsg/recvmsg are correctIncluding <sys/socket.h> (and keeping <sys/time.h>) is required for msghdr/iovec and flags. LGTM.
403-414: recvmsg layout and iov sizing look soundThe iovec ordering (pri, pid, payload) and total buffer sizing align with the sender; initializing msghdr with the full iov is clear. LGTM.
419-426: Bounds/truncation checks and enforced NUL are correctRejecting frames with MSG_TRUNC or header‑short reads, and forcing a terminator at the computed end, prevents overreads and glued messages. Nicely done.
427-429: Setting log_pid for formatting around the forwarded message is appropriateTemporarily assigning ctx->log_pid to the sender’s pid and restoring it preserves existing formatting behavior. LGTM.
415-417: Propagate EOF from recvmsg instead of mapping to errorWhen the peer closes, recvmsg() returns 0. Currently this maps 0 to -1, losing the EOF signal. Recommend returning 0 so callers can distinguish clean shutdown vs error.
Proposed change:
- len = (int)recvmsg(fd, &msg, MSG_WAITALL); - if (len == -1 || len == 0) - return -1; + len = (int)recvmsg(fd, &msg, MSG_WAITALL); + if (len <= 0) + return len;
|
@glebius many thanks for your help solving this and the reviews so far. Anything else you would like to add, or we are good now? |
|
Glad to see traction here! Thanks everyone |
I have only brief understanding of dhcpcd multi-process design, and after this disclaimer :) I can say that everything looks good to me. |
The nature of the SOCK_SEQPACKET, that privsep modules uses, is stream. See:
https://pubs.opengroup.org/onlinepubs/9799919799/functions/V2_chap02.html#tag_16_10_06
To guarantee that a reader will never read two messages in one read operation, the writer shall put end of record markers.
The problem exposed itself in FreeBSD 15.0 that started to follow the specification better than before.
Other SOCK_SEQPACKET usage considerations: a) as long as our reader provides a receive buffer that would fit the largest message our writer would ever send, we are good with regards to not a reading a partial message b) as long as our writer always write full messages with one write, we don't need use of MSG_WAITALL in reader.
Fixes #530