Skip to content

Conversation

@mdkent
Copy link

@mdkent mdkent commented Jul 30, 2021

Naive fix for an issue we're seeing here in trying to rbtrace some puma workers where it seems bin/rbtrace is triggering the signal_handler_wrapper a bit too quickly, and it runs through all the msgrcv() retries before the message actually arrives from bin/rbtrace. This results in:

a@b:/u/apps/c/current$ bundle exec rbtrace -p 7160 -e "1+1"
Error: argument --pid (pid is not listening for messages, did you `require "rbtrace"`).
Try --help for help.

Here's an strace of the bin/rbtrace side:

16:42:11.312872 msgsnd(393220, {1, "\222\246attach\315\237\254\<snip>

and an strace of the puma worker:

[pid 21689] 16:42:11.288543 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 21689] 16:42:11.288637 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 21689] 16:42:11.288719 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 21689] 16:42:11.288806 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 21689] 16:42:11.288893 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 21689] 16:42:11.288974 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 21689] 16:42:11.289057 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 21689] 16:42:11.289138 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 21689] 16:42:11.289225 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 21689] 16:42:11.289315 msgrcv(393220, 0x7fba024fd410, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)

Note the timestamp, the reads all happen before the message queue is written to.

Here's the puma worker side with this new wait, we get 3 msgrcv retries with the 100ms sleep:

[pid 16328] 05:53:22.789390 msgget(0xffffc053, IPC_CREAT|0666) = 1703938
[pid 16328] 05:53:22.789458 socket(AF_UNIX, SOCK_DGRAM, 0) = 26
[pid 16328] 05:53:22.789529 setsockopt(26, SOL_SOCKET, SO_SNDBUF, [65536], 4) = 0
[pid 16328] 05:53:22.789599 msgrcv(1703938, 0x7f2ada4f4fc0, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 16328] 05:53:22.789667 nanosleep({tv_sec=0, tv_nsec=100000000}, NULL) = 0
[pid 16328] 05:53:22.889931 msgrcv(1703938, 0x7f2ada4f4fc0, 256, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
[pid 16328] 05:53:22.890079 nanosleep({tv_sec=0, tv_nsec=100000000},  <unfinished ...>
[pid 16301] 05:53:22.931987 <... ppoll resumed> ) = ? ERESTARTNOHAND (To be restarted if no handler)
[pid 16301] 05:53:22.932061 --- SIGURG {si_signo=SIGURG, si_code=SI_USER, si_pid=18720, si_uid=1003} ---
[pid 16301] 05:53:22.932157 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = ? ERESTARTNOHAND (To be restarted if no handler)
[pid 16301] 05:53:22.982165 --- SIGURG {si_signo=SIGURG, si_code=SI_USER, si_pid=18720, si_uid=1003} ---
[pid 16301] 05:53:22.982324 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8 <unfinished ...>
[pid 16328] 05:53:22.990246 <... nanosleep resumed> NULL) = 0
[pid 16328] 05:53:22.990307 msgrcv(1703938, {1, "\222\246attach\315I<snip>

As I mentioned this is a naive fix to get some debugging working, but I thought it was worth sharing.

@jeremy jeremy force-pushed the wait-longer-for-accept branch from c84e4e1 to 4d21dbe Compare October 11, 2024 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant