Open
Description
What type of defect/bug is this?
Crash or memory corruption (segv, abort, etc...)
How can the issue be reproduced?
There is a rare crash that occurs on the eduroam freeradius containers. I would estimate about once every 3-6 months.
eduroam is running from realease_3.2.3 tag. The low frequency makes this lower priority.
Note there was a small change to attempt to fix a state mismatch issue in auth.c from Alan. I could look at a previous core from before the change to verify whether this matters. I expect the earlier crash to be the same location though.
Log output from the FreeRADIUS daemon
There were some simultaneous errors occurring at the time of the crash:
proxy: Failed allocating Id for proxied request
Error: Failed adding event handler for socket proxy address * port 56072: Too many readers
Solving these is a separate issue, but may be related to why the crash is happening/possible.
Relevant log output from client utilities
No response
Backtrace from LLDB or GDB
Loading up the core dump, the exception occurred on line 2834 of main/process.c. The value for proxy_listener was NULL (request given below).
Given line 2822 checks for null, this would mean the value of request->proxy_listener changed between line 2822 and 2834.
2822 if (!request->proxy_listener || !request->proxy_listener->data) {
2823 proxy_reply_too_late(request);
2824 return 0;
(gdb)
2825 }
2826
2827 gettimeofday(&now, NULL);
2828
2829 /*
2830 * Status-Server packets don't count as real packets.
2831 */
2832 if (request->proxy->code != PW_CODE_STATUS_SERVER) {
2833 #ifdef WITH_TCP
2834 listen_socket_t *sock = request->proxy_listener->data;
The rarity of this issue would be explained by the fact that the chance of a change between the two lines would be quite small (I am also inclined to think it is related to exceeding the 512 readers limit).
Backtrace:
(gdb) bt
#0 0x000055dd9b4d54f6 in request_proxy_reply (packet=0x7f55ac28d030) at src/main/process.c:2834
#1 0x000055dd9b4b77dd in proxy_socket_recv (listener=0x55dda3f80b10) at src/main/listen.c:2423
#2 0x00007f55db2c3ec4 in fr_event_loop (el=0x55dda3e27fb0) at src/lib/event.c:710
#3 0x000055dd9b4ab4dc in main (argc=<optimized out>, argv=<optimized out>) at src/main/radiusd.c:634
The value for request was:
(gdb) p *request
$2 = {number = 4182020, timestamp = 1706535578, data = 0x0, listener = 0x55dda3f7e780, client = 0x55dda3185f50,
packet = 0x55dda4a2fea0, username = 0x55dda4a30420, password = 0x0, reply = 0x55dda4a30320, config = 0x55dda4a31ca0,
state_ctx = 0x7f55b4099e00, state = 0x0, proxy_listener = 0x0, proxy = 0x7f55a8121c00, proxy_reply = 0x7f55d0257bd0,
home_server = 0x55dda13d6b30, home_pool = 0x55dda2efd400, process = 0x55dd9b4cda20 <request_cleanup_delay>,
response_delay = {tv_sec = 0, tv_usec = 0}, timer_action = FR_ACTION_TIMER, ev = 0x7f558c0c1ff0,
handle = 0x55dd9b4aefd0 <rad_authenticate>, rcode = RLM_MODULE_NOOP, module = 0x55dd9b4f4b9f "",
component = 0x55dd9b4f4dfe "<REQUEST_CLEANUP_DELAY>", delay = 5, master_state = REQUEST_ACTIVE,
child_state = REQUEST_CLEANUP_DELAY, child_pid = 140006714034112, root = 0x55dd9b5174e0 <main_config>, simul_max = 0,
simul_count = 0, simul_mpp = 0, priority = RAD_LISTEN_PROXY, max_time = false, in_request_hash = true,
in_proxy_hash = false, num_proxied_requests = 1, num_proxied_responses = 1, server = 0x55dd9f70f180 "default",
parent = 0x0, log = {func = 0x7f55db2ef460 <vradlog_request>, lvl = L_DBG_LVL_OFF, indent = 0 '\000'}, options = 2,
coa = 0x0, num_coa_requests = 0}