skip setproctitle in task_runner on Mac OS #45124

Draft: wants to merge 3 commits into base: main

jaketf
Contributor

@jaketf jaketf commented Dec 20, 2024

On some newer versions of macOS, setproctitle can cause a segfault; see benoitc/gunicorn#3021
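
For readers skimming the thread, here is a minimal sketch of the kind of platform guard described above. It is not the PR's actual diff: the helper name `maybe_set_process_title` and its `platform` parameter are hypothetical and exist only for illustration. The only real APIs used are `sys.platform` and `setproctitle.setproctitle`.

```python
import sys


def maybe_set_process_title(title: str, platform: str = sys.platform) -> bool:
    """Set the process title unless running on macOS.

    Hypothetical helper, not code from the PR. Returns True if the
    title was set, False if the call was skipped.
    """
    if platform == "darwin":
        # Skip entirely on macOS, where the import/call has been seen to segfault.
        return False
    try:
        # Import lazily so that macOS never loads the extension module at all.
        from setproctitle import setproctitle
    except ImportError:
        # Treat a missing package as "nothing to do" (an assumption of this sketch).
        return False
    setproctitle(title)
    return True


if __name__ == "__main__":
    maybe_set_process_title("airflow scheduler -- DagFileProcessorManager")
```

Keeping the import inside the guarded branch matters in this sketch, because later comments in the thread point at the import itself, not only the call, as a possible trigger.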


Member

@ashb ashb left a comment


Yeah, I've started noticing this causing tests on OSX to fail on occasion, but hadn't noticed any runtime issues. Makes sense though.

There are probably a few other places where we call setproctitle (for example in the dag parser code I just landed inside airflow/dag_processor/). Could you update those too?

Comment on lines +188 to +190
else:
    from setproctitle import setproctitle
    setproctitle("airflow scheduler -- DagFileProcessorManager")
Contributor

@jlaneve jlaneve Dec 20, 2024


Suggested change
else:
from setproctitle import setproctitle
setproctitle("airflow scheduler -- DagFileProcessorManager")
else:
from setproctitle import setproctitle
setproctitle("airflow scheduler -- DagFileProcessorManager")

Contributor

this may actually be a GitHub bug, in the "Files changed" tab it shows the indentation being off, but in the conversation / timeline it shows the indentation as being correct (and my suggested change is unneeded indentation)

[Image: Screenshot 2024-12-20 at 6 01 53 PM]

Member

I am in contact with the setproctitle maintainer as part of the "Airflow Beach Cleaning" project. I can ask him to comment.

Member

After a short discussion with @dvarrazzo: it's likely that dvarrazzo/py-setproctitle#144 is going to fix it (not yet released).

It would be great, though, to get some more details about those segfaults, @jaketf @ashb, when you see them happening again.

Member

pytest task_sdk (locally, not breeze) would trigger it about 10-25% of the time.

Member

@potiuk setproctitle creates a thread internally at import time on macOS.

Member

@ashb ashb Jan 7, 2025


> @ashb I've run pytest task_sdk (clean clone from master branch) hundreds of times on both Intel and Apple Silicon Macs (macOS 15.1.1 and 15.0.1 respectively) but it didn't crash for me. Obviously something is different with my setup but what? Puzzled.

@gershnik That's odd.

I can now reproduce this error almost 100% of the time. M2 MacBook Pro here, on Sonoma 14.7.

Member

@ashb ashb Jan 7, 2025


@gershnik Native stack trace from Console.app:

VM Region Info: 0x104d60a8e is not in any region.  Bytes after previous region: 2703  Bytes before following region: 128370
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      MALLOC metadata             104d5c000-104d60000    [   16K] rw-/rwx SM=COW  
--->  GAP OF 0x20000 BYTES
      __TEXT                      104d80000-104d84000    [   16K] r-x/rwx SM=COW  /Users/USER/*/_setproctitle.cpython-312-darwin.so

Application Specific Information:
crashed on child side of fork pre-exec

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x1931c95d0 __pthread_kill + 8
1   libsystem_pthread.dylib       	       0x193201c20 pthread_kill + 288
2   libsystem_c.dylib             	       0x1930d81e0 raise + 32
3   libpython3.12.dylib           	       0x1068f0c68 faulthandler_fatal_error + 384
4   libsystem_platform.dylib      	       0x193232584 _sigtramp + 56
5   libsystem_trace.dylib         	       0x192f4f0a4 _os_log_preferences_refresh + 40
6   libsystem_trace.dylib         	       0x192f4fb20 os_log_type_enabled + 712
7   CoreFoundation                	       0x1932cbab8 _CFBundleCopyLoadedImagePathForPointer + 84
8   CoreFoundation                	       0x1933898b0 _CFBundleGetBundleWithIdentifier + 164
9   _setproctitle.cpython-312-darwin.so	       0x104d82ffc darwin_set_process_title + 84
10  _setproctitle.cpython-312-darwin.so	       0x104d838b8 init_ps_display + 208
11  _setproctitle.cpython-312-darwin.so	       0x104d8359c spt_setup + 400
12  _setproctitle.cpython-312-darwin.so	       0x104d83254 spt_getproctitle + 16
13  libpython3.12.dylib           	       0x1061f5170 cfunction_vectorcall_NOARGS.llvm.1866380503741956643 + 104
14  libpython3.12.dylib           	       0x105fe5dfc _PyEval_EvalFrameDefault + 156008
15  libpython3.12.dylib           	       0x10608b3a0 PyEval_EvalCode + 220
16  libpython3.12.dylib           	       0x1062148d4 builtin_exec + 396
17  libpython3.12.dylib           	       0x1061f50b4 cfunction_vectorcall_FASTCALL_KEYWORDS.llvm.1866380503741956643 + 92
18  libpython3.12.dylib           	       0x105fe9a10 _PyEval_EvalFrameDefault + 171388
19  libpython3.12.dylib           	       0x105f5c928 _PyObject_VectorcallTstate.llvm.2292412377633951376 + 84
20  libpython3.12.dylib           	       0x105fa3cbc object_vacall.llvm.2292412377633951376 + 240
21  libpython3.12.dylib           	       0x105fa3518 PyObject_CallMethodObjArgs + 108
22  libpython3.12.dylib           	       0x105f6353c PyImport_ImportModuleLevelObject + 3100
23  libpython3.12.dylib           	       0x105fdcf04 _PyEval_EvalFrameDefault + 119408
24  libpython3.12.dylib           	       0x1061dd854 method_vectorcall.llvm.12955693216709424543 + 296
25  libpython3.12.dylib           	       0x105fea118 _PyEval_EvalFrameDefault + 173188
26  libpython3.12.dylib           	       0x105ffb244 _PyObject_Call_Prepend + 296
27  libpython3.12.dylib           	       0x105ffac48 slot_tp_call + 116
28  libpython3.12.dylib           	       0x105fe615c _PyEval_EvalFrameDefault + 156872
29  libpython3.12.dylib           	       0x105ffb244 _PyObject_Call_Prepend + 296
30  libpython3.12.dylib           	       0x105ffac48 slot_tp_call + 116
31  libpython3.12.dylib           	       0x105fe9b48 _PyEval_EvalFrameDefault + 171700
32  libpython3.12.dylib           	       0x105ffb244 _PyObject_Call_Prepend + 296
33  libpython3.12.dylib           	       0x105ffac48 slot_tp_call + 116
34  libpython3.12.dylib           	       0x105fe615c _PyEval_EvalFrameDefault + 156872
35  libpython3.12.dylib           	       0x105ffb244 _PyObject_Call_Prepend + 296
36  libpython3.12.dylib           	       0x105ffac48 slot_tp_call + 116
37  libpython3.12.dylib           	       0x105fe615c _PyEval_EvalFrameDefault + 156872
38  libpython3.12.dylib           	       0x105ffb244 _PyObject_Call_Prepend + 296
39  libpython3.12.dylib           	       0x105ffac48 slot_tp_call + 116
40  libpython3.12.dylib           	       0x105fe615c _PyEval_EvalFrameDefault + 156872
41  libpython3.12.dylib           	       0x10608b3a0 PyEval_EvalCode + 220
42  libpython3.12.dylib           	       0x10608b1f4 run_mod.llvm.6674925059613253997 + 284
43  libpython3.12.dylib           	       0x106106730 pyrun_file + 156
44  libpython3.12.dylib           	       0x106105e70 _PyRun_SimpleFileObject + 268
45  libpython3.12.dylib           	       0x1061000fc _PyRun_AnyFileObject + 80
46  libpython3.12.dylib           	       0x1060fedd4 pymain_run_file_obj + 164
47  libpython3.12.dylib           	       0x1060fe438 pymain_run_file + 72
48  libpython3.12.dylib           	       0x1060fc774 Py_RunMain + 1124
49  libpython3.12.dylib           	       0x1060dcf7c pymain_main + 456
50  libpython3.12.dylib           	       0x1060dcda8 Py_BytesMain + 40
51  dyld                          	       0x192e77154 start + 2476

and

-----------
Full Report
-----------

{"app_name":"python3.12","timestamp":"2025-01-07 15:12:30.00 +0000","app_version":"","slice_uuid":"4c4c44ea-5555-3144-a121-b75170b036a4","build_version":"","platform":1,"share_with_app_devs":0,"is_first_party":1,"bug_type":"309","os_version":"macOS 14.7 (23H124)","roots_installed":0,"incident_id":"38B0B6A5-AC4B-438D-9ED5-DDD274E676C1","name":"python3.12"}
{
  "uptime" : 510000,
  "procRole" : "Unspecified",
  "version" : 2,
  "userID" : 501,
  "deployVersion" : 210,
  "modelCode" : "Mac14,5",
  "coalitionID" : 745,
  "osVersion" : {
    "train" : "macOS 14.7",
    "build" : "23H124",
    "releaseType" : "User"
  },
  "captureTime" : "2025-01-07 15:12:30.1013 +0000",
  "codeSigningMonitor" : 1,
  "incident" : "38B0B6A5-AC4B-438D-9ED5-DDD274E676C1",
  "pid" : 51654,
  "translated" : false,
  "cpuType" : "ARM-64",
  "roots_installed" : 0,
  "bug_type" : "309",
  "procLaunch" : "2025-01-07 15:12:30.0878 +0000",
  "procStartAbsTime" : 12276109475883,
  "procExitAbsTime" : 12276109794958,
  "procName" : "python3.12",
  "procPath" : "\/Users\/USER\/*\/python3.12",
  "parentProc" : "python",
  "parentPid" : 51646,
  "coalitionName" : "com.github.wez.wezterm",
  "crashReporterKey" : "5BB32C2E-55C1-DCA5-AE8B-3531EFE6FBEE",
  "responsiblePid" : 804,
  "responsibleProc" : "wezterm-gui",
  "codeSigningID" : "-",
  "codeSigningTeamID" : "",
  "codeSigningFlags" : 570556961,
  "codeSigningValidationCategory" : 10,
  "codeSigningTrustLevel" : 4294967295,
  "instructionByteStream" : {"beforePC":"fyMD1f17v6n9AwCRd+D\/l78DAJH9e8Go\/w9f1sADX9YQKYDSARAA1A==","atPC":"AwEAVH8jA9X9e7+p\/QMAkWzg\/5e\/AwCR\/XvBqP8PX9bAA1\/WcAqA0g=="},
  "wakeTime" : 1751,
  "sleepWakeUUID" : "92850A44-01E1-4036-91AC-9437DCBBCA46",
  "sip" : "enabled",
  "vmRegionInfo" : "0x104d60a8e is not in any region.  Bytes after previous region: 2703  Bytes before following region: 128370\n      REGION TYPE                    START - END         [ VSIZE] PRT\/MAX SHRMOD  REGION DETAIL\n      MALLOC metadata             104d5c000-104d60000    [   16K] rw-\/rwx SM=COW  \n--->  GAP OF 0x20000 BYTES\n      __TEXT                      104d80000-104d84000    [   16K] r-x\/rwx SM=COW  \/Users\/USER\/*\/_setproctitle.cpython-312-darwin.so",
  "exception" : {"codes":"0x0000000000000001, 0x0000000104d60a8e","rawCodes":[1,4376103566],"type":"EXC_BAD_ACCESS","signal":"SIGSEGV","subtype":"KERN_INVALID_ADDRESS at 0x0000000104d60a8e"},
  "termination" : {"flags":0,"code":11,"namespace":"SIGNAL","indicator":"Segmentation fault: 11","byProc":"python3.12","byPid":51654},
  "vmregioninfo" : "0x104d60a8e is not in any region.  Bytes after previous region: 2703  Bytes before following region: 128370\n      REGION TYPE                    START - END         [ VSIZE] PRT\/MAX SHRMOD  REGION DETAIL\n      MALLOC metadata             104d5c000-104d60000    [   16K] rw-\/rwx SM=COW  \n--->  GAP OF 0x20000 BYTES\n      __TEXT                      104d80000-104d84000    [   16K] r-x\/rwx SM=COW  \/Users\/USER\/*\/_setproctitle.cpython-312-darwin.so",
  "asi" : {"libsystem_c.dylib":["crashed on child side of fork pre-exec"]},
  "extMods" : {"caller":{"thread_create":0,"thread_set_state":0,"task_for_pid":0},"system":{"thread_create":0,"thread_set_state":0,"task_for_pid":0},"targeted":{"thread_create":0,"thread_set_state":0,"task_for_pid":0},"warnings":0},
  "faultingThread" : 0,


@gershnik gershnik Jan 7, 2025


@ashb Thank you, this is super helpful! (I am still unable to reproduce this even once - just tried it again after updating macOS)

With regard to threads, setproctitle doesn't create any threads by itself. Apple frameworks do so internally for their XPC communication with Launch Services, but those threads sit dormant unless functionality that uses them is invoked. Also see below.

The crash happens in a child process post-fork, on the main thread, early during setproctitle initialization, in the CFBundleGetBundleWithIdentifier call. setproctitle calls it with a static argument [1]:

CFBundleGetBundleWithIdentifier(CFSTR("com.apple.LaunchServices"))

so it references no caller-supplied memory that could somehow become invalid. Thus, it is the internal memory of CoreFoundation that is somehow corrupted at the time of this call. In other words, the crash is "impossible" unless CoreFoundation itself is in a broken state.
Also note that any threads Apple might create haven't been started yet: this call happens long before such functionality is invoked.

All of this, combined with the fact that the crash is very non-deterministic, suggests that setproctitle is a victim here of something (potentially itself) using Apple APIs on another thread in parallel with fork.

So the question is whether this is what is going on. Are there any calls to setproctitle (including importing it) or to any other Apple-using library in the parent process that can happen in parallel with fork?

[Update]
@potiuk - just realized that your comment indicates that this is actually a known issue that has other manifestations, correct?

Footnotes

  1. CFSTR is an Apple macro to produce a statically allocated CFString constant

Member

So this one is slightly odd. It's def triggerable 100% of the time for me.

According to Python's threading module, there is only a single thread at the point of calling os.fork.

The odd thing here is that if I import setproctitle eagerly before fork, then the SIGSEGV goes away, but Py 3.12 now starts complaining about "This process (pid=96305) is multi-threaded, use of fork() may lead to deadlocks in the child."
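
One way to probe the observation above, as a sketch only: record what Python's `threading` module reports just before `os.fork()`, then check whether the child exited cleanly or was killed by a signal. The helper name `fork_and_report` and its `child_work` parameter are hypothetical. Note that `threading.active_count()` only counts threads Python itself knows about, so it would not see native threads started by a C extension or a system framework. That may be why it can report a single thread while Python 3.12 still warns about a multi-threaded process.

```python
import os
import threading
from typing import Callable, Optional


def fork_and_report(child_work: Optional[Callable[[], None]] = None) -> tuple[int, int]:
    """Fork once and return (threads Python saw before fork, child's exit code).

    Hypothetical diagnostic helper (Unix only). A negative exit code means
    the child was killed by a signal with that number.
    """
    threads_before = threading.active_count()
    pid = os.fork()
    if pid == 0:
        # Child process: run the optional workload, then exit immediately
        # without running the parent's cleanup handlers.
        exit_code = 0
        try:
            if child_work is not None:
                child_work()
        except BaseException:
            exit_code = 1
        finally:
            os._exit(exit_code)
    # Parent process: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    return threads_before, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(fork_and_report())
```

To mirror the scenario in this thread, `child_work` could be a function that imports setproctitle inside the child. A child killed by SIGSEGV would then show up as exit code -11, matching the signal code 11 in the crash report above.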
