Skip to content

win32: allow build with USE_ITHREADS=undef but USE_IMP_SYS=define #23178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: blead
Choose a base branch
from

Conversation

tonycoz
Copy link
Contributor

@tonycoz tonycoz commented Apr 7, 2025

While testing #17601 the build failed with:

perllib.c(62): error C2039: 'cur_tid': is not a member of 'interp_intern'
...\perl\win32\win32.h(566): note: see declaration of 'interp_intern'
perllib.c(68): error C2039: 'cur_tid': is not a member of 'interp_intern'
...\perl\git\perl\win32\win32.h(566): note: see declaration of 'interp_intern'

  • This set of changes requires a perldelta entry, and it is included.

tonycoz added 2 commits April 7, 2025 13:21
Discovered while trying to test Perl#17601 since the build failed with:

perllib.c(62): error C2039: 'cur_tid': is not a member of 'interp_intern'
...\perl\win32\win32.h(566): note: see declaration of 'interp_intern'
perllib.c(68): error C2039: 'cur_tid': is not a member of 'interp_intern'
...\perl\win32\win32.h(566): note: see declaration of 'interp_intern'
This was used rather questionably to crash perl, stopping
in the debugger if there is one, but that was removed in 7bd379e.
@bulk88
Copy link
Contributor

bulk88 commented Apr 8, 2025

This patch is broken and using the wrong macro test and should not be in blead as currently written. USE_ITHREADS is wrong. It should be be testing PERL_IMPLICIT_CXT or PERL_MULTIPLICITY. USE_ITHREADS just means can the libperl.so/.dll do an ithread virtual-fork() of PP VM state, or do Perl_foo_clone_infrastructure_xyz() things. No ithreads.pm mode, AFAIK, also means none of these features, Win32-only psuedo-fork, fake %ENV, fake pwd, fake cwd are compiled into libperl.dll.

BUT!!!! BUT!!!! No ithreads.pm mode WITH !!!! var PerlInterpretor * my_perl everywhere, is a 100% valid interp build config. It means the user is an embedded WinPerl user, who wants to spin up multiple my_perls on demand, but doesn't need those my_perls to ever communicate or see/be aware of each other. Something like W3C's "same origin policy" or Safari/Chrome browser tab isolation block diagram.

USE_ITHREADS is too narrow of a macro IMO, the test needs to be the macro that controls var my_perl or no var my_perl. This is the actual error with this commit. Just change it to the "does this perl have the "my_perl var" or not?" #define and this commit is okay with me then to get pushed to blead and can get closed.

This PR has a strategic/long term problem that needs some talk.

Since C func checkTLS() is a perma-assert macro/function from like 1997-2001, It had a very real, very important stability/debugging purpose back then. Post bulk88 learns how to write patches in C for interp core perl (2012-2016), almost all ithread bugs have been fixed, plus with GH Actions automated testing, weird SEGVs with WinPerl+ithreads are caught and bug tickets are created in hours/days, not weeks or months later.

my_perl->Isys_intern->last_or_current_win32_os_tid_seen, which is a U32 of memory, is a big perf optimization for the ancient checkTLS() function, and somehow, for unknown future P5P WinPerl code, a "super handy" value for that future unknown WinPerl P5P C code.

I do see I made a mistake by not locally having a no-threads blead perl perma-installed to test with, but, there are 2 design choices I need another persons opinion on:

  • IDK if var cur_tid should be missing/removed, #ifdef style, on no-my_perl var WinPerl builds since it has "no purpose"
  • or that value should be collected and stored/kept around, even with on single threaded no-my_perl var WinPerl builds

The very specific and very real hazard, that WinPerl, thds and less-so (much-less-so ???) hazard, is unexpected execution of, a CPAN XS author's, C function ptrs, which are iterator C callbacks, that internally execute Perl_call_sv(), and Mr/Mrs CPAN's func ptrs were passed to a random 3rd party C/C++ DLL. That 3rd party DLL, is

  • MS official DLL from Win7.iso-Win11.iso
  • closed source DLL made by Foo LLC
  • an open source FOSS DLL made by Foo non-profit LLC
  • made by a different Mr/Mrs Foo, in their unpaid volunteer free time on weekends and published on Github

Then any of the 4 above, will RANDOMLY execute Mr/Mrs CPAN XS author's iterator, either as

  • OS thread blocking I/O syncron I/O same OS thread (100% Perl C safe and compat, no C code changes needed)
  • a random lifetime, random OS TID, thread pool thread, made/allocated/orchestrated, by a thread pool manager, with an unknown name, unknown origin, unknown from where (.dll/.exe/address space wise)

Permutation # 2 needs Mr/Mrs CPAN author to add about 1-3 pages of Perl C code to their XS module to have their iterator callback run safely, the 1-3 pages of Perl C code needed to do this has 5-10 different correct and safe versions/implementations on how to do this safely scattered around on grep.metacpan.org/perlmonks/stack overflow.

Basically safely freeze/pause/wait/deep sleep the my_perl on a PP level from XS on OS thd # 1, then detach the my_perl from OS thd # 1's OS TLS with Perl_set_context(), then in new random TID OS thd # 2, attach the my_perl to OS TLS with Perl_set_context() , the my_perl var that used to be in OS thd # 1, then wake up that my_perl with Perl_call_sv();, then reverse the process and move the my_perl var from OS thd # 2 back to OS thd # 1 and somehow make OS thd # 1 OS kernel wise "wake up", and the Perl VM/custom CPAN XS/P5P runloop starts executing on its "home" OS thread again. Now back to "you" and "your OS thd # 2", once you returned ownership of that my_perl back to OS thd # 1, then you on a C level, do a C lang return ONE_OR_ZERO_OR_VOID; from your C func callback iterator, and you and OS thd # 2 go poof.

From the P5P WinPerl side of things, how important is it to stop the severe mem corruption that will instantly happen when 1 my_perl struct/ptr, gets launched/executed by a random 3rd party C lib, whose sales brochure says max vectorization and maximum and infinite n-way scalability, and created 16-64 brand new OS threads in 0.15 milliseconds, and simultaneously executed Mr/Mrs CPAN's C callback iterator on 32 physical CPU cores at once using a single my_perl struct?

Microsoft MSDN/official API docs do a very very poor job documenting if they call user provided func ptrs, syncron or in a thread pool.

Its undefined behavior b/c our lawyers told us if we document the user provided C fnc ptr iterator callback as sync blocking, and one day make it async/scalable we will get a lawsuit from the customer from all the CVEs/Heap Corruption we caused them on their previously deployed production SW.

If we document it as async/scalable, and a C dev catches/reverse engineers we executed their callback syncron when our API docs said the iterator is async into another thread, we get a lawsuit for deceptive trade practices because an ad tech company lost millions of $$$ because their TTFB serving an ad increased by 3 milliseconds per ad served.

CPAN's JDB's Win32.pm/Win32.xs got bitten in 2021 by accidental threadpool drama when a new perl XSUB got added to Win32.pm. It was fixed in a few days/weeks and is now a non-issue, but this happens even in the 2020s, so some thinking needs to be done, how much protection, if any protection, and when and where and how often, to do the protection checks at runtime, WinPerl, no-thd and threaded, needs, against this hazard.

Someones, anyones, 2nd opinion, is needed on the topic. I have my own opinions and gut instinct of what I want to do, or how I want to "solve" this hazard, but I want someone else to think over my thinking.

@@ -62,11 +63,16 @@ win32_checkTLS(PerlInterpreter *host_perl)
if(tid != host_perl->Isys_intern.cur_tid) {
dTHX; /* heavyweight */
if (host_perl != my_perl) {
int *nowhere = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave JHI's/NIS's/Sarathy's forced crash in place, or change it to DebugBreak() or __debugbreak(). If this branch/text, is reached, the end user 100% needs to open GDB or VS IDE C debugger immediately. DebugBreak() and __debugbreak() generate a different distinct 0xC0000,00__ exception code from 0xC000,00_SEGV_UNMAPPED_ADDR. Your tech-illiterate end users will see the different 0xC0000,00__ vs 0xC000,00_SEGV_UNMAPPED_ADDR. error code in the WinOS GUI crash popup box. And its definitely cleaner/more dev friendly/senior citizen friendly to, umm, "violently" end a process with an error code that means "the C code intentionally voluntarily violently killed the process" vs "the C code has a logic flaw and violently killed the process".

Deref-ing mem address 0, and executing __debugbreak() bring up the exact same WinOS GUI crash/WER popup, with the same "Debug now" button to click on, for P5P/CPAN XS author use so there is no harm or downside to switching from "intentional deref addr 0 crash" to "intentionally requested crash and advise end user to figure out the rest with their C debugger if they know how to use it".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What forced crash? There hasn't been a crash here since 7bd379e. int *nowhere = NULL is a no-nop.

#else
dTHX;
if (host_perl != my_perl) {
abort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have zero trust, in ascii token abort() safely stopping all execution of (P5P || CPAN XS author || end user || random 3rd party .dll) controlled (PP || C || C++) code in a Win32 process.

Copy link
Contributor

@bulk88 bulk88 Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have zero trust, in ascii token abort() safely stopping all execution of (P5P || CPAN XS author || end user || random 3rd party .dll) controlled (PP || C || C++) code in a Win32 process.

MS SDL security policy says abort() is forbidden to be used if a perma-assert() decides blackhat machine code is executing inside a Win32 process. Win 8 or Win 10 kernel introduced a user mode __fastfail() x86/x64/ARM CPU opcode to guarantee a Win32 process and its address space 100% stops all execution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poking around inside machine code of msvcrt.dll shows, libc abort(), can theoretically resume PP code execution through PP's %SIG by executing MS CRT's libc's abort extern C function, plus MS CRT's abort sends out SEH "events" to all active listening registered SEH handlers in the process. 95%-100% of "all active listening registered SEH handlers" are usually exception frames created by C++ lang's try/catch/throw keywords, even WinPerl with P5P default build config out of the box, adds its own SEH event listener callback, with a requested priority of 'Z', so it runs as later as technologically possible, after all C++/C89 try/throw/catch stuff executed, and as the very very last catch frame, before WinOS Kernel draws the WER crash GUI popup box.

Although MS's SEH exception events/dispatch API is language neutral and available in Win32/64 assembly and ISO C89 lang code if someone wants to use the SEH API, 99% of real life catch frames will be random C++ lang code. But the problem is, from my opinion, P5P/me/no developer on earth, knows at "HW fault throw" time, or "SEGV event callback event listener invoke loop time", how many, if any, and where those exception frames came from, or who/what in virtual address space, put their callbacks on the non-public Win Kernel event listener array of things that want to listen to the Win Kernel's SEGV event queue in user mode. So something like MS CRT's "abort()", which uses "warning class (0x4____)" error 0x40000015 STATUS_FATAL_APP_EXIT, something in the SEH stack might have the bright idea to trap/resume, because its 0x4___ "warning class error" which supposedly means "informational" and "normal control flow sometimes"

And abort() isn't using a "0x8___" class error which means (corrupt || unknown || invalid || junk bytes || freed) (object instance || handle || FD || obj ptr)

or the 0xC___ class errors, which are "call 911/112" emergencies, since these are unrecoverable physical hardware errors, SEGVs, SIGILL, SIGBUS, integer div by zero, HW FP math exceptions, or a PCIe device disappeared from the PCIe bus randomly, etc. Nobody, even as a 1st timer coder, would try to trap resume those 0xC___ errors/or the NTSTATUS .h C constants behind those 0xC___ codes at runtime, and anything who knows what the 0xC___ class codes even are, knows what they are doing if they test for them in a SEH catch filter block.

Don't quote me, and I don't feel like googling it, but some, maybe upto half the 0xC___ codes, are documented by MS as non-resumable even if the catch block specifically ask the OS to resume execution with its (end user supplied) CPU register context state snapshot. But remember, nothing will stop a catch block, from synchronously, in normal C89, from executing a Perl_call_sv(); and letting some PP code execute ;-)

whats wrong with writing a Win32 C debugger in Pure Perl and publishing it on CPAN? Nothing, it could be a 100% legit useful CPAN tool lol.

if there is one.

Previously this would exit silently and mysteriously, and unlike
POSIX-likes abort() doesn't generally break into the debugger on
Win32.
@tonycoz
Copy link
Contributor Author

tonycoz commented Apr 8, 2025

This patch is broken and using the wrong macro test and should not be in blead as currently written. USE_ITHREADS is wrong. It should be be testing PERL_IMPLICIT_CXT or PERL_MULTIPLICITY.

The inclusion of cur_tid in interp_intern is controlled by USE_ITHREADS:

perl5/win32/win32.h

Lines 581 to 583 in df4834b

#ifdef USE_ITHREADS
DWORD cur_tid;
#endif

and the missing cur_tid was the cause of the failure in building perl in the failing configuration.

The entire function is already guarded by PERL_IMPLICIT_SYS:

perl5/win32/perllib.c

Lines 47 to 54 in df4834b

#ifdef PERL_IMPLICIT_SYS
#include "perlhost.h"
void
win32_checkTLS(PerlInterpreter *host_perl)
{
/* GCurThdId() is lightweight, but b/c of the ctrl-c/signals sometimes firing

PERL_MULTIPLICITY was defined in the failing build, testing against that would continue to try to compile the code that failed to compile.

I have zero trust, in ascii token abort() safely stopping all execution of (P5P || CPAN XS author || end user || random 3rd party .dll) controlled (PP || C || C++) code in a Win32 process.

MS SDL security policy says abort() is forbidden to be used if a perma-assert() decides blackhat machine code is executing inside a Win32 process. Win 8 or Win 10 kernel introduced a user mode __fastfail() x86/x64/ARM CPU opcode to guarantee a Win32 process and its address space 100% stops all execution.

From what I can see, this check is a protection against bugs, not against an attacker.

I've added a panic message (via WriteFile()) and DebugBreak(), since abort() doesn't break into the debugger as it typically does on POSIX-likes.

@tonycoz
Copy link
Contributor Author

tonycoz commented Apr 9, 2025

That code failed to build for a different reason with implicit sys enabled and no multiplicity.

Even with this change such a build fails later on for unrelated reasons:

bsd_glob.c
bsd_glob.c(798): error C2065: 'PerlDir_read': undeclared identifier
bsd_glob.c(798): warning C4312: 'type cast': conversion from 'int' to 'direct *(__cdecl *)(DIR *)' of greater size
gmake[1]: *** [makefile:338: bsd_glob.obj] Error 2
gmake[1]: Leaving directory 'C:/Users/Tony/dev/perl/git/perl/ext/File-Glob'

but I'm not going to try to fix that here.

@bulk88
Copy link
Contributor

bulk88 commented Apr 9, 2025

The entire function is already guarded by PERL_IMPLICIT_SYS:

perl5/win32/perllib.c

Lines 47 to 54 in df4834b

#ifdef PERL_IMPLICIT_SYS
#include "perlhost.h"
void
win32_checkTLS(PerlInterpreter *host_perl)
{
/* GCurThdId() is lightweight, but b/c of the ctrl-c/signals sometimes firing

PERL_MULTIPLICITY was defined in the failing build, testing against that would continue to try to compile the code that failed to compile.

I'm not going to address that mess of permutations in this PR, and this PR isn't about that mess of WinPerl build config flag permutations. Its for another PR if there ever is one by me or someone else.

From what I can see, this check is a protection against bugs, not against an attacker.

Correct, this is a check against accidental thread-pooling of a my_perl ptr, regardless of what human/what group of humans/what lines of C code/what .dll caused the accidental thread-pooling to happen. There is no attacker here. The cause is an innocent bug, caused by someone not reading their upstream lib's API docs cover to cover, or that upstream lib has very poor or very limited API docs, or a CPAN XS author is simply inexperienced/1st timer (100% okay).

__fastfail() cpu opcode can be used here, but it a way beyond overkill option to use here. The goal is to freeze/suspend/terminate the process ASAP without using Perl C APIs or using the junk value my_perl ptr we just got and SEGVing AGAIN on the garbage my_perl ptr and failing to UI draw our fatal error we want to draw to the end user.

I've added a panic message (via WriteFile()) and DebugBreak(), since abort() doesn't break into the debugger as it typically does on POSIX-likes.

Your new version is perfect.

  • Zero/nothing from the Perl C API is used. The my_perl ptr currently holding uninited garbage/perl54x.dll's TLS slot/PerlIO API are never invoked.
  • MS CRT raise() which may, or may not somehow, wake up the VM isn't used
  • not in blead perl, but in recent past, perl SetUnhandledExceptionFilter() callback doesn't fire, and no C++/SEH callbacks get to fire, and further mess up the process. Something like Test2:: or a competitor putting hooks everwhere like in sub END %SIG and at a C level MS CRT's at_exit() and who knows where else what put hooks, to call Storable.pm and open a PP socket FD to report heap corruption (oh !@#$ no, the real bug can't be identified because 2-5 different exception reporting APIs or logging/diag tool APIs subsystems SEGVed themselves trying to report the first SEGV event)
  • A UI message for non-C devs/PP only devs was drawn, giving them some kind of "bug info" to report to someone else with enough C lang skills to fix whatever bad happened.
  • DebugBreak()/__debugbreak() was used. C debugger UI/IDE will open up instantly, to help out a CPAN XS author, actively trying out experiments on a XS modules, and hacking on new or old XS code.

Remember, this class of accidental thread-pool errors, might be very hard/impossible to reproduce, if the WinPerl process (any process) is started from inside VS IDE's C debugger, or MS's TUI-ish windbg.exe C debugger, since ntdll.dll/kernel32.dll start to behave very very different at runtime if the process was started by a C debugger, and not a normal CreateProcess().

"very very different" means all the max-runtime-asserting features in NtGlobalFlags and whatever other undocumented "checked build" or optional runtime undocumented assert logic MS randomly adds and removes over the years.

@bulk88
Copy link
Contributor

bulk88 commented Apr 9, 2025

That code failed to build for a different reason with implicit sys enabled and no multiplicity.

Even with this change such a build fails later on for unrelated reasons:

bsd_glob.c
bsd_glob.c(798): error C2065: 'PerlDir_read': undeclared identifier
bsd_glob.c(798): warning C4312: 'type cast': conversion from 'int' to 'direct *(__cdecl *)(DIR *)' of greater size
gmake[1]: *** [makefile:338: bsd_glob.obj] Error 2
gmake[1]: Leaving directory 'C:/Users/Tony/dev/perl/git/perl/ext/File-Glob'

but I'm not going to try to fix that here.

What does that build flag even mean in real life/production code purpose? Why does a no-thds libperl.dll, either

  • started by the official stub perl.exe
  • loaded into a random other Win32 process by a random private citizen embedder user

need to virtualize PP level or XS level %ENV and CWD? There is no other "Perl VM" to compete with inside the current processes's virtual address space, for control of "OS-level process global state" settings. The 1 and only 1 reason, I can think of, is the ancient WinCE/Symbian/Win16 Perl ports, where those 3 OSes had very very unusual (crazy) definitions of a "process" and a "pid" number and a "process handle". Specific that behavior, where what the WinCE/Symbian/Win16 official SDK API docs called a process and a pid, 100% of people now call a thread and a tid.

@tonycoz
Copy link
Contributor Author

tonycoz commented Apr 9, 2025

What does that build flag even mean in real life/production code purpose? Why does a no-thds libperl.dll, either

* started by the official stub perl.exe

* loaded into a random other Win32 process by a random private citizen embedder user

We supply those options, just as we do on non-Win32.

If we're not going to support them, we can remove them, if we leave them in it's reasonable to fix them.

I'm not saying we should put huge amounts of effort into fixing them but the actual fix here has been reasonably small and obvious.

@bulk88
Copy link
Contributor

bulk88 commented Apr 15, 2025

What does that build flag even mean in real life/production code purpose? Why does a no-thds libperl.dll, either

* started by the official stub perl.exe

* loaded into a random other Win32 process by a random private citizen embedder user

We supply those options, just as we do on non-Win32.

If we're not going to support them, we can remove them, if we leave them in it's reasonable to fix them.

Thought up a theoretical production use of no-thd libperl with CPerlHost on. 2 different publisher single threaded local GUI root processes, GTK or C89 from 89 User32.dll, WX, QT local GUI app. A 3rd party plugin, which itself is a .dll/.so has to load and unload a libperl.so/.dll in and out of address space on command from the root process, and there is no malloc() C symbol or glibc.so linked into that root process. The 3rd party plugin must use g_malloc() or ngx_pnalloc() https://metacpan.org/release/ZZZ/Nginx-Perl-1.8.1.10/source/src/http/modules/perl/Nginx.xs#L58 and therefore libperl has to.

Now how does 1 no-thd libperl.so.dll binary made by 1 dev, execute in 2 other root procs, different authors, one QT and one GTK? and have a "clean unload" of that libperl and its XS modules? CPerlHost/USE_IMP_SYS/Win32 psuedo-fork's "clean leak free exit" hooks to the rescue in a no-thd libperl build config. "clean unload" production code example https://learn.microsoft.com/en-us/windows/win32/api/combaseapi/nf-combaseapi-cofreeunusedlibrariesex

I'm not saying we should put huge amounts of effort into fixing them but the actual fix here has been reasonably small and obvious.

Same. The infrastructure is better left in as a semi history lesson than forgotten about in git blame and there is probably isn't that much of it, since ithreads on linux/winos and psuedo-fork are here to stay forever. Ive never seen a Linux distro ship a no-thd perl in their package manager. So ithreads is AnyOS by now. Just much faster on Linux than Windows ;-)

Atleast aslong as CPAN XS authors stick (XSUB.h) or carrot stick to Hungarian notion Perl C macro API, its easier to hook random chunks of AnyOS LibC at random points in the future. Even if WinPerl declines to implement the vtable entry (and 90% of them are jump stubs (1997-5.41.11) and just falls back to whatever PosixPerl or win32_foo() in win32.c does, atleast the interp core .c code is ready for those libc hooks when they suddenly become needed, vs grep hunt-and-peck and OS/CC vendor conflicts for p5p and 3rd party header conflicts on cpan xs because ever large popular C/C++ framework's commit was helloworld.h, and that file starts with string #define MAX(a,b) ((a) > (b) ? (a) : (b)) ^D.

PosixPerl's my_libcfnc() hooks aren't that source code obvious vs mixed case letters of CPH macros.

@bulk88
Copy link
Contributor

bulk88 commented Apr 15, 2025

https://github.com/unbit/uwsgi/blob/f57eed92ee4024eb350c28239259432b93625171/plugins/psgi/psgi_loader.c#L371

DIY-ed CPerlHost, some XS devs never heard of USE_IMP_SYS/CPerlHost and reimplimented it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants