-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PCRE2: optimize memory allocations #15395
base: master
Are you sure you want to change the base?
PCRE2: optimize memory allocations #15395
Conversation
Damn, that was promising, and it's working well on Linux, but both Darwin and Windows completely broke 😭 Windows could be explained by using |
WTH: I can't reproduce the errors in my Windows VM (MSVC) 😕 |
I can reproduce when running the compiler specs. Looking at the memory graph, the committed memory keeps growing rapidly, and it starts failing exactly when reaching 32GB (the VM has 16GB of RAM). That can't be a coincidence. When running the compiler specs from master, the committed memory is rather stable or grows slowly (around 3.5GB). Let's note that a linker command still fails, but I get a bunch of weird spec errors in the VM. Maybe |
Some targets don't support the ThreadLocal annotation (for example OpenBSD, Android)
7ceca47
to
7487cb5
Compare
Here's a new attempt. From CI runs in my fork, the darwin and win32-msvc targets got fixed, but win32-gnu is still broken on dual heredocs such as Note: I believe FLS should work on Windows, which would avoid the custom destructors. The stdlib specs pass, save for these failures in the compiler specs. |
Yeah that's the LLVM 20 failure, ignore it for the moment |
I tried again, and FLS is apparently working on Windows. Nice. |
Thread.threads.delete(self) | ||
Fiber.inactive(fiber) | ||
detach { system_close } | ||
end | ||
end | ||
|
||
protected def run_destructors : Nil | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: cleanup before merge (not needed anymore).
We noticed in #15088 that we don't need
Crystal::ThreadLocalValue
in the Regex PCRE2 engine.We can reuse the JIT stack but also the match data for every Regex (no need for a specific match data per Regex). We can allocate one and make sure it's not used by other threads... hence the thread locals: no more spinlock (thread contention) nor hash.
It's simpler and faster. Here's the benchmark from #13144 for example:
Enabling MT also no longer has any impact on performance:
The drawback is that we must allocate each matchdata with a maximum number of ovectors (65535). That might increase memory usage, though I failed to notice it in practice. Maybe not allocating memory for every regular expression is helping?
Note: this PR will be separated into a couple PRs to introduce
Crystal::System::ThreadLocal(T)
. The point of this new type is for this patch, so I want approval on the overall approach before the split.I could have used
@[ThreadLocal]
but some targets don't support it (namely: Android, MinGW and OpenBSD) and we can't register destructors either (on thread shutdown). But usingpthread_key_create
orFlsAlloc
we can 😍