-
Notifications
You must be signed in to change notification settings - Fork 901
Protect patching with an open handle #1264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Protect patching with an open handle #1264
Conversation
…as the originally discovered lib in maps
Signed-off-by: Bara' Hasheesh <[email protected]>
# Conflicts: # src/symbols_linux.cpp
Tested that after this PR the test in asprof-reproducers does not fail anymore, and fails in |
Also tested the crash in the stress test by @Baraa-Hasheesh, and could not reproduce the issue anymore after this PR |
Co-authored-by: Bara' Hasheesh <[email protected]> Signed-off-by: Francesco Andreuzzi <[email protected]>
Let's add minimal reproducers as cpp unit tests. |
@krk Adapted test from @Baraa-Hasheesh's repo |
6877914
to
5cff23d
Compare
a29f8ab
to
28d9f78
Compare
28d9f78
to
0eacfab
Compare
cac4c90
to
b310dba
Compare
All comments addressed @apangin |
struct link_map* map; | ||
// validate that the current loaded library is the same library that was observed during the /proc/self/maps processing | ||
if (handle != NULL && dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0) { | ||
return cc->imageBase() == (const char*)map->l_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this apply to shared objects whose virtual load address is non-zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, I think @Baraa-Hasheesh looked into this as part of #1261. Can you weigh in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin yes.
Is there a case where the virtual load address is zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm interested in a situation where a shared object has PT_LOAD
program header with p_offset == 0
and p_vaddr != 0
. Can you provide me with an evidence that this case is covered or that it never happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin I will investigate this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we check if this situation applies, and if it does we simply skip the check and assume the handle is valid? The situation highlighted by @Baraa-Hasheesh seems to be an edge case, and I think it's still valuable to have this check until we find a proper fix.
Having this will be give us more protection than what we currently have, and won't make things worse than they are now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dlinfo protection is only really needed if library is re-opened using a different thread after close
No, the problem can happen even in a single-threaded cases, for instance, if we attempt to do native memory profiling long after a library was first parsed.
I do think we need some level of protection against this situation, even if it is not 100% accurate. If comparing l_addr
does not work, maybe we can check l_ld
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin technically that's true but it will require that the library is reloaded into a different location in memory (could happen in JIT from what I'm reading)
But this raises the concern that the library will not be patched (skipped) or loose it's patching during these long profile sessions (profiling accuracy degrades with time) while being a valid behavior of the profiled process (Different issue that can be tracked separately)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we drop the dlinfo & switch to dladdr
https://man7.org/linux/man-pages/man3/dladdr.3.html
void *dli_fbase; /* Base address at which shared object is loaded */
We can use the _imports
inside the CodeCache
to find an address to be used in the dladdr
call
I.E. when we save imports during symbol parsing we set a pointer to the first symbol we find (Avoid double looping)
We use that pointer to call dladdr
Dl_info dl_info;
if (dladdr(entry, &dl_info)) {
// Check that dl_info. dli_fbase == image_base
}
This solution should be also applicable to MacOs
Note: I initially didn't consider dladdr
as you need an address in the lib to call it & missed the fact we already have the address inside _imports
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just note that this is only valid on the patching side, while the mentioned case here will still be possible if lib was relocated after it was detected but before it was parsed
I think it's best if we have the dladdr side on the patching side as it should also protect MacOs
For the parsing side, for now I think it should be an acceptable risk & we can create a different thread to track & reproduce the possible crash
src/codeCache.h
Outdated
private: | ||
char* _name; | ||
// E.g. remove (deleted) so that removed library can be reopened | ||
char* _clean_name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't need to be stored here - it can be materialized where needed.
Having two name fields is confusing. Furthermore, " (deleted)" is Linux-specific thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be materialized when needed, but then somebody has to free it. If we return a simple char*
that's going to make things more complicated. How about returning std::string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean it's better for CodeCache not to deal with cleanName()
. Having both name()
and cleanName()
in the same interface is not really clean (pun intended).
AFAICS, there is only one place in Linux-specific code where you need a name stripped off "(deleted)" suffix, when re-opening a library in UnloadProtection. Why not process the name there? It can be even stored in a stack allocated buffer, so no free is required. Though dynamic allocation is also fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(deleted)" is Linux-specific thing.
This shouldn't be part of CodeCache
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will be helpful to let users know that the orignal file has been deleted and therefore symbol infromation can be incomplete.
struct link_map* map; | ||
// validate that the current loaded library is the same library that was observed during the /proc/self/maps processing | ||
if (handle != NULL && dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0) { | ||
return cc->imageBase() == (const char*)map->l_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm interested in a situation where a shared object has PT_LOAD
program header with p_offset == 0
and p_vaddr != 0
. Can you provide me with an evidence that this case is covered or that it never happens?
src/codeCache.h
Outdated
private: | ||
char* _name; | ||
// E.g. remove (deleted) so that removed library can be reopened | ||
char* _clean_name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean it's better for CodeCache not to deal with cleanName()
. Having both name()
and cleanName()
in the same interface is not really clean (pun intended).
AFAICS, there is only one place in Linux-specific code where you need a name stripped off "(deleted)" suffix, when re-opening a library in UnloadProtection. Why not process the name there? It can be even stored in a stack allocated buffer, so no free is required. Though dynamic allocation is also fine.
src/codeCache.cpp
Outdated
_name = NativeFunc::create(name, -1); | ||
|
||
// Strip " (deleted)" suffix so that removed library can be reopened | ||
size_t len = strlen(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is it used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct link_map* map; | ||
// validate that the current loaded library is the same library that was observed during the /proc/self/maps processing | ||
if (handle != NULL && dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0) { | ||
return cc->imageBase() == (const char*)map->l_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dlinfo protection is only really needed if library is re-opened using a different thread after close
No, the problem can happen even in a single-threaded cases, for instance, if we attempt to do native memory profiling long after a library was first parsed.
I do think we need some level of protection against this situation, even if it is not 100% accurate. If comparing l_addr
does not work, maybe we can check l_ld
instead.
src/symbols_linux.cpp
Outdated
return; | ||
} | ||
|
||
char* nameCopy = strdup(cc->name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Local variables follow snake_case
naming convention.
Can we avoid strdup
if there is no deleted
suffix? Alternatively, we can use a stack-allocated buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/symbols_linux.cpp
Outdated
lib.file[len - 10] = 0; | ||
} | ||
CodeCache* cc = new CodeCache(lib.file, array->count(), false, lib.map_start, lib.map_end, lib.image_base); | ||
stripDeletedSuffix(lib.file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the original file has been deleted, I don't think we should attempt to parse a file with the similar name, because it can be a different file with different symbol layout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand, this seems to be replicating the original logic. Do you propose to skip parsing the library if the original file was deleted?
Description
In this PR I introduce additional protection during patches. The main change is a new function
isSafeToPatch
, which combines the checks introduced in #1261, #1263, #1243, and can be used in other parts of the code likeMallocTracer
.Thanks @Baraa-Hasheesh for the discussion and feedback on the code.
Related issues
#1250, #1256
Motivation and context
Keeping an open handle makes sure we the library cannot be unloaded while the patch is being made, thus providing additional protection against the failures in the linked issues.
How has this been tested?
make test
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.