Skip to content

[HWASan] Fix symbol indexing #135967

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 31 additions & 12 deletions compiler-rt/lib/hwasan/scripts/hwasan_symbolize
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ Shdr_size = 64
sh_type_offset = 4
sh_offset_offset = 24
sh_size_offset = 32
SHT_SYMTAB = 2
SHT_NOTE = 7

Nhdr_size = 12
Expand All @@ -62,8 +63,13 @@ def handle_Nhdr(mv, sh_size):
offset += Nhdr_size + align_up(n_namesz, 4) + align_up(n_descsz, 4)
return None

def handle_Shdr(mv):
def unpack_sh_type(mv):
sh_type, = struct.unpack_from('<I', buffer=mv, offset=sh_type_offset)
return sh_type

def handle_Shdr(mv):
sh_type = unpack_sh_type(mv)
# Sanity check
if sh_type != SHT_NOTE:
return None, None
sh_offset, = struct.unpack_from('<Q', buffer=mv, offset=sh_offset_offset)
Expand All @@ -76,19 +82,29 @@ def handle_elf(mv):
# have to extend the parsing code.
if mv[:6] != b'\x7fELF\x02\x01':
return None
found_symbols = False
bid = None
e_shnum, = struct.unpack_from('<H', buffer=mv, offset=e_shnum_offset)
e_shoff, = struct.unpack_from('<Q', buffer=mv, offset=e_shoff_offset)
for i in range(0, e_shnum):
start = e_shoff + i * Shdr_size
sh_offset, sh_size = handle_Shdr(mv[start: start + Shdr_size])
if sh_offset is None:
continue
note_hdr = mv[sh_offset: sh_offset + sh_size]
result = handle_Nhdr(note_hdr, sh_size)
if result is not None:
return result
sh = mv[start: start + Shdr_size]
sh_type = unpack_sh_type(sh)

if sh_type == SHT_SYMTAB:
found_symbols = True
elif sh_type == SHT_NOTE:
sh_offset, sh_size = handle_Shdr(sh)
if sh_offset is None:
continue
note_hdr = mv[sh_offset: sh_offset + sh_size]
result = handle_Nhdr(note_hdr, sh_size)
if result is not None:
bid = result

return (found_symbols, bid)

def get_buildid(filename):
def read_elf(filename):
with open(filename, "r") as fd:
if os.fstat(fd.fileno()).st_size < Ehdr_size:
return None
Expand Down Expand Up @@ -200,7 +216,7 @@ class Symbolizer:
if os.path.exists(full_path):
return full_path
if name not in self.__warnings:
print("Could not find symbols for", name, file=sys.stderr)
print("Could not find symbols for {} (Build ID: {})".format(name, buildid), file=sys.stderr)
self.__warnings.add(name)
return None

Expand Down Expand Up @@ -268,13 +284,16 @@ class Symbolizer:
for fn in fnames:
filename = os.path.join(dname, fn)
try:
bid = get_buildid(filename)
found_symbols, bid = read_elf(filename)
except FileNotFoundError:
continue
except Exception as e:
print("Failed to parse {}: {}".format(filename, e), file=sys.stderr)
continue
if bid is not None:
if found_symbols and bid is not None:
if bid in self.__index:
print("Duplicate build ID {} for {} and {}".format(bid, self.__index[bid], filename), file=sys.stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to crash because we have the same file twice? Couldn't this happen through symlinks or something?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I agree that this is too harsh. Maybe we emit in verbose mode that two shared objects were found with the same build id at different locations?

Copy link
Contributor

@fmayer fmayer Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning with verbose is OK. preferably we would only do that if the files actually differ, but it's not a hard requirement

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed hard exit but will log a warning if two files with the same build id are found.

Let me know if you want me to do a checksum to verify if the files are different and then gate that log message based comparison being not equal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like with hashlib.file_digest that should be quite easy, so I'd say let's go for it. We could also short-circuit with os.path.samefile to not waste all that effort if it's just a link.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added hashlib.file_digest check to ensure that contents of shared object with the same build id are the same. Currently using sha256.

Used os.path.samefile to check for path equality before running the checksum calculation.

sys.exit(1)
self.__index[bid] = filename

def symbolize_line(self, line):
Expand Down