Feat/macho symbol provider#89
Conversation
Unify BinaryInfo xmetadata and provider contracts across PE and ELF: - Share ELF relocation import parsing between ElfSymbolProvider and ElfApiResolver. - Merge ELF symtab/dynamic/export symbols in getSymbols(); reset ElfApiResolver cache. - Normalize ELF OEP to base-relative offset for candidate seeding. - Route PE imports/exports/symbols/OEP through base_addr; add collectSymbols(). - Delegate WinApiResolver PE imports to PeSymbolProvider.parseImports(). Validation: make lint, make test (285 passed)
- parseImports uses _resolve_base_addr (imagebase fallback, None-safe). - Guard ELF getOep when base_addr is unset. - Update PE import default test to expect imagebase fallback. Validation: make lint, make test (285 passed)
…tadata parity Finish Tier-2 label backlog deferred from PR #84: align RustSymbolProvider PE paths with active base_addr, add MachoSymbolProvider.collectSymbols wired through BinaryInfo.getSymbols(), extract shared import parsers, and document the xmetadata address contract across PE/ELF/Mach-O. Validation: make lint, make test (293 passed)
- Treat falsy base_addr as unset (BinaryInfo defaults to 0). - Hoist PE library name lowercasing; guard None library/binding names.
|
Important Review skippedAuto reviews are disabled on this repository. To trigger a review, include ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request refactors and normalizes binary metadata parsing (exports, imports, symbols, OEP, and sections) across PE, ELF, and Mach-O formats using LIEF, extracting shared import-table parsers into a new import_parsers.py module and updating providers to use consistent virtual address conventions. Feedback on these changes suggests importing and calling parse_pe_imports directly in WinApiResolver.py to avoid unnecessary class instantiation, and adding defensive checks in import_parsers.py to prevent potential AttributeError crashes when parsing malformed ELF binaries.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| from smda.common.labelprovider.OrdinalHelper import OrdinalHelper # noqa: E402 | ||
|
|
||
| from .AbstractLabelProvider import AbstractLabelProvider # noqa: E402 | ||
| from .PeSymbolProvider import PeSymbolProvider # noqa: E402 |
There was a problem hiding this comment.
Instead of importing PeSymbolProvider only to instantiate it for calling parseImports, we can import parse_pe_imports directly from .import_parsers. This is cleaner and avoids unnecessary class instantiation.
| from .PeSymbolProvider import PeSymbolProvider # noqa: E402 | |
| from .import_parsers import parse_pe_imports # noqa: E402 |
| imported_library.name.lower(), | ||
| ordinal_name, | ||
| ) | ||
| self._api_map["lief"] = PeSymbolProvider(None).parseImports(lief_binary, binary_info.base_addr) |
There was a problem hiding this comment.
Directly call parse_pe_imports instead of instantiating PeSymbolProvider(None) to call parseImports. This improves efficiency and maintainability.
| self._api_map["lief"] = PeSymbolProvider(None).parseImports(lief_binary, binary_info.base_addr) | |
| self._api_map["lief"] = parse_pe_imports(lief_binary, binary_info.base_addr) |
| if symbol.has_version and symbol.symbol_version.has_auxiliary_version: | ||
| lib = symbol.symbol_version.symbol_version_auxiliary.name |
There was a problem hiding this comment.
Add defensive checks to ensure symbol.symbol_version and symbol.symbol_version.symbol_version_auxiliary are not None before accessing their attributes. This prevents potential AttributeError crashes when parsing malformed or unusual ELF binaries.
| if symbol.has_version and symbol.symbol_version.has_auxiliary_version: | |
| lib = symbol.symbol_version.symbol_version_auxiliary.name | |
| if symbol.has_version and symbol.symbol_version and symbol.symbol_version.has_auxiliary_version: | |
| aux = symbol.symbol_version.symbol_version_auxiliary | |
| lib = aux.name if aux else None |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a7af4693f1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| self.oep = lief_result.optional_header.addressof_entrypoint | ||
| elif lief_type == "ELF": | ||
| self.oep = lief_result.header.entrypoint | ||
| self.oep = lief_result.header.entrypoint - (self.base_addr or 0) |
There was a problem hiding this comment.
Preserve PIE ELF entrypoint offsets
When analyzing an ET_DYN/PIE ELF buffer with a nonzero binary_info.base_addr, LIEF reports header.entrypoint as an offset relative to the load base, not an absolute VA. This subtraction makes binary_info.oep negative, and RecursiveDisassembler later queues base_addr + oep, so it starts at the small file VA (for example 0x1050) instead of base_addr + 0x1050 and can miss the real entry point.
Useful? React with 👍 / 👎.
No description provided.