
Conversation

@FredCoast

Creating the functionality for using ELF files and integrating them into the original application

@williballenthin
Owner

this is such a great start! i'm really excited by the functionality that you already added.

i think the first order of business is to add tests for some example ELF files. would you collect one or two representative ELF files (maybe one shared object and one executable) and add tests for each of the new routines (as reasonable)? you can use the tests for the PE loader as an example.
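
for example, something roughly like this (just a sketch: `load_elf`, `function_starts`, `entry_point`, and the fixture path are hypothetical placeholders, not the real lancelot names - mirror whatever the PE loader tests actually do):

```rust
// sketch only: `load_elf`, `ws.function_starts()`, `ws.entry_point()`, and the
// fixture path are hypothetical placeholders, not the actual lancelot API.
#[test]
fn elf_loader_smoke_test() {
    let buf = std::fs::read("tests/data/nop_elf").expect("read ELF fixture");
    let ws = load_elf(&buf).expect("load ELF");

    // basic sanity checks, analogous to the PE loader tests
    assert!(ws.entry_point() != 0);
    assert!(!ws.function_starts().is_empty());
}
```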

thoughts?

@williballenthin
Owner

also we'll have to update all the documentation to remove the references to only being for PE files 🎉

@FredCoast
Author

Yeah sounds good and thanks for looking over it!

I'll do the tests and docs as soon as I can. I've got a busy week coming up, so I'll try to fit them in around everything else!

FredCoast and others added 2 commits September 26, 2025 14:10
nop_elf - /bin/true - equivalent of a nop.exe
ls - /bin/ls - equivalent of K32

Both taken from an x64 Linux Mint distro
…on, and added tests (replaced ls with libc.so.6)
@FredCoast
Author

I've updated it to add tests. Sorry it took me so long; I was moving in at my new university, so I've been a bit busy.

I have tried to keep the tests as similar as possible to the ones that you originally used; I mainly just took the binaries from Linux Mint, if that is alright?

Again, sorry it took so long.

@williballenthin
Owner

looking good!

the loader tests look reasonable.

next it might make sense to add some tests around code discovery. like take known functions from IDA/Ghidra and see if lancelot has found them (and triage if not). it's ok to add a test case for every function we expect to find, and even those we know don't work yet, and slowly we can increase our test passing rate. it might take quite a bit of time to get the ELF code analysis working well - i'm not sure yet
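
something like this, maybe (sketch only - `analyze_elf` is a placeholder for the real entry point, and the addresses are made up; the real expected values would be exported from IDA/Ghidra for the fixture):

```rust
// sketch only: `analyze_elf` and `ws.functions()` are placeholders for the
// real lancelot API, and the addresses below are made up; take the expected
// function starts from IDA/Ghidra for the fixture binary.
#[test]
fn finds_known_functions_in_nop_elf() {
    let buf = std::fs::read("tests/data/nop_elf").expect("read ELF fixture");
    let ws = analyze_elf(&buf).expect("analyze ELF");

    // function start addresses reported by Ghidra (hypothetical values here)
    for &va in &[0x1040u64, 0x1150, 0x1230] {
        assert!(ws.functions().contains(&va), "missed function at {va:#x}");
    }
}
```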

@FredCoast
Author

Sounds good, so to double check: create more functionality and tests like the ones in:

  • export.rs
  • entrypoints.rs
  • control_flow_guard.rs
  • runtime_functions.rs
  • noret_imports.rs

(but relative to ELF files), until the results are similar to those in IDA/Ghidra?

I don't mind working on it over the long run; I'm learning lots already, so it's been very useful. Thanks!

@williballenthin
Owner

Control Flow Guard and Runtime Functions are Windows-centric structures, so they won't be relevant for ELF files. but perhaps there are other Linux or generic ELF structures we could use too.

it's fine to start with exactly whatever you need for QS. otherwise you risk getting distracted. but as long as you're having fun and learning, it doesn't matter!

i'd probably start with entry points and exports, and then do an hour or two of investigation of what other techniques there are to find functions. that should be a solid start.

@FredCoast
Author

I was just using them as an example of how the tests might look and where they are kept.

I'm currently working on implementing undefined thunks through the procedure linkage table, as I realised that I was hardly picking up any functions, and this seems to be where most of the fall-off is (at least based on the tests I've been using).

And with QS I would rather wait for Moritz to reply before doing too much, in case I accidentally spend time on something that isn't relevant.

@FredCoast
Author

I believe I've added PLT, entry points, and exports, but I'm struggling to get it to pick up some of the functions that Ghidra has identified. I've tried to research potential methods for function detection, and most resources point to heuristics and CFG analysis as the main ones.

So I have attempted to reuse your implementations of those methods, assuming that since we are working with x86-64 they would work for both formats. However, I still don't seem to be finding the functions, so I was wondering if you had any pointers?

Also, should the exports pick up global symbols as well as functions, or just functions? Ghidra groups both, from what I can tell.

@williballenthin
Owner

this is probably going to be an area of ongoing research. i spent a lot of time (probably weeks) trying various things to get fairly good coverage on PE files. lots of comparing what IDA/Ghidra found and figuring out how that could be recovered.

some things to look into:

  • scan for call instructions and collect their targets (heuristic, call_targets.rs, probably should work for ELF files, too)
  • scan for known function prologs. use the Ghidra database as inspiration. (heuristic)
  • scan for pointers to things that look like code in code sections (heuristic)
  • maybe parse exception handler tables, but this could also be a lot of work, not sure
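
as a rough sketch of the first one (not the actual code in call_targets.rs): scan an executable region for `call rel32` and keep targets that land back inside it. a real pass should decode instructions rather than raw bytes, but this shows the shape of it:

```rust
// rough sketch of the call-target heuristic (not the code in call_targets.rs):
// scan raw bytes of an executable region for `call rel32` (opcode 0xE8) and
// collect targets that land back inside the region. a real implementation
// should walk decoded instructions to avoid matching 0xE8 inside operands.
fn scan_call_targets(region_va: u64, bytes: &[u8]) -> Vec<u64> {
    let mut targets = Vec::new();
    for i in 0..bytes.len().saturating_sub(5) {
        if bytes[i] == 0xE8 {
            let rel = i32::from_le_bytes([bytes[i + 1], bytes[i + 2], bytes[i + 3], bytes[i + 4]]);
            // call target = address of the next instruction + rel32
            let next = region_va + i as u64 + 5;
            let target = next.wrapping_add(rel as i64 as u64);
            if target >= region_va && target < region_va + bytes.len() as u64 {
                targets.push(target);
            }
        }
    }
    targets
}
```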

another place you can look is vivisect, since my perspective on binary analysis is fairly heavily influenced by it.

otherwise, getting good code coverage is only as important as you need it for 1) learning/satisfaction and 2) what QS needs. We can always add new algorithms as we have the ideas.

@FredCoast
Author

Thank you!
I'll do more research before attempting to code anything else then. After some time, would I be able to send you what I've found, just to double-check whether you've heard of it and, consequently, whether it's something worth implementing?

And thanks for suggesting looking at vivisect.

And to be completely honest, I have no idea what QS needs, as I was just looking at open issues. I have implemented the sections for ELF; I was just trying to make it well rounded by implementing the offsets similarly to PE files. But I didn't want to bother Moritz at the moment with the CTF going on.

@williballenthin
Owner

I think today QS only uses Lancelot to figure out where opcodes are, so that it can filter out ASCII strings that are actually instructions.

in the future there's a possibility of maybe showing functions and their references to strings (or maybe references from strings to functions, not sure) but we don't have this today. Lancelot would be a good fit for this.

@williballenthin
Owner

And, of course, you can always ping me to brainstorm and/or check ideas.

Incidentally, we can merge this PR whenever you feel it's a good time.

@FredCoast
Author

Yeah, I think that is the use case.

That would be pretty cool, sort of like showing strings by function instead of by section?

I'm happy for it to be merged now, but at the end of the day it's your project, so I'm happy with whatever you decide.

@FredCoast
Author

After digging through Ghidra I have discovered the FDE table; from what I can tell it covers most executable functions. Unfortunately it wasn't mentioned in the resources I was using. Have you heard of it before?

I also wanted to ask about the functions that look like this when decompiled:
[screenshot]
I'm just double-checking that they are basically one function calling another, put in by Ghidra, and as a result not something that I should worry about detecting?

@FredCoast
Author

FDE parsing worked well; it picked up all the functions in my nop binary that I was missing.
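
Roughly the shape of what I'm doing (a sketch using the gimli crate to walk .eh_frame, not necessarily the exact code I pushed):

```rust
// sketch only, assuming the gimli crate: walk .eh_frame and collect the
// initial address of each FDE, which is the start of a covered function.
// the section addresses come from the ELF program/section headers.
use gimli::{BaseAddresses, CieOrFde, EhFrame, NativeEndian, UnwindSection};

fn fde_function_starts(
    eh_frame_data: &[u8],
    eh_frame_va: u64,
    text_va: u64,
) -> gimli::Result<Vec<u64>> {
    let eh_frame = EhFrame::new(eh_frame_data, NativeEndian);
    let bases = BaseAddresses::default()
        .set_eh_frame(eh_frame_va)
        .set_text(text_va);

    let mut starts = Vec::new();
    let mut entries = eh_frame.entries(&bases);
    while let Some(entry) = entries.next()? {
        if let CieOrFde::Fde(partial) = entry {
            let fde = partial.parse(EhFrame::cie_from_offset)?;
            starts.push(fde.initial_address());
        }
    }
    Ok(starts)
}
```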

But I'm picking up lots of subroutines which Ghidra omits because of "Bad instructions". I'll try to work on filtering them out and then push what I've done!

@FredCoast
Author

FredCoast commented Oct 14, 2025

Done!

your is_probably_code was a great solution, so thanks for that! There are still a couple of items which I can't figure out how to filter because Ghidra says nothing whatsoever about them.

I also had to use is_probably_code on the CFG analysis in the ELF workspace, as adding the FDE caused it to pick up loads of subroutines, which were some of the bad instructions. So I'm just double-checking whether that is alright, or whether it was something you also saw when looking at PE files.

@williballenthin
Owner

Awesome!

I think what you have now is so much better than what we had before (nothing!).

I would not have expected metadata like the FDE records to point to invalid data, so I wonder if you're interpreting them slightly wrong. I would prefer to use is_probably_code only when we're literally guessing (like by scanning unexplored bytes for things that could possibly be code); however, again, what you have now is still a really great start.

Can you share some specific examples of where the FDE records result in weird subroutines and/or bad instructions? I'd be happy to look at these also and try to figure out what's going on.

@FredCoast
Author

FredCoast commented Oct 15, 2025

Yeah, I just had Ghidra analyse it again and loads more functions appeared. I must have used the wrong language or unticked a box, because half the file was ?? before. Sorry about that.

Now looking at it, the extra entries are thunk functions, which are the same as imports at different addresses. Thanks for letting me know that the results were unusual.

@williballenthin
Owner

Now looking at it, the extra entries are thunk functions, which are the same as imports at different addresses. Thanks for letting me know that the results were unusual.

ah, that does make sense, so maybe with a little special handling for thunks the analysis can be deterministic? that's exciting!

@FredCoast
Author

FredCoast commented Oct 15, 2025

I'm sorry, I'm struggling to understand what you mean by that.

Are you saying to keep the output consistent by filtering out the thunks?

And should I not show the imported functions in general?

@williballenthin
Owner

Sorry that I was unclear, let me rephrase.

I understood that the FDE entries sometimes pointed to thunks (or perhaps raw function pointers? not sure) and that when lancelot tried to disassemble them, they failed with invalid instruction errors. But, by noticing that the FDEs sometimes point to thunks and handling those specially, we could avoid the errors.

Now, perhaps I had this wrong, and if so, sorry!

Is there an outstanding issue? or does lancelot now handle all the FDE entries well?

Again, happy to help if you have some specific examples that I can dig into, if there are outstanding issues.

@williballenthin
Owner

Are you saying to keep the output consistent by filtering out the thunks?

we should keep them around, and even mark them as code and functions (like we do in PE), even if they appear to be trivial (perhaps just jmp $elsewhere).

And should I not show the imported functions in general?

again, i think we should try to capture the fact that we have these references to imports (either as thunks or GOT entries or whatever). if the existing thunks infrastructure isn't enough, at least as PE uses it, feel free to introduce a new concept that works better for ELF (i'm wondering if GOT entries will work or not, i'm not sure).
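
for x86-64, a PLT-style thunk is usually just a single rip-relative jmp through a GOT slot, so recognizing it is cheap. a sketch (not lancelot code):

```rust
// sketch only (not lancelot code): a typical x86-64 PLT thunk is a single
// `jmp qword ptr [rip + disp32]` (ff 25 xx xx xx xx). if it matches, return
// the GOT slot it jumps through; the caller can then check whether that slot
// resolves to an import and record the thunk as a (trivial) function.
fn plt_thunk_got_slot(va: u64, bytes: &[u8]) -> Option<u64> {
    if bytes.len() >= 6 && bytes[0] == 0xFF && bytes[1] == 0x25 {
        let disp = i32::from_le_bytes([bytes[2], bytes[3], bytes[4], bytes[5]]);
        // rip-relative: address of the next instruction + disp32 = GOT slot VA
        Some((va + 6).wrapping_add(disp as i64 as u64))
    } else {
        None
    }
}
```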

@FredCoast
Author

Yeah, of course.

I noticed you had a lot of them commented out; did you find that a lot of them resulted in false positives?

@williballenthin
Owner

williballenthin commented Oct 23, 2025

did you find that a lot of them resulted in false positives

It was more that I used my personal opinion on whether they made much sense, and probably some had FPs. I'm not super happy about relying on personal experience to dictate what was included, but I figured it was reasonable to get some quick wins and better coverage.

@FredCoast
Author

Ah, makes sense. I'll leave them uncommented for now, as admittedly I don't know either. I've updated it for the gcc patterns; that's resulted in many more functions being picked up, so thanks for the suggestion!
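
For the record, this is the kind of thing I mean by patterns (illustrative bytes only, not the exact list I committed):

```rust
// illustrative only, not the exact pattern list in the code: a few common
// gcc x86-64 function prologs, matched at candidate addresses inside
// executable regions.
const GCC_PROLOGS: &[&[u8]] = &[
    &[0x55, 0x48, 0x89, 0xE5], // push rbp; mov rbp, rsp
    &[0xF3, 0x0F, 0x1E, 0xFA], // endbr64 (CET-enabled builds)
    &[0x41, 0x57, 0x41, 0x56], // push r15; push r14
];

fn looks_like_gcc_prolog(bytes: &[u8]) -> bool {
    GCC_PROLOGS.iter().any(|&p| bytes.starts_with(p))
}
```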

Out of curiosity, why shouldn't personal experience be used for something like this?

@williballenthin
Owner

williballenthin commented Oct 23, 2025

Out of curiosity why shouldn't personal experience be used for something like this?

I think it can be, but when it is used, then it should be heavily documented.

Relying on personal experience is subject to a lot of bias, based on what the person has seen before. @williballenthin has really only reversed a lot of Windows binaries, so I really shouldn't be guessing at what's in most ELF files (or maybe not even in Windows drivers? or PE files compiled by Cygwin? the list is endless)

I'd much rather be able to say: "we ran this test script against 100k binaries collected in October, 2025 and derived these patterns". That way, in 10 years, it's reasonable for the maintainers to decide "oh hey we should re-run that experiment and update the patterns". But if the code relies on the experience of a person at a specific point in time, how is any subsequent maintainer able to decide if the code should be changed?

Of course, all software is created based on people's experience (but let's not get too meta and philosophical here). But I think it's important to call out when we're using experience, and to make it clear when it's ok to change or update the logic (and how it can be re-tested/validated). Like, "I found that PUSH PUSH POP wasn't very common in compilers today but that could change".

Admittedly, the code doesn't do this today, but hopefully that explains why I'd shy away from just relying on experience.

Finally, code like we have in Lancelot is essentially a database of things we've noticed and encode in ...code. So ideally we try to explain how we determined a pattern or algorithm worked, so that it could be redone or adjusted again in the future, rather than just an implementation of the algorithm.

@FredCoast
Author

Ah, so the implementations used now are subject to the bias of current trends and usage; however, that can quickly become outdated in the future, and documented facts make long-term development more sustainable.

Thanks for explaining it!

@FredCoast
Author

I've added symtab, dynsym, and DWARF detection.

I had some reservations about adding them in general, as they will most likely just get stripped from the file, but oh well.
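
For the symbol tables the gist is just collecting STT_FUNC entries; a sketch, with the goblin crate standing in for whatever the loader actually uses:

```rust
// sketch only, with the goblin crate standing in for the real loader:
// collect function symbols from .symtab and .dynsym. stripped binaries
// will usually have an empty .symtab, hence my reservations.
use goblin::elf::{sym::STT_FUNC, Elf};

fn symbol_function_starts(buf: &[u8]) -> Result<Vec<u64>, goblin::error::Error> {
    let elf = Elf::parse(buf)?;
    let mut starts = Vec::new();
    for sym in elf.syms.iter().chain(elf.dynsyms.iter()) {
        if sym.st_type() == STT_FUNC && sym.st_value != 0 {
            starts.push(sym.st_value);
        }
    }
    Ok(starts)
}
```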

I'll have a look at Ghidra and see what else it has that I can implement soon.

@FredCoast
Author

FredCoast commented Nov 6, 2025

I have found this paper:
Function boundary detection in stripped binaries: https://dl.acm.org/doi/abs/10.1145/3359789.3359825

It mentions "Terminal Function Call Chain Detection", which is something I will have a look at next.

The paper also has other methods, which I will look at after I do this.

I have a lot of assignments at the moment, so I will try to fit it in when I can. I also wish I had found this paper sooner.

@williballenthin
Owner

nice! i found a copy as well and will read it tonight

@FredCoast
Author

This one is also interesting:
https://ieeexplore.ieee.org/abstract/document/8552415

It's more about determining function ends, which would be useful if a strings-by-function mode were added to QS.

But it also covers "far away" function starts from what I can tell, though it relies heavily on heuristics.

@williballenthin
Owner

i'd previously read and tried to implement Nucleus, which is referenced in this also interesting article: https://binary.ninja/2017/11/06/architecture-agnostic-function-detection-in-binaries.html

notably Vector35 gives some good references to test sets that we could also use to evaluate this work

@FredCoast
Author

Yeah, I saw that when I was looking about; it seems very interesting.

Thanks for the recommendation. How come you didn't fully implement it before?

I found the dataset and also this other one too:
https://mailuc-my.sharepoint.com/personal/wang2ba_ucmail_uc_edu/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fwang2ba%5Fucmail%5Fuc%5Fedu%2FDocuments%2Fgroup%2Fdatasets%5Fpublic%2Fsmart%5Fsp24%5Frustbound&ga=1

@williballenthin
Owner

williballenthin commented Nov 7, 2025

do you mind sharing the document as an upload here? i don't have access to the o365 drive. edit: maybe that drive has the dataset itself. if it's important i can request access, let me know.

i did the nucleus algorithm early on, and it did find good code, though i recall that not having good handling for jump tables caused trouble. so i "went back to basics" and used more conservative algorithms. now's probably a great time to reassess and perhaps try again (and document our lessons learned explicitly!).

@FredCoast
Author

It was like 18 GB of test files; unfortunately, it seems I have lost access too. Nothing that I am too fussed about, to be honest.

Cool, I will do the traceback and then Nucleus (with explicit documentation).

@FredCoast
Author

FredCoast commented Nov 10, 2025

I ran the current implementation against some tests from here:
https://github.com/CenterForSecureAndDependableSystems/FunctionBoundary

and these are the results
https://github.com/FredCoast/FunctionBoundary---Lancelot/blob/master/scripts/results.csv

It's correctly finding lots of functions, but I'm also getting lots of false positives, probably something to do with the heuristics.

I have had to modify the CFG analysis slightly to get the tests to work, if that is alright.

Some of the F1 scores are completely messed up because their tooling compares the sizes of functions, and I wasn't testing against that.
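
For clarity, the score I'm computing is over function start addresses only, roughly like this:

```rust
// precision/recall/F1 over function start addresses only: ground-truth
// starts from the dataset vs. the starts lancelot reports.
use std::collections::HashSet;

fn score_starts(found: &HashSet<u64>, truth: &HashSet<u64>) -> (f64, f64, f64) {
    let tp = found.intersection(truth).count() as f64;
    let precision = if found.is_empty() { 0.0 } else { tp / found.len() as f64 };
    let recall = if truth.is_empty() { 0.0 } else { tp / truth.len() as f64 };
    let f1 = if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    };
    (precision, recall, f1)
}
```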

@FredCoast
Author

Sorry, I've been super busy with coursework and applying for jobs. I will try to do more in the upcoming weeks, especially when term ends.

@williballenthin
Owner

take your time :-) work on this when you want to and when it brings you joy, no expectations or pressure from me

@FredCoast
Author

Finally, now that term has ended, I've managed to find time to work on this.

I've attempted several strategies to improve the accuracy, including:

  1. finding functions in between functions
  2. using unconditional JMPs

But neither worked very well, so I've removed or commented them out for now.

However, I realised where many of the false positives were coming from: it turns out I was clumsily checking non-executable areas for functions. I've now fixed this, and in testing the results have been very positive, with the F1 and recall going up a substantial amount. I have the results in a spreadsheet if you are interested.
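
The fix is essentially just restricting heuristic candidates to executable segments, something like this (a sketch, with goblin standing in for however the loader actually tracks permissions):

```rust
// sketch only, goblin standing in for the real loader: a candidate address
// only counts if it falls inside a PT_LOAD segment that is executable (PF_X).
use goblin::elf::{program_header::{PF_X, PT_LOAD}, Elf};

fn is_in_executable_segment(elf: &Elf, va: u64) -> bool {
    elf.program_headers.iter().any(|ph| {
        ph.p_type == PT_LOAD
            && ph.p_flags & PF_X != 0
            && va >= ph.p_vaddr
            && va < ph.p_vaddr + ph.p_memsz
    })
}
```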

I'm going to continue working on trying to identify where more false positives are coming from, as I believe it is still an issue from what I can tell.

@FredCoast
Author

Turns out I wasn't parsing the FDE table properly, as I wasn't applying some relocations correctly.

Admittedly this was quite heavily vibe coded, as I couldn't figure out why my implementation wasn't working properly.

But based on those tests I was doing, there is now something like 97% accuracy (which is making me think that the paper which included them is somewhat bogus and in reality a glorified FDE parser).

However, based on those tests I am missing exactly 3 functions from each binary, so I'll have a look into what they are (I'm guessing it's going to be DT_FINI etc.).
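
Roughly the piece that was missing on the relocation side (an illustrative sketch, not exactly what I pushed; the real code handles more relocation types than just R_X86_64_RELATIVE):

```rust
// sketch only: apply R_X86_64_RELATIVE relocations to the mapped image before
// walking .eh_frame, so the pointers in the FDE records land where they should.
// for brevity, assumes `image` is mapped so r_offset indexes it directly.
use goblin::elf::{reloc::R_X86_64_RELATIVE, Elf};

fn apply_relative_relocs(elf: &Elf, image: &mut [u8], load_base: u64) {
    for rela in elf.dynrelas.iter() {
        if rela.r_type == R_X86_64_RELATIVE {
            let value = load_base.wrapping_add(rela.r_addend.unwrap_or(0) as u64);
            let off = rela.r_offset as usize;
            if off + 8 <= image.len() {
                image[off..off + 8].copy_from_slice(&value.to_le_bytes());
            }
        }
    }
}
```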

@FredCoast
Author

I'm so sorry, but I am probably going to have to stop working on this for a while (at least until May).

I am starting an internship, and I don't think I will have the time to fit this in.

I've really enjoyed working on this, learning about Rust and about development in general.

And thank you for the advice that you have given me over the past couple of months. I hope I find time to keep working on this, and sorry to leave it in such an unfinished state.

@williballenthin
Owner

Woohoo, congratulations @FredCoast!

I appreciate the research you've done and notes you've consistently shared here.

Before you go, do you think the PR is in a reasonable state to merge, at least with "experimental"-level support?

@FredCoast
Author

Thank you!

I believe it is in a reasonable state; there is definitely work to be done, but from my tests it is fairly accurate (it beat IDA Free in one of them). But I would probably give it the label of experimental.

If you decide to merge it and any issues with it come up, please don't hesitate to let me know and I will try to sort them.

Thanks again
