-
Notifications
You must be signed in to change notification settings - Fork 48
Merges function sections #6579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kaoudis
wants to merge
112
commits into
master
Choose a base branch
from
kaoudis/merge-function-sections
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Merges function sections #6579
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…taint AND things break
…f, so that we can reuse or get rid of some of the duplicate tdag sections, and get rid of the separate json file for functions
…ng on here with dep ordering
…ated into elsewhere
…g(function_id, offset), strings[offset] = function_name
…string table combo
…ping -> string table combo" This reverts commit 15c596c.
… (leaving the JSON code in place in parallel for now since this is a breaking change)
…ed function symbol into the functions section
…he strings table?
…hat should be removed before merge
… that are too long; todo come back to this; started also debugging why the stdin test is hanging
… works; tests pass except unrelated
…ort to protect dfsan from terminating appropriately - not quite sure yet what to do instead but this isn't doing anything but making the test hang
… taint tracking pass, to see if changing stuff in only one place for simplicity of debugging is now possible...
…h one will break? rouletteeee
… if its literally just element extraction for gep that is borked
…ctual instrumentation
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
instrumentation
related to the LLVM instrumentation or dfsan integration
maintenance
Tasks that are not critical but would improve the overall maturity of PolyTracker
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What did this change break?
I believe I've added enough tests / worked through enough failing tests that the answer is "nothing that impedes the expected functionality of PolyTracker". All C++ and Python tests on this branch pass, and the Dockerfiles that built at the time of the last paper eval all build / seem to work as anticipated. 😅
What is this change doing?
This is a cleaner attempt at #6566 that preserves the cflog functionality so I / we can use it, instead of basically getting rid of it. My goal is for taint tracking to work exactly as before, but to clean up the ftrace/cflog/events side of the house, unifying
--cflogand--ftraceoptions (cleaning up / simplifying how we are writing to the Functions, Events, Control Flow Log, and String Table sections overall) so we don't add duplicate instrumentation to software or write duplicate data to the TDAG and/or separate files (i.e., functionid.json) anymore.Everything that I could build got run on example inputs to make sure it worked as expected. As a part of these changes we don't write to functionid.json anymore and just use the space we were allocating and not filling in in the tdag, since it's a humongous region we don't use all of anyway. TDAG size is fixed, but our usage of it is slightly more efficient currently. A future goal could be to only mmap the space we need so file size can be smaller.
Instrumentation Time and Resulting Bitcode Sizes
These experiments reproduce the measurements from the
PolyTracker paper,
but on different hardware. For uniformity, experiments were all conducted in an Ubuntu 24.04 cloud VM with
I'm comparing the before-and-after of the TDAG condensation changes on
kaoudis/merge-function-sectionswithmasterate618c4d6d7481326d0ea76073d663d2b867e0e9d, the hash of the work included in the camera ready version of the prior paper. The question I'm answering here is "what is the net result of these changes in terms of how the software works".All the current example Dockerfiles on
masterthat work right now (we/I need to clean up the others a bit; they're a bit bitrotted) are included here for completeness. The following measurements aren't terribly scientific, they are from one run of the Dockerfile each (whereas for the paper I averaged ten runs apiece).Bitcode sizes
The "in" .bc file is the whole-program .bc file that gets the first layer of instrumentation applied to it. The CFlog .bc is the "in" .bc with CFlog instrumentation, pre-optimization (if optimization occurs in the PolyTracker build). the final .bc file is the instrumented .bc file ending in
.instrumented.bcthat we lower to an executable. bc size may have changed because what instrumentation we use changed: I removed the separate function name recording / events pass-level code, and added function name recording to the tdag into the cflog pass. I also removed the separate--ftraceand--taintoptions: we do--taintby default, and--ftraceis part of--cflognow.Also note that some dockerfiles did not compile on the
masterbranch prior to these changes with the--cflogoption and I'm not sure why, but because of this I did not record cflog-inclusive bc size for them onmaster.As measured by
ls -lbin the container, and normalized into MiB:pdftopspdftotextpdfinfopdftopspdftotextTDAG sizes
TDAG size is fixed because of how we write TDAGs right now; it didn't change.
Total instrumentation time
"Instrumentation time" here refers either to the time Docker takes to run
polytracker instrument-targets, which includes how long it takes to do both cflog and taint label instrumentation placement as well as executable creation, or the time to do equivalent steps.Also note that some dockerfiles did not compile on the
masterbranch prior to these changes with the--cflogoption and I'm not sure why, but because of this I did not record cflog-inclusive instrumentation time for them onmaster.As measured by Docker:
pdftopspdftotextpdfinfopdftopspdftotextWhat's weird here
The sizes of bitcode when instrumented with all our passes before AND after these changes seem like they could be indicative of extra instrumentation (perhaps the labels pass instrumenting the cflog and/or functions pass?), though I haven't dug into whether this is truly happening yet. It doesn't seem like this is exactly hurting anything at the moment, but I would be curious if others notice the same.
Notes
*
I combined the times recorded by Docker for extraction, linking, instrumentation, optimization (if included), and lowering to get this figure since that's everything
instrument-targetswould do.NB Dockerfile-acropalypse.demo does not run the typical bitcode optimization step as part of instrumentation and lowering.
N/As
The following Dockerfiles did not build on master or on the new branch. Here's minimal notes on why. These should be investigated later if we care to keep them up to date.
DaeDaLus NITF
DaeDaLus NITF parser fails on the Cabal build of DaeDaLus, and also did at the time I did the prior eval work for the paper. I think this is because the DaeDaLus repository main branch is broken, and we need to pin a prior working commit in that Dockerfile. I don't know what this commit would be - the one mentioned in the Dockerfile doesn't build, either.
jq
Linking failed for build defined in Dockerfile. Also to investigate later; was not included in paper.
libgen
go getis no longer a supported command outside a module, and the Go setup in this Dockerfile would need to be updated.listgen
After solving a couple minor errors due to zlib URL changing etc, building the libxml2-2.9.10 codebase failed with Python macro errors.
pdfium
pdfium's build halted and prompted me for the country of my keyboard. This will need to be fixed so the build is completely automated. I think I recall doing some work here that I didn't save to make the build more automated - I think a particular commit might need to be pinned in the source repo.
png
The png Dockerfile seeisms unfinished and doesn't instrument anything. I have a different version in a volume saved from a different cloud provider that I can pull out and use later.