@rsaccani

exefilter only looks for the %PDF- header at the very beginning of the file. Adobe Reader accepts this header anywhere within the first 1024 bytes of the file, and unfortunately many legitimate files have some characters before %PDF-.

I replaced startswith with find in order to allow such files to be analyzed.
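
To illustrate the difference between the two checks, here is a minimal sketch. It is not the actual exefilter code: the function names and the `HEADER_WINDOW` constant are hypothetical, and limiting the search to the first 1024 bytes is an assumption based on the Adobe behaviour described above (an unbounded `find` would match `%PDF-` anywhere in the file).

```python
# Hypothetical names for illustration only; not taken from exefilter.
PDF_MAGIC = b"%PDF-"
HEADER_WINDOW = 1024  # assumption: header must lie within the first 1024 bytes


def has_pdf_header_strict(data: bytes) -> bool:
    """Strict check: the magic must be at offset 0."""
    return data.startswith(PDF_MAGIC)


def has_pdf_header_lenient(data: bytes, window: int = HEADER_WINDOW) -> bool:
    """Lenient check: the magic must be fully contained in data[0:window]."""
    # bytes.find(sub, start, end) returns -1 when sub is not found in the slice.
    return data.find(PDF_MAGIC, 0, window) != -1


if __name__ == "__main__":
    sample = b"\r\n\r\n%PDF-1.4\n"  # a few stray bytes before the header
    print(has_pdf_header_strict(sample))
    print(has_pdf_header_lenient(sample))
```

Bounding the search keeps the lenient check from accepting arbitrary files that merely contain the string %PDF- somewhere deep in their content.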

@decalage2
Owner

That's interesting: indeed Adobe Reader allows this, but all the legit PDF files I've seen so far have %PDF at offset 0. I'd be curious to see legit samples with data before %PDF. Do you have some that you could share by email?

@decalage2 decalage2 self-requested a review December 13, 2021 22:53
@decalage2 decalage2 self-assigned this Dec 13, 2021
@rsaccani
Author

> That's interesting: indeed Adobe Reader allows this, but all the legit PDF files I've seen so far have %PDF at offset 0. I'd be curious to see legit samples with data before %PDF. Do you have some that you could share by email?

I've just sent a sample via email.

It happens mostly with automatically generated PDF files, usually because of programming mistakes that go undetected since Adobe Reader opens the files without warnings.

It also happens with files from major vendors. For some reason, the frequency of these files has increased in the last couple of months. This was not an issue for years, but it has recently become one.
