
patrickdalla (Collaborator) commented Nov 14, 2023

Closes #1978
CertificateParser was refactored to extract certificates as subitems, so that it can be used in conjunction with Tika's PKCS7Parser.

Tika's PKCS7Parser ignores all certificate information and extracts only the signed content of the PKCS7 file for parsing. So, when configured together with Tika's PKCS7Parser, CertificateParser is responsible for extracting the information of the embedded certificates.
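The certificate-extraction side can be sketched with the JDK alone. This is a minimal sketch, not CertificateParser's actual code: it assumes the input is a PKCS#7 certificate bundle (such as a .p7b export) or a sequence of DER/PEM X.509 certificates, and `CertSubitem` is a hypothetical stand-in for an IPED subitem. It relies on `CertificateFactory.generateCertificates`, whose Javadoc states that for X.509 it accepts a PKCS#7 SignedData object and ignores its signature and contents, returning only the certificates (or an empty collection if none are present).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.cert.Certificate;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CertSplitSketch {

    /** Hypothetical subitem: a display name plus certificate metadata. */
    public static final class CertSubitem {
        public final String name;
        public final Map<String, String> metadata;

        CertSubitem(String name, Map<String, String> metadata) {
            this.name = name;
            this.metadata = Collections.unmodifiableMap(metadata);
        }
    }

    /** Emits one subitem per certificate found in the stream. */
    public static List<CertSubitem> extractSubitems(InputStream in) throws CertificateException {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        List<CertSubitem> subitems = new ArrayList<>();
        int index = 0;
        // For a PKCS#7 SignedData blob only the certificates are read; the signed
        // content is ignored here and left to a separate PKCS7 content parser.
        for (Certificate c : cf.generateCertificates(in)) {
            X509Certificate x509 = (X509Certificate) c;
            Map<String, String> md = new LinkedHashMap<>();
            md.put("subject", x509.getSubjectX500Principal().getName());
            md.put("issuer", x509.getIssuerX500Principal().getName());
            md.put("serialNumber", x509.getSerialNumber().toString(16));
            md.put("notBefore", x509.getNotBefore().toInstant().toString());
            md.put("notAfter", x509.getNotAfter().toInstant().toString());
            subitems.add(new CertSubitem("certificate-" + index + ".cer", md));
            index++;
        }
        return subitems;
    }

    public static void main(String[] args) throws IOException, CertificateException {
        if (args.length != 1) {
            System.err.println("usage: java CertSplitSketch <file.p7b|file.cer>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (CertSubitem item : extractSubitems(in)) {
                System.out.println(item.name + " " + item.metadata);
            }
        }
    }
}
```

Run against a .p7b exported with its full chain, it should print one line per certificate in the chain. Because the signed content is never touched, the same code handles both content-less bundles and real signed files.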

To test:

1. User certificates can be exported from Windows as .p7b files that include the full certificate chain. In these files, each certificate is extracted as a subitem. However, PKCS7Parser will throw an exception because the file has no signed content; this exception is recorded in the metadata key X-TIKA:EXCEPTION:embedded_exception.
2. Real signed files in PKCS7 format contain the signing certificates and the corresponding signed content. In these files, the signing certificates are parsed as subitems, and the content is parsed by Tika's PKCS7Parser.
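The expected outcomes of the two test cases can be modelled in plain Java. This is a toy sketch, not IPED or Tika code: `SignedContainer`, `parseContent` and `process` are hypothetical, certificates are reduced to names, and the content parser is a stand-in that fails when there is no signed content. The only name taken from the description above is the metadata key `X-TIKA:EXCEPTION:embedded_exception`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Pkcs7ScenarioModel {

    public static final String EMBEDDED_EXCEPTION = "X-TIKA:EXCEPTION:embedded_exception";

    /** Hypothetical decoded PKCS#7 file: certificate names plus optional signed content. */
    public static final class SignedContainer {
        final List<String> certificates;
        final String content; // null when there is no signed content (e.g. a .p7b export)

        public SignedContainer(List<String> certificates, String content) {
            this.certificates = new ArrayList<>(certificates);
            this.content = content;
        }
    }

    /** Outcome of processing one file: subitems plus parent-item metadata. */
    public static final class Result {
        public final List<String> subitems = new ArrayList<>();
        public final Map<String, String> metadata = new LinkedHashMap<>();
        public String parsedContent;
    }

    /** Stand-in for the content parser: fails when there is nothing to parse. */
    static String parseContent(SignedContainer c) {
        if (c.content == null) {
            throw new IllegalStateException("No signed content in PKCS7 container");
        }
        return c.content;
    }

    public static Result process(SignedContainer c) {
        Result r = new Result();
        // Certificate pass: one subitem per certificate, independent of the content.
        r.subitems.addAll(c.certificates);
        // Content pass: a failure is recorded in metadata instead of aborting.
        try {
            r.parsedContent = parseContent(c);
        } catch (IllegalStateException e) {
            r.metadata.put(EMBEDDED_EXCEPTION, e.getMessage());
        }
        return r;
    }

    public static void main(String[] args) {
        Result p7b = process(new SignedContainer(
                Arrays.asList("user", "intermediate CA", "root CA"), null));
        System.out.println(p7b.subitems.size() + " subitems, metadata=" + p7b.metadata);

        Result signed = process(new SignedContainer(Arrays.asList("signer"), "document body"));
        System.out.println(signed.subitems.size() + " subitem, content=" + signed.parsedContent);
    }
}
```

Running `main` prints `3 subitems, metadata={X-TIKA:EXCEPTION:embedded_exception=No signed content in PKCS7 container}` for the .p7b-like case and `1 subitem, content=document body` for the signed-file case. Keeping the two passes independent is what lets a content-less .p7b still yield its certificates.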

lfcnassif mentioned this pull request Nov 16, 2023
patrickdalla (Collaborator, Author)

For testing:
ARQUIVOS_PROCESSO_202311161041346480.zip

patrickdalla and others added 20 commits November 17, 2023 08:52
certificates with full certification path and no CMS signed data.
aberenguel (Collaborator)

After the changes, the results with my sample files are:

[Screenshot: results after the changes]

In comparison, with commit 356689f (prior to the latest improvements), the results were:

[Screenshot: results at commit 356689f]

Successfully merging this pull request may close these issues:

- Review CertificateParser to support new tika "x-x509-cert" contentType.

3 participants