Fix: plug security issue that allowed partitioning system files via include #3908
Summary
A recent security review showed that it was possible to partition arbitrary local files when the filetype supports an "include" feature that pulls in the content of files external to the partitioned file. This affects `rst` and `org` files.
Fix
This PR fixes the above issue by passing the parameter `sandbox=True` in all cases where `pypandoc.convert_file` is called.

Note: I also added the parameter to a call to this method in the ODT code. I haven't investigated whether there was a security issue with ODT files, but it seems better to use pandoc in sandbox mode given the security issues we know about.
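As a rough illustration of the change described above, the sketch below routes a conversion through `pypandoc.convert_file` with `sandbox=True`. The helper names `build_convert_kwargs` and `convert_with_sandbox` are hypothetical and are not the names used in `file_conversion.py`; only the `sandbox=True` argument is taken from this PR.

```python
from typing import Any, Dict


def build_convert_kwargs(source_format: str, target_format: str) -> Dict[str, Any]:
    """Build keyword arguments for pypandoc.convert_file (hypothetical helper).

    sandbox=True asks pandoc to run in sandbox mode, so readers such as
    rst and org should not be able to pull in other files from disk.
    """
    return {"to": target_format, "format": source_format, "sandbox": True}


def convert_with_sandbox(filename: str, source_format: str, target_format: str = "html") -> str:
    """Convert a file with pandoc in sandbox mode (hypothetical wrapper)."""
    # Imported lazily so the sketch can be read and imported without pandoc installed.
    import pypandoc

    kwargs = build_convert_kwargs(source_format, target_format)
    return pypandoc.convert_file(filename, **kwargs)
```

Keeping the keyword arguments in one place is only a design choice for this sketch; it makes it harder for a new call site to forget the `sandbox` flag.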
Testing
To verify that the tests added in this PR catch the issue:
1. Remove the `sandbox=True` text from `unstructured/file_utils/file_conversion.py` line 17.
2. Run `test_unstructured.partition.test_rst.test_rst_wont_include_external_files` and `test_unstructured.partition.test_org.test_org_wont_include_external_files`. Both should fail because the partitioned output contains the word "wombat", which only appears in a file external to the partitioned file.
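The idea behind these tests can be sketched as follows. This is not the test code from the PR: the fixture layout, file names, and helper names are assumptions made for illustration, and only the "wombat" sentinel and the use of an include directive come from the description above. It assumes `partition_rst` from `unstructured.partition.rst` accepts a `filename` argument.

```python
import tempfile
from pathlib import Path

SENTINEL = "wombat"  # word that appears only in the external file


def write_fixture(directory: Path) -> Path:
    """Create an RST file that tries to include a neighbouring file (hypothetical fixture)."""
    external = directory / "external.txt"
    external.write_text(f"The secret animal is a {SENTINEL}.\n")
    main = directory / "main.rst"
    main.write_text(
        "Title\n=====\n\nRegular paragraph.\n\n"
        f".. include:: {external}\n"
    )
    return main


def leaks_external_content(texts) -> bool:
    """Return True if any piece of partitioned text contains the sentinel word."""
    return any(SENTINEL in text.lower() for text in texts)


def rst_partition_is_safe() -> bool:
    """Partition the fixture and report whether the external file stayed out."""
    # Imported lazily: requires the unstructured package and a pandoc install.
    from unstructured.partition.rst import partition_rst

    with tempfile.TemporaryDirectory() as tmp:
        main = write_fixture(Path(tmp))
        elements = partition_rst(filename=str(main))
        return not leaks_external_content(el.text for el in elements)
```

With the sandbox flag in place, `rst_partition_is_safe()` would be expected to return `True`; with the flag removed, the sentinel should show up in the partitioned text, which mirrors the failure described above.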