Prevent .tar file corruption by patching short reads #261
base: main
Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #261 +/- ##
==========================================
+ Coverage 44.93% 45.20% +0.26%
==========================================
Files 26 26
Lines 3527 3568 +41
==========================================
+ Hits 1585 1613 +28
- Misses 1942 1955 +13
Force-pushed from 57a9c5b to 5b22f2c
    if info.size is None:
        shutil.copyfileobj(fh, self.tar.fileobj, bufsize)
        return
I think we can remove this block since it would be an illegal action in this context.
    for _ in range(blocks):
        # Prevents "long reads" because it reads at max bufsize bytes at a time
        buf = fh.read(bufsize)
        if len(buf) < bufsize:
I think you can generalize this case instead of doing it twice. Keep track of how many bytes you actually wrote (e.g. using .tell()) and only pad once.
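The suggested generalization might look roughly like the sketch below: write whatever the source actually delivers, then compute the shortfall from the destination's position and pad in a single place. The function name `copy_member` and its parameters are hypothetical, not the PR's actual code.

```python
import io

def copy_member(fh, dest, size, bufsize=16 * 1024):
    """Copy up to `size` bytes from `fh` into `dest`, then pad with NULs
    to `size` based on how many bytes were actually written (a sketch)."""
    start = dest.tell()
    remaining = size
    while remaining > 0:
        buf = fh.read(min(bufsize, remaining))
        if not buf:  # short read: the source ended early
            break
        dest.write(buf)
        remaining -= len(buf)
    # Pad once, regardless of whether the shortfall happened in a full
    # block or in the remainder.
    written = dest.tell() - start
    if written < size:
        dest.write(b"\0" * (size - written))

# Example: the source claims 10 bytes but only delivers 3.
src = io.BytesIO(b"abc")
dest = io.BytesIO()
copy_member(src, dest, size=10)
```

This keeps the padding logic in one spot instead of duplicating it for the block loop and the remainder branch.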
    blocks, remainder = divmod(info.size, bufsize)
    for _ in range(blocks):
        # Prevents "long reads" because it reads at max bufsize bytes at a time
Long or short?
        self.tar.fileobj.write(buf)
    if remainder != 0:
        # Prevents "long reads" because it reads at max bufsize bytes at a time
Long or short?
    info = copy.copy(info)
    buf = info.tobuf(self.tar.format, self.tar.encoding, self.tar.errors)
You could make this even safer by truncating to the previous offset/tar member end if any exception occurs while writing.
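That rollback idea could be sketched like this: remember the offset where the last complete member ended, and on any write failure seek back and truncate so the archive stays parseable. This is an illustration, not the PR's code; note that `tar.offset` is an internal attribute of CPython's `tarfile.TarFile`, so the sketch leans on an implementation detail.

```python
import io
import tarfile

def addfile_with_rollback(tar, info, fh):
    """Sketch: add a member, rolling the archive back to the end of the
    previous member if anything goes wrong while writing."""
    offset = tar.offset  # end of the last complete member (internal attr)
    try:
        tar.addfile(info, fh)
    except Exception:
        # Drop the partially written member so the archive stays parseable.
        tar.fileobj.seek(offset)
        tar.fileobj.truncate()
        tar.offset = offset  # keep TarFile's bookkeeping consistent
        raise

class FailingReader:
    """File-like object whose read() always fails, like a vanishing file."""
    def read(self, n=-1):
        raise OSError("short read")

# One good member, then a member whose source fails mid-copy.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
ok = tarfile.TarInfo("ok.txt")
ok.size = 5
tar.addfile(ok, io.BytesIO(b"hello"))
bad = tarfile.TarInfo("bad.txt")
bad.size = 100
try:
    addfile_with_rollback(tar, bad, FailingReader())
except OSError:
    pass
tar.close()

buf.seek(0)
names = tarfile.open(fileobj=buf).getnames()
```

After the failed add, the archive still contains only the good member and extracts cleanly.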
Co-authored-by: Erik Schamper <[email protected]>
Any idea when this is getting fixed? It's affecting me as well. Let me know if there's anything I can do to help!
Fixes an issue where a `.tar` output file would contain inconsistencies between the expected and actual file size of the included files.

In some cases, a file on disk can report a size of X bytes, but at the time of actually reading X bytes from it, fewer than X bytes are available (a short read). Acquire would report these issues as `OSError` in the resulting Acquisition log file, because the Python stdlib `tarfile.py` handles it that way. However, data may already have been written to the destination archive at that point. Afterwards, Acquire continues to add new files to the archive. When trying to untar the file using `tar -xvf <FILE>`, this would show up as a `tar: Skipping to next header` error, and the process finally exits with a nonzero exit code.

Included a test case which simulates a file that returns fewer bytes than its reported size, to cover this case.