Prevent .tar file corruption by patching short reads #261
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,8 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import copy | ||
| import io | ||
| import shutil | ||
| import tarfile | ||
| from typing import TYPE_CHECKING, BinaryIO | ||
|
|
||
|
|
@@ -100,7 +102,52 @@ def write( | |
| if stat: | ||
| info.mtime = stat.st_mtime | ||
|
|
||
| self.tar.addfile(info, fh) | ||
| # Inline version of Python stdlib's tarfile.addfile & tarfile.copyfileobj, | ||
| # to allow for padding and more control over the tar file writing. | ||
| self.tar._check("awx") | ||
|
|
||
| if fh is None and info.isreg() and info.size != 0: | ||
| raise ValueError("fileobj not provided for non zero-size regular file") | ||
|
|
||
| info = copy.copy(info) | ||
|
|
||
| buf = info.tobuf(self.tar.format, self.tar.encoding, self.tar.errors) | ||
| self.tar.fileobj.write(buf) | ||
| self.tar.offset += len(buf) | ||
| bufsize = self.tar.copybufsize | ||
| if fh is not None: | ||
| bufsize = bufsize or 16 * 1024 | ||
|
|
||
| if info.size == 0: | ||
| return | ||
| if info.size is None: | ||
| shutil.copyfileobj(fh, self.tar.fileobj, bufsize) | ||
| return | ||
|
Comment on lines
+123
to
+125
Member
I think we can remove this block since it would be an illegal action in this context. |
||
|
|
||
| blocks, remainder = divmod(info.size, bufsize) | ||
| for _ in range(blocks): | ||
| # Prevents "long reads" because it reads at max bufsize bytes at a time | ||
|
Member
Long or short? |
||
| buf = fh.read(bufsize) | ||
| if len(buf) < bufsize: | ||
|
Member
I think you can generalize this case instead of doing it twice. Keep track of how many bytes you actually wrote (i.e. using |
||
| # PATCH; instead of raising an exception, pad the data to the desired length | ||
| buf += tarfile.NUL * (bufsize - len(buf)) | ||
| self.tar.fileobj.write(buf) | ||
|
|
||
| if remainder != 0: | ||
| # Prevents "long reads" because it reads at max bufsize bytes at a time | ||
|
Member
Long or short? |
||
| buf = fh.read(remainder) | ||
| if len(buf) < remainder: | ||
| # PATCH; instead of raising an exception, pad the data to the desired length | ||
| buf += tarfile.NUL * (remainder - len(buf)) | ||
| self.tar.fileobj.write(buf) | ||
|
|
||
| blocks, remainder = divmod(info.size, tarfile.BLOCKSIZE) | ||
| if remainder > 0: | ||
| self.tar.fileobj.write(tarfile.NUL * (tarfile.BLOCKSIZE - remainder)) | ||
| blocks += 1 | ||
| self.tar.offset += blocks * tarfile.BLOCKSIZE | ||
|
|
||
| self.tar.members.append(info) | ||
|
|
||
| def close(self) -> None: | ||
| """Closes the tar file.""" | ||
|
|
||
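One possible reading of the review suggestion to generalize the two padding branches is sketched below: count the bytes actually copied, then pad once at the end, so the short-read padding and the tar block alignment collapse into a single write. `copy_padded` is a hypothetical helper and not part of this PR. It also behaves slightly differently from the diff: it keeps reading after a short chunk and only pads once the source is exhausted, instead of padding each short chunk in place.

```python
import io
import tarfile


def copy_padded(src, dst, size, bufsize=16 * 1024):
    """Copy up to `size` bytes from src to dst, NUL-padding if src runs short.

    Hypothetical helper, not part of the PR. Returns the number of bytes
    actually read from src.
    """
    written = 0
    while written < size:
        # Never ask for more than what is still missing.
        chunk = src.read(min(bufsize, size - written))
        if not chunk:
            break  # source exhausted early; the gap is padded below
        dst.write(chunk)
        written += len(chunk)

    # Round size up to a whole tar block, then pad everything still missing.
    padded_size = -(-size // tarfile.BLOCKSIZE) * tarfile.BLOCKSIZE
    dst.write(tarfile.NUL * (padded_size - written))
    return written


# The header promises 1000 bytes, but the source only delivers 3.
out = io.BytesIO()
copied = copy_padded(io.BytesIO(b"abc"), out, size=1000, bufsize=4)
```

In the PR's `write()` the caller would still advance `self.tar.offset` by the padded size. For very large missing ranges the padding should probably be written in chunks rather than as one allocation.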
You could make this even safer by truncating to the previous offset/tar member end if any exception occurs while writing.
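A minimal sketch of that rollback idea, assuming the output file object is seekable (this would not hold for a tar opened in stream mode). `rollback_on_error` is a hypothetical name, not something in the PR or in `tarfile`.

```python
import io
from contextlib import contextmanager


@contextmanager
def rollback_on_error(fileobj):
    """Truncate fileobj back to where it started if the body raises.

    Hypothetical helper; requires a seekable, truncatable file object.
    """
    start = fileobj.tell()
    try:
        yield start
    except BaseException:
        # Drop the partially written member, then re-raise for the caller.
        fileobj.seek(start)
        fileobj.truncate()
        raise


# A failed write leaves the previously written member untouched.
out = io.BytesIO()
out.write(b"member-1")
try:
    with rollback_on_error(out):
        out.write(b"partial header and data")
        raise OSError("simulated read failure")
except OSError:
    pass
```

In the `write()` method from this diff, `self.tar.offset` would presumably need to be reset to its previous value as well, since it is advanced right after the header is written.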