[Bug]: Log file cannot currently be in directory being archived #335

@forsyth2

What happened?

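If a log file that is still being written lives in the directory that zstash is archiving, the log's entry in the archive can be corrupted, and the files that follow it in the same tar can fail to extract as well. See #332 (comment)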

What machine were you running on?

Chrysalis

Environment

zstash 1.4.2

Minimal Complete Verifiable Example (MCVE)

# Create zstash archive
mkdir zstash_20240408
echo 'file0 stuff' > zstash_20240408/file0.txt
cd zstash_20240408/
zstash create --hpss=none . 2>&1 | tee 20240408.log

# Now, try to extract it
rm -f 20240408.log file0.txt
zstash extract --hpss=none "*"
For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
INFO: zstash/000000.tar exists. Checking expected size matches actual size.
INFO: Opening tar archive zstash/000000.tar
INFO: Extracting 20240408.log
ERROR: md5 mismatch for: 20240408.log
ERROR: md5 of extracted file: a8600c75b3d84cdaefd020cf13fb6556
ERROR: md5 of original file:  00a33f0fdfbe470ae5b32123cc3e372c
INFO: Extracting file0.txt
Traceback (most recent call last):
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.9.3_login/lib/python3.10/site-packages/zstash/extract.py", line 535, in extractFiles
    tarinfo: tarfile.TarInfo = tar.tarinfo.fromtarfile(tar)
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.9.3_login/lib/python3.10/tarfile.py", line 1293, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.9.3_login/lib/python3.10/tarfile.py", line 1237, in frombuf
    raise InvalidHeaderError("bad checksum")
tarfile.InvalidHeaderError: bad checksum
ERROR: Retrieving file0.txt
ERROR: Encountered an error for files:
ERROR: 20240408.log in 000000.tar
ERROR: file0.txt in 000000.tar
ERROR: The following tar archives had errors:
ERROR: 000000.tar

# Furthermore, try extracting the tar file directly
cd zstash
tar xvf 000000.tar 
20240408.log
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

# The only way to salvage data from the tar file is to use cpio (ironically)
cpio -ivd -H ustar < 000000.tar
././@PaxHeader
20240408.log
cpio: invalid header: checksum error
cpio: warning: skipped 29 bytes of junk
cpio: ././@PaxHeader not created: newer or same age version exists
././@PaxHeader
file0.txt
10 blocks

# At least the data file is recoverable even if the log file is not
cat file0.txt
file0 stuff
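
One possible explanation for the errors above, offered as an assumption rather than a confirmed diagnosis: `tee` keeps appending to the log while it is being archived, so the size recorded in the log's tar header is smaller than the number of bytes actually copied into the archive. A reader then looks for the next header in the middle of the log's data. The sketch below uses only Python's standard `tarfile` module to build such an archive by hand. The helper names (`pad`, `member`, `second_header_name`) and the file name `run.log` are hypothetical, and this is not zstash's code.

```python
import tarfile

BLOCK = 512  # tar stores headers and data in 512-byte blocks


def pad(data: bytes) -> bytes:
    """Pad data with NULs up to the next block boundary."""
    return data + b"\0" * (-len(data) % BLOCK)


def member(name: str, declared_size: int, payload: bytes) -> bytes:
    """One tar member whose header records declared_size but whose body is payload."""
    info = tarfile.TarInfo(name)
    info.size = declared_size
    return info.tobuf(format=tarfile.USTAR_FORMAT) + pad(payload)


def second_header_name(log_grew: bool):
    """Return the name found where a reader expects the second header, or None."""
    log_at_stat = b"INFO: archiving\n"  # size that ends up in the header
    extra = b"INFO: more output\n" * 40 if log_grew else b""
    log_at_copy = log_at_stat + extra  # bytes actually written (assumed growth)
    data = b"file0 stuff\n"
    archive = (
        member("run.log", len(log_at_stat), log_at_copy)
        + member("file0.txt", len(data), data)
        + b"\0" * (2 * BLOCK)  # end-of-archive marker
    )
    # A reader skips the first header plus the *declared* data size.
    offset = BLOCK + len(pad(log_at_stat))
    block = archive[offset:offset + BLOCK]
    try:
        info = tarfile.TarInfo.frombuf(block, tarfile.ENCODING, "surrogateescape")
        return info.name
    except tarfile.HeaderError:
        return None  # the block is log text, not a valid header


print(second_header_name(log_grew=False))
print(second_header_name(log_grew=True))
```

When the log does not grow, the reader finds the `file0.txt` header where it expects it. When the log grows after its size was recorded, the block at that offset is leftover log text, and header parsing fails, which resembles the `InvalidHeaderError` in the traceback above.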

Relevant log output

No response

Anything else we need to know?

No response

Labels: semver: bug (Bug fix; will increment patch version)
