Align with GNU Tar when a file name is too long #130819

Open
@gdh1995

Bug report

Bug description:

Recently I found that tarfile may generate an archive that differs slightly from the one produced by GNU Tar (https://www.gnu.org/software/tar/), specifically when a path name is longer than 100 bytes.
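
For context, the 100-byte threshold matters because the name field of a classic tar header is 100 bytes wide. In GNU format, a longer name is stored in an extra header entry placed before the real one. The following is a minimal sketch that shows this with tarfile alone; `first_header` is a hypothetical helper name used only for illustration, and the field offsets (name at 0..100, type flag at 156) are those of the standard 512-byte header layout.

```python
import io
import tarfile

BLOCK = 512  # tar archives are a sequence of 512-byte blocks


def first_header(name: str) -> tuple[str, bytes]:
    """Build a one-directory GNU-format archive in memory and
    return (name, typeflag) read from its first header block."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.type = tarfile.DIRTYPE
        tar.addfile(info)
    block = buf.getvalue()[:BLOCK]
    header_name = block[0:100].rstrip(b"\0").decode("ascii")
    typeflag = block[156:157]
    return header_name, typeflag


# A short name fits in the regular header; a 120-byte name does not.
print(first_header("short"))
print(first_header("abcdef" * 20))
```

With the long name, the first header is the `././@LongLink` entry with type flag `L` (`tarfile.GNUTYPE_LONGNAME`), so there is one more header whose metadata fields can differ between implementations.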

Here's the test code:

# py-tar.py
import tarfile, io
memory_file = io.BytesIO()
tar_obj = tarfile.open(name=None, mode="w", fileobj=memory_file, format=tarfile.GNU_FORMAT)
tar_info = tarfile.TarInfo("abcdef" * 20)
tar_info.type = tarfile.DIRTYPE
tar_info.mode = 0o755
tar_info.mtime = 1609459200  # UTC 2021-01-01
tar_info.uid = 1000
tar_info.gid = 1000
tar_info.uname = "ubuntu"
tar_info.gname = "ubuntu"
tar_obj.addfile(tar_info, None)
tar_obj.close()
memory_file.seek(0)
binary_data = memory_file.read()
# import binascii
# hex_data = binascii.hexlify(binary_data)
# sep = 16
# for i in range(0, len(hex_data), sep * 2):
#     part = hex_data[i:i + sep * 2]
#     print(*(part[i:i+2].decode() for i in range(0, len(part), 2)), binary_data[i//2:][:sep], sep=" ")
with open("py.tar", "wb") as fp:
    fp.write(binary_data)

Commands used to create gnu.tar and py.tar:

mkdir -m 755 abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdef
tar cf gnu.tar --sort=name --owner=ubuntu:1000 --group=ubuntu:1000 --mtime='UTC 2021-01-01' abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdef/
python ./py-tar.py

As a result, comparing the generated py.tar and gnu.tar shows a difference like this:

(Image: difference between py.tar and gnu.tar)
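
One way to pin down such a difference in text form is to compare two 512-byte header blocks field by field. The sketch below is only illustrative: `FIELDS`, `make_header`, and `diff_header_fields` are hypothetical names, and the demo builds two small archives in memory that differ only in `uname` instead of reading py.tar and gnu.tar. It does not claim to reproduce the exact difference reported here.

```python
import io
import tarfile

# (field name, offset, length) in the standard 512-byte tar header layout.
# "magic" covers the 6-byte magic plus the 2-byte version.
FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 8),
    ("uname", 265, 32), ("gname", 297, 32),
]


def diff_header_fields(a: bytes, b: bytes) -> list[tuple[str, bytes, bytes]]:
    """Return (field, value_in_a, value_in_b) for every header field that differs."""
    out = []
    for field, off, length in FIELDS:
        fa, fb = a[off:off + length], b[off:off + length]
        if fa != fb:
            out.append((field, fa, fb))
    return out


def make_header(uname: str) -> bytes:
    """Demo helper: first header block of a one-directory GNU-format archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo("dir")
        info.type = tarfile.DIRTYPE
        info.uname = uname
        tar.addfile(info)
    return buf.getvalue()[:512]


for field, left, right in diff_header_fields(make_header("ubuntu"), make_header("root")):
    print(field, left.rstrip(b"\0"), right.rstrip(b"\0"))
```

To apply this to the files from this report, read both archives with `open(path, "rb").read()` and pass the corresponding 512-byte slices (for example `data[:512]` for the first header) to `diff_header_fields`.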

So I wonder: could Python align this detail of tarfile.GNU_FORMAT with the behavior of GNU tar?

BTW, here's my environment (I'm on Ubuntu 24.04). The main branch of CPython has a similar Lib/tarfile.py, so it should show the same behavior difference:

$ LANG=C tar --version
tar (GNU tar) 1.35
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
$ LANG=C python --version
Python 3.12.3

CPython versions tested on:

3.12

Operating systems tested on:

Linux
