ostree archive's mtime modification results in slower python execution #1469

Open
coreos/rpm-ostree
#5019
@euank

System information

$ cat /etc/os-release 
NAME=Fedora
VERSION="27 (Atomic Host)"
ID=fedora
VERSION_ID=27
PRETTY_NAME="Fedora 27 (Atomic Host)"
VARIANT="Atomic Host"
VARIANT_ID=atomic.host

$ curl 169.254.169.254/latest/meta-data/instance-type; echo; curl 169.254.169.254/latest/meta-data/ami-id; echo
t2.micro
ami-4ef37336

High level problem

At a high level, the problem is that atomic and other python programs are slow by default.

$ time atomic --help >/dev/null

real	0m1.268s
user	0m1.184s
sys	0m0.059s

$ sudo ostree admin unlock

$ time sudo atomic --help > /dev/null

real	0m1.402s
user	0m1.202s
sys	0m0.136s
$ time sudo atomic --help > /dev/null

real	0m0.453s
user	0m0.375s
sys	0m0.055s

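As the timings show, the atomic CLI runs roughly 3x faster (about 1.3s down to 0.45s) once it is allowed to rewrite the .pyc files. The first run after unlocking is still slow because that is when the .pyc files get regenerated; the second run uses them. This performance issue will likely be shared by most other Python software.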

Why does this happen?

ostree sets a file's timestamps to 0:

archive_entry_update_pathname_utf8 (entry, pathstr);
archive_entry_set_ctime (entry, ts, OSTREE_TIMESTAMP);
archive_entry_set_mtime (entry, ts, OSTREE_TIMESTAMP);
archive_entry_set_atime (entry, ts, OSTREE_TIMESTAMP);

Unfortunately, this results in some files being treated as stale. For example, a .pyc file, per this blog, records the 'mtime' of the .py file it was compiled from, and Python compares that recorded value with the source file's current mtime before using the cached bytecode.

Since ostree modifies that timestamp, the .pyc file created during the rpmbuild becomes invalid and is ignored.
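
To make the mechanism concrete, here is a minimal sketch of the timestamp check, not the interpreter's actual code. It assumes the Python 3.7+ .pyc header layout (magic, flags, mtime, size, 4 bytes each); on Python 3.6 and earlier there is no flags field, so the mtime sits at offset 4 instead of 8. The helper names (`pyc_recorded_mtime`, `pyc_is_fresh`, `demo`) are made up for illustration, and the size check that Python also performs is left out.

```python
import os
import py_compile
import struct
import tempfile


def pyc_recorded_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Assumed layout (Python 3.7+): magic(4) flags(4) mtime(4) size(4).
    return struct.unpack_from("<I", header, 8)[0]


def pyc_is_fresh(py_path, pyc_path):
    """Approximate the import system's mtime comparison (size check omitted)."""
    source_mtime = int(os.stat(py_path).st_mtime) & 0xFFFFFFFF
    return pyc_recorded_mtime(pyc_path) == source_mtime


def demo():
    """Compile a module, then zero the source mtime the way a checkout would."""
    with tempfile.TemporaryDirectory() as tmp:
        py_path = os.path.join(tmp, "mod.py")
        pyc_path = os.path.join(tmp, "mod.pyc")
        with open(py_path, "w") as f:
            f.write("x = 1\n")
        # Pin a known, non-zero mtime, standing in for the build-time value.
        os.utime(py_path, (1500000000, 1500000000))
        py_compile.compile(
            py_path,
            cfile=pyc_path,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        before = pyc_is_fresh(py_path, pyc_path)
        # Simulate the timestamp being reset to 0 after the .pyc was built.
        os.utime(py_path, (0, 0))
        after = pyc_is_fresh(py_path, pyc_path)
        return before, after
```

Calling `demo()` should report the .pyc as fresh before the mtime is reset and stale afterwards, which matches the behaviour described above: the bytecode shipped in the package no longer matches its source and gets recompiled on every run.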

We can verify this is the problem by using perf to see where time is spent, or by using strace to see which file accesses change once the speedup kicks in, e.g.:

$ sudo strace -e trace=open,openat atomic --help 2>&1 | cat | grep -Eo '/usr/lib[^"]+' | grep -E "\.py$" | wc -l
455

$ # ostree unlock and sudo atomic, as above

$ sudo strace -e trace=open,openat atomic --help 2>&1 | cat | grep -Eo '/usr/lib[^"]+' | grep -E "\.py$" | wc -l
0

$ find /var/tmp/ostree-unlock-ovl.XCJGFZ/upper/ -name "*.pyc" | wc -l
455
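
As an alternative to strace, a tree can be scanned directly for out-of-date bytecode. The sketch below is only an illustration under a few assumptions: Python 3.7+ (.pyc header with a flags field), bytecode stored where `importlib.util.cache_from_source` expects it, and a hypothetical helper name `count_stale_pyc`. It ignores the source-size field that Python also checks.

```python
import importlib.util
import os
import struct


def count_stale_pyc(root):
    """Return (fresh, stale) counts of timestamp-based .pyc files under root."""
    fresh = stale = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".py"):
                continue
            py_path = os.path.join(dirpath, name)
            pyc_path = importlib.util.cache_from_source(py_path)
            try:
                with open(pyc_path, "rb") as f:
                    header = f.read(16)
            except OSError:
                continue  # no cached bytecode for this module
            if len(header) < 16:
                continue  # truncated or unexpected file
            # Assumed layout (Python 3.7+): magic, flags, mtime, size.
            flags, recorded = struct.unpack_from("<II", header, 4)
            if flags & 0x1:
                continue  # hash-based .pyc, not validated by mtime
            actual = int(os.stat(py_path).st_mtime) & 0xFFFFFFFF
            if recorded == actual:
                fresh += 1
            else:
                stale += 1
    return fresh, stale
```

For example, `count_stale_pyc("/usr/lib/python3.6/site-packages")` would be the kind of call to try on an affected host (the path is a guess, and the interpreter running the script must match the one that wrote the .pyc files). A large stale count would be consistent with the strace output above.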

Additional considerations

Note that Python is not unique in using mtime to decide whether a given cache file is up to date. I don't know off-hand of other software that's definitely impacted by this choice, but I wouldn't be surprised if the go toolchain had a similar problem (edit: nope, go's pkg directory on usr seems to be handled fine).
