Faster installation on McAfee systems

Some AV implementations (E.g. McAfee) hook in post-write, opening and scanning the file after it is written to disk. No implication of TOCTOU race, since they could also have a JIT check in place. But an IO pattern of:

* creat
* write
* close
* rename (either the file or a containing directory)
will race with the AV scanners

* open
* read
* close
calls on Windows OS's.
Note that the AV scanner is multi threaded, making many of these calls overlapped: trying to thread the needle between them with high frequency attempts is impossible and just increases CPU utilisation.

To fix that, either: don't perform that IO pattern, or use a retry-with-backoff strategy on the rename. We currently use the retry approach.

However that results in up to 143 second delays per rename that we do; which is roughly one-per-component, and components like `docs` are particularly affected.

If rather than renaming after close, we renamed before closing, we could avoid this condition. e.g.:
* create
* write
* flush
* SetFileInformationByHandle on NT, renameat() on posix
* close
on a per file basis, into the final location, then the AV scan would take place on the final file path, rather than on a temporary location, and no rename retry would be required.

There are two broad strategies we could use to achieve this: 

One: "indirect"
we could write all files to one or more centralised stores (e.g. perhaps a content addressable store), and then build a link farm from there to a directory representing the component, and rename the component into place. Error handling could be done by asserting content - every unpack just replaces content in the store - if its by full name (tar + expanded path) - we replace it; if its by hash, we can validate and replace if wrong. Or use fsync and trust no corruption ever occurs. Or variations could be made up.

Two: "direct"
we could write the files directly to their final locations, with additional error handling needed for partially unpacked components. Writes might be lost in a power failure at any point, and similarly out of disk space; and due to concurrency behaviours and disk out-of-order capabilities we can't assume that having written X then started on Y, the presence of Y implies X, unless we introduce barriers via sync calls. We could use the same strategy as for indirect: just unpack everything for anything that is suspect; and treat it all as suspect. Making sure we write metadata last and syncing that is perhaps a good common path; alternatively, checking the content *exists*, trusting that in the common case the full write was successful, and only linking in the manifest at the end might be good - and using locking on the metadata file to solve #988 at the same time.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Faster installation on McAfee systems #2776

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Faster installation on McAfee systems #2776

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions