
Skip over unallocated spaces during send #228

Open
@tasket

Description

Wyng send currently examines all unallocated portions of a volume under certain conditions, such as during the volume's initial send. It also examines and compares all portions that have been deallocated since the previous send, so incremental backups are affected as well. As a result, sends are slower than they need to be.

Cases where this has an impact:

  • Adding large volumes to an archive
  • Deleting large amounts of data from a volume
  • Increasing a volume's size

Optimization could be achieved by creating a twin of the delta map, a zero map, during one of the early stages of the send process, such as get_delta_digest(). The zero-mapping code would have to be adapted to each storage type, and the reflink version may be able to consume a 'tee' of the fiemap data. (An alternative would be to use SEEK_HOLE and SEEK_DATA, although they're unlikely to work with tlvm.)
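A minimal sketch of the SEEK_DATA/SEEK_HOLE alternative for file-based volumes, assuming Linux and a filesystem that reports holes. The helper name `zero_map_from_holes`, the `chunk_size` parameter and the bit layout (one bit per chunk, least significant bit first, set meaning "unallocated") are hypothetical and not taken from Wyng's code.

```python
import errno
import os

def zero_map_from_holes(path, chunk_size):
    """Hypothetical helper (not part of Wyng): build a per-chunk zero map for
    a file using SEEK_DATA/SEEK_HOLE. Bit c is set when chunk c contains no
    allocated data and could therefore be skipped. Filesystems without hole
    support report the whole file as data, so every bit stays clear."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        nchunks = (size + chunk_size - 1) // chunk_size
        allocated = bytearray(nchunks)  # 1 = chunk holds some data
        pos = 0
        while pos < size:
            try:
                data_start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # no more data after pos
                    break
                raise
            data_end = os.lseek(fd, data_start, os.SEEK_HOLE)
            # Mark every chunk overlapping the data extent [data_start, data_end).
            for c in range(data_start // chunk_size, (data_end - 1) // chunk_size + 1):
                allocated[c] = 1
            pos = data_end
    finally:
        os.close(fd)
    # Pack into a bitmap, one bit per chunk, least significant bit first.
    zmap = bytearray((nchunks + 7) // 8)
    for c in range(nchunks):
        if not allocated[c]:
            zmap[c >> 3] |= 1 << (c & 7)
    return zmap
```

Marking a chunk as allocated whenever any byte of it holds data keeps the map conservative: a chunk is only ever skipped when it lies wholly inside a hole.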

The tlvm version might collect any "left-only" references in the case of an incremental send, or else do an extra metadata extraction step using a tlvm command other than thin_delta.
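The "left-only" idea for tlvm might look roughly like this. It assumes thin_delta-style XML in which ranges appear as elements such as `left_only` with `begin` and `length` attributes counted in pool data blocks; verify this against the thin-provisioning-tools output actually in use. The function `left_only_zero_chunks` and its parameters are hypothetical. Since blocks unmapped in both snapshots do not appear in the delta, the sketch only reports a chunk as skippable when left-only ranges cover it completely.

```python
import xml.etree.ElementTree as ET

def left_only_zero_chunks(delta_xml, block_size, chunk_size):
    """Hypothetical helper: return the set of chunk indexes entirely covered
    by 'left_only' ranges, i.e. blocks mapped in the previous snapshot but
    unmapped in the current one. Assumes 'begin'/'length' are in units of
    block_size bytes."""
    root = ET.fromstring(delta_xml)
    # Collect left_only ranges as byte intervals [start, end).
    spans = sorted(
        (int(e.get("begin")) * block_size,
         (int(e.get("begin")) + int(e.get("length"))) * block_size)
        for e in root.iter("left_only")
    )
    # Merge touching or overlapping intervals so a chunk spanning two
    # contiguous ranges is still recognized as fully covered.
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    zero = set()
    for start, end in merged:
        first = -(-start // chunk_size)  # first chunk starting at or after start
        last = end // chunk_size         # exclusive: chunks ending at or before end
        zero.update(range(first, last))
    return zero
```

Other range types (for example `different`) are ignored here, since only left-only ranges indicate newly unallocated space.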

Assuming the result of zero mapping is a per-chunk bitmap like the delta map, the send_volume() function could attempt to skip through 8-bit or larger segments similar to how it handles the delta bmap_list.
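A sketch of the byte-level skipping, assuming the same hypothetical layout as a zero map where bit `c` is `zero_map[c >> 3] & (1 << (c & 7))` and a set bit means "unallocated". The generator name `chunks_to_examine` is illustrative only; it is not Wyng's send_volume() logic.

```python
def chunks_to_examine(zero_map, nchunks):
    """Hypothetical helper: yield indexes of chunks that may hold data,
    skipping any whole byte of the zero map whose 8 bits are all set
    (8 consecutive unallocated chunks)."""
    for byte_index, byte in enumerate(zero_map):
        if byte == 0xFF:
            continue  # all 8 chunks in this segment are unallocated
        base = byte_index * 8
        for bit in range(8):
            c = base + bit
            if c >= nchunks:
                return  # ignore padding bits past the last chunk
            if not byte & (1 << bit):
                yield c
```

For example, `list(chunks_to_examine(bytes([0xFF, 0x05]), 12))` skips chunks 0 to 7 in one step and yields `[9, 11]`. Larger segments (such as 64-bit words) could be skipped the same way by comparing slices against all-ones.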

One desired result would be the ability to add a mostly empty, terabyte-sized volume to an archive in a matter of seconds or a few minutes. Another would be an incremental send of a volume that had a large amount of data deleted taking only a fraction of the time it takes in the current worst case.

To illustrate how large a difference delta mapping makes compared with the current lack of unallocated-space mapping:

Adding a new 1TB mostly-empty (1.5MB) volume to an archive took over 14 minutes.

Adding 48MB to that volume and doing an incremental (mapped) send took 9 seconds. So a backup of 32 times as much data finished in about 1/93 of the time. (The incremental send didn't have to compare large amounts of zeros because data had only been added to the volume, not deleted.)
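The ratios follow directly from the reported figures, taking the initial send as 14 minutes (it actually took somewhat longer, so the time ratio is at least this large):

```math
\frac{48\ \text{MB}}{1.5\ \text{MB}} = 32, \qquad \frac{14 \times 60\ \text{s}}{9\ \text{s}} = \frac{840\ \text{s}}{9\ \text{s}} \approx 93
```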
