enhancement request with patch: track zero blocks after startup even if --listBlocks wasn't specified #28

@GoogleCodeExporter

Even if --listBlocks wasn't specified, it makes sense to keep track of when 
zero blocks are read or written so that they don't have to be read or written 
repeatedly. The attached patch accomplishes this as follows:

* Change the non-zero block map into a zero block map, i.e., a bit in the map 
is set if the corresponding block is zero, rather than if it's non-zero. 
Strictly speaking, this change isn't necessary, since I could have kept the 
non-zero map and checked for the opposite bit value. But I think a zero map 
makes more sense logically, and the code is clearer this way, because what 
we're really interested in knowing is that a block is zero, so that we don't 
need to read or write it.

* Create an empty zero map when initializing http_io if --listBlocks wasn't 
specified.

* Set the block's bit in the zero map if we try to read a block and get ENOENT.

* Set the block's bit in the zero map if we write a zero block that wasn't 
previously zero.
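
The bookkeeping described in the bullets above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the `zero_map` type and every function name here are hypothetical, and clearing the bit on a non-zero write is an assumption that the description implies but does not state.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(uint32_t) * CHAR_BIT)

/* Hypothetical zero block map: bit N set means block N is known to be zero. */
typedef struct {
    uint32_t *words;
    size_t num_blocks;
} zero_map;

/* Create an empty map (no block known to be zero yet), as when
 * --listBlocks was not specified. Returns 0 or ENOMEM. */
static int zero_map_init(zero_map *m, size_t num_blocks)
{
    size_t nwords = (num_blocks + WORD_BITS - 1) / WORD_BITS;

    m->words = calloc(nwords > 0 ? nwords : 1, sizeof(*m->words));
    if (m->words == NULL)
        return ENOMEM;
    m->num_blocks = num_blocks;
    return 0;
}

static int zero_map_test(const zero_map *m, size_t block)
{
    assert(block < m->num_blocks);
    return (int)((m->words[block / WORD_BITS] >> (block % WORD_BITS)) & 1u);
}

static void zero_map_set(zero_map *m, size_t block)
{
    assert(block < m->num_blocks);
    m->words[block / WORD_BITS] |= (uint32_t)1 << (block % WORD_BITS);
}

static void zero_map_clear(zero_map *m, size_t block)
{
    assert(block < m->num_blocks);
    m->words[block / WORD_BITS] &= ~((uint32_t)1 << (block % WORD_BITS));
}

/* Return 1 if every byte of the buffer is zero. */
static int is_zero_block(const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}

/* After a read attempt: ENOENT means the block does not exist, i.e. is zero. */
static void note_read_result(zero_map *m, size_t block, int err)
{
    if (err == ENOENT)
        zero_map_set(m, block);
}

/* Before a write: returns 0 if the write can be skipped because the block
 * is already known to be zero, 1 if it must go to the backing store. */
static int note_write(zero_map *m, size_t block, const void *data, size_t len)
{
    if (is_zero_block(data, len)) {
        if (zero_map_test(m, block))
            return 0;               /* already zero: nothing to do */
        zero_map_set(m, block);     /* newly zero: remember it */
        return 1;
    }
    zero_map_clear(m, block);       /* assumption: non-zero data clears the bit */
    return 1;
}
```

A real implementation would presumably update the map only after the underlying operation succeeds and would need locking; both are left out here to keep the sketch short.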

This is actually the first patch of five I intend to submit in this area, if 
it's OK with you. They are:

1. This patch (track zero instead of non-zero blocks, and track even when 
--listBlocks wasn't specified).

2. Make --listBlocks happen in the background in a separate thread after the 
filesystem is mounted (this should be relatively easy to do now that I've done 
patch 1).

3. When a block that we expect to exist in S3 isn't there when we try to read 
it, restore it from the cache if possible.

4. When a block that we expect to exist in S3 isn't there when we do 
--listBlocks, restore it from the cache if possible.

5. Add an option to rerun --listBlocks periodically in the background while 
s3backer is running.

Patches 3-5 deserve some explanation. My concern is that blocks could simply 
disappear from S3 without any intervention on our part. With regular S3 
storage the risk is very small, but with reduced redundancy storage (RRS) it 
is much larger, and over time even likely. I'm using s3backer to store my 
backups with rsync, and I'm using RRS because all the data I'm saving exists 
on my desktop as well. The doc for RRS says that it should only be used for 
data that can be restored easily, and that is the case here: for performance 
reasons, my s3backer cache is big enough to hold my entire backup filesystem. 
Ergo, it makes a great deal of sense to teach s3backer how to automatically 
restore dropped blocks.
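
To make the idea behind patch 3 concrete, here is a minimal sketch of a read path that restores a missing block from the cache. None of this exists yet: `mem_store` is a hypothetical in-memory stand-in used for both S3 and the block cache, `read_with_restore` is an invented name, and the tiny block size is for illustration only.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8    /* tiny, for illustration only */
#define NUM_BLOCKS 4

/* Hypothetical in-memory stand-in for either S3 or the local block cache. */
typedef struct {
    unsigned char data[NUM_BLOCKS][BLOCK_SIZE];
    int present[NUM_BLOCKS];
} mem_store;

static int mem_read(const mem_store *s, size_t block, void *buf)
{
    if (block >= NUM_BLOCKS)
        return EINVAL;
    if (!s->present[block])
        return ENOENT;
    memcpy(buf, s->data[block], BLOCK_SIZE);
    return 0;
}

static int mem_write(mem_store *s, size_t block, const void *buf)
{
    if (block >= NUM_BLOCKS)
        return EINVAL;
    memcpy(s->data[block], buf, BLOCK_SIZE);
    s->present[block] = 1;
    return 0;
}

/* Read a block; if S3 no longer has it but the cache does, write the cached
 * copy back to S3 and return it. *restored is set to 1 only in that case.
 * Returns ENOENT when neither side has the block (a genuine zero block). */
static int read_with_restore(mem_store *s3, const mem_store *cache,
                             size_t block, void *buf, int *restored)
{
    int r;

    *restored = 0;
    r = mem_read(s3, block, buf);
    if (r != ENOENT)
        return r;                   /* success, or an unrelated error */
    r = mem_read(cache, block, buf);
    if (r != 0)
        return r;                   /* not cached either */
    r = mem_write(s3, block, buf);  /* put the dropped block back */
    if (r != 0)
        return r;
    *restored = 1;
    return 0;
}
```

When the function returns ENOENT, the caller would fall back to the behavior from patch 1 and mark the block as zero.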

Please let me know your thoughts about this patch and my plans for the rest of 
them, especially since I think I may need some guidance from you when 
implementing patches 3-5 :-).

Thanks,

  jik

Original issue reported on code.google.com by jikam...@gmail.com on 24 Oct 2010 at 7:45
