Filesystem hash map
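hpfs maintains a block hash map of all the virtual files in the filesystem so it can calculate a root hash of the entire filesystem. Blake2 hashing is used to compute data hashes. XOR is used to calculate composite hashes in a performance-efficient manner: because XOR is commutative and its own inverse, a single changed hash can be swapped out of a composite hash without rehashing the other members.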
- Block hashes are calculated for each 4MB block of each file.
block_hash = blake2(block_offset + block_data)
- File hash is calculated using file name and all the block hashes of the file.
file_hash = XOR(blake2(file_name), block_hashes)
- Directory hash is calculated using directory name and all the file and directory hashes in that directory.
dir_hash = XOR(blake2(directory_name), children_hashes)
- Filesystem root hash is the dir_hash of the filesystem root directory.
- During RO/RW session initialization, hpfs attempts to load the filesystem hash map from the saved cache.
- If a saved hash map cache is not available, hpfs rebuilds the hash map by traversing the entire virtual filesystem and saves it to the cache.
- Only an RW session updates an existing hash map cache. It does so upon exit.
- Reading from the hash map cache is performed atomically so that cache updates from an RW session do not interfere with cache reads by an RO session.
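The block, file and directory hash formulas listed above can be sketched in a few lines of Python. This is only an illustration, not the hpfs implementation: the BLAKE2b variant, the 32-byte digest size and the 8-byte little-endian encoding of block_offset are assumptions, and all function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB blocks, as described above
HASH_LEN = 32                 # assumed digest size in bytes


def blake2(data: bytes) -> bytes:
    # Assumption: BLAKE2b truncated to 32 bytes; the exact variant is not stated here.
    return hashlib.blake2b(data, digest_size=HASH_LEN).digest()


def xor(*hashes: bytes) -> bytes:
    # XOR any number of equal-length hashes together.
    acc = 0
    for h in hashes:
        acc ^= int.from_bytes(h, "big")
    return acc.to_bytes(HASH_LEN, "big")


def block_hash(offset: int, data: bytes) -> bytes:
    # block_hash = blake2(block_offset + block_data)
    # Assumption: the offset is prepended as 8 little-endian bytes.
    return blake2(offset.to_bytes(8, "little") + data)


def block_hashes(content: bytes) -> list:
    return [block_hash(off, content[off:off + BLOCK_SIZE])
            for off in range(0, len(content), BLOCK_SIZE)]


def file_hash(name: str, content: bytes) -> bytes:
    # file_hash = XOR(blake2(file_name), block_hashes)
    return xor(blake2(name.encode()), *block_hashes(content))


def dir_hash(name: str, children_hashes: list) -> bytes:
    # dir_hash = XOR(blake2(directory_name), children_hashes)
    return xor(blake2(name.encode()), *children_hashes)


# Root hash of a tiny tree: "/" containing one file and one empty directory.
root_hash = dir_hash("/", [file_hash("a.txt", b"hello"), dir_hash("sub", [])])
```

Because XOR is commutative, the order in which children are combined does not affect the resulting directory hash.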
The following filesystem operations cause the file hash to be changed: create, write, truncate, rename.
- create: file_hash = blake2(file_name)
- write: file_hash = XOR(old_file_hash, old_block_hash, new_block_hash)
- rename: file_hash = XOR(blake2(new_file_name), block_hashes)
- truncate:
  - If file size increased: file_hash = XOR(old_file_hash, old_last_block_hash, affected_new_block_hashes)
  - If file size reduced: file_hash = XOR(old_file_hash, affected_old_block_hashes, new_last_block_hash)
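To illustrate why these incremental formulas work, the sketch below applies the write and shrinking-truncate updates and compares them with a full recomputation. It is a hedged example rather than hpfs code: the helper names are hypothetical, the offset encoding and digest size are assumptions, and the block size is reduced to 4 bytes purely to keep the demonstration small.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size for illustration only; hpfs uses 4MB blocks
HASH_LEN = 32   # assumed digest size in bytes


def blake2(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=HASH_LEN).digest()


def xor(*hashes: bytes) -> bytes:
    acc = 0
    for h in hashes:
        acc ^= int.from_bytes(h, "big")
    return acc.to_bytes(HASH_LEN, "big")


def block_hash(offset: int, data: bytes) -> bytes:
    # Assumption: offset encoded as 8 little-endian bytes before the data.
    return blake2(offset.to_bytes(8, "little") + data)


def full_file_hash(name: str, content: bytes) -> bytes:
    # Reference computation: XOR of the name hash and every block hash.
    blocks = [block_hash(off, content[off:off + BLOCK_SIZE])
              for off in range(0, len(content), BLOCK_SIZE)]
    return xor(blake2(name.encode()), *blocks)


def update_on_write(old_file_hash, offset, old_block, new_block):
    # write: XOR out the old block hash and XOR in the new one.
    return xor(old_file_hash,
               block_hash(offset, old_block),
               block_hash(offset, new_block))


def update_on_shrink(old_file_hash, old_content, new_size):
    # truncate (size reduced): XOR out every old block from the new last
    # block onwards, then XOR in the new (possibly shorter) last block.
    first = (new_size // BLOCK_SIZE) * BLOCK_SIZE
    affected_old = [block_hash(off, old_content[off:off + BLOCK_SIZE])
                    for off in range(first, len(old_content), BLOCK_SIZE)]
    new_last = [block_hash(first, old_content[first:new_size])] if new_size > first else []
    return xor(old_file_hash, *affected_old, *new_last)


old = b"aaaabbbbcccc"
new = b"aaaaXXXXcccc"
h_old = full_file_hash("f", old)
h_written = update_on_write(h_old, 4, old[4:8], new[4:8])
h_shrunk = update_on_shrink(h_old, old, 6)
```

In both cases only the affected blocks are rehashed; the hashes of untouched blocks stay folded inside old_file_hash.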
The following filesystem operations cause the parent directory hash to be changed: mkdir, rmdir, unlink and any file hash update caused by create, write, truncate, rename.
- mkdir: dir_hash = XOR(dir_hash, new_child_dir_hash), where new_child_dir_hash = blake2(new_child_dir_name)
- rmdir: dir_hash = XOR(dir_hash, removed_child_dir_hash)
- unlink: dir_hash = XOR(dir_hash, removed_file_hash)
- create: dir_hash = XOR(dir_hash, new_file_hash)
- write: dir_hash = XOR(dir_hash, old_file_hash, new_file_hash)
- rename:
  - Source parent dir: dir_hash = XOR(dir_hash, old_file_hash)
  - Target parent dir: dir_hash = XOR(dir_hash, new_file_hash)
- truncate: dir_hash = XOR(dir_hash, old_file_hash, new_file_hash)
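The directory updates above can be sketched as follows. Since a directory hash includes the hashes of its child directories, a changed file hash presumably has to be carried up through every ancestor to the root; the propagate helper below illustrates that inference and is not taken from hpfs. All names, the BLAKE2b variant and the digest size are assumptions.

```python
import hashlib

HASH_LEN = 32  # assumed digest size in bytes


def blake2(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=HASH_LEN).digest()


def xor(*hashes: bytes) -> bytes:
    acc = 0
    for h in hashes:
        acc ^= int.from_bytes(h, "big")
    return acc.to_bytes(HASH_LEN, "big")


def dir_hash(name: str, children_hashes: list) -> bytes:
    # Reference computation: XOR of the name hash and all child hashes.
    return xor(blake2(name.encode()), *children_hashes)


def rename_between_dirs(src_dir_hash, dst_dir_hash, old_file_hash, new_file_hash):
    # rename: remove the old file hash from the source parent,
    # add the new file hash to the target parent.
    return xor(src_dir_hash, old_file_hash), xor(dst_dir_hash, new_file_hash)


def propagate(ancestors, old_child, new_child):
    # ancestors: directory hashes ordered from the immediate parent up to the root.
    # Each level swaps the old child hash for the new one, and its own
    # old/new hash then becomes the change seen by the next level up.
    updated = []
    for h in ancestors:
        new_h = xor(h, old_child, new_child)
        updated.append(new_h)
        old_child, new_child = h, new_h
    return updated


# A write changes a file inside /sub; both /sub and / need new hashes.
f_old, f_new, other = blake2(b"old"), blake2(b"new"), blake2(b"other")
sub = dir_hash("sub", [f_old])
root = dir_hash("/", [sub, other])
new_sub, new_root = propagate([sub, root], f_old, f_new)
```

Each level costs one XOR of three hashes, regardless of how many children the directory holds.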
During an RO/RW session, hpfs supports querying hash map information for the virtual filesystem. The query interface is exposed through special virtual file paths in the session's virtual filesystem. The following information can be queried by reading the virtual path as a regular file:
- The hash of a specified directory or file. This can be used to get the filesystem root hash by querying the root dir ('/').
  Read path: <path>::hpfs.hmap.hash
  Response: [ hash (32 bytes) ]
- List of child file/dir hashes under a specified directory.
  Read path: <dir_path>::hpfs.hmap.children
  Response: List of [ is_file (8 bytes) | name (256 bytes) | hash (32 bytes) ]
- List of block hashes of a specified file.
  Read path: <file_path>::hpfs.hmap.children
  Response: List of [ block_hash (32 bytes) ]
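A client could read and decode these responses roughly as sketched below. The record sizes come from the layouts above; everything else is an assumption made for illustration: the function names are hypothetical, is_file is treated as "any non-zero byte means file", and name is assumed to be NUL-padded UTF-8. The path passed to read_hash would be a file or directory inside a mounted session.

```python
HASH_LEN = 32
NAME_LEN = 256
FLAG_LEN = 8
RECORD_LEN = FLAG_LEN + NAME_LEN + HASH_LEN  # 296 bytes per child entry


def read_hash(path: str) -> bytes:
    # Read the 32-byte hash of a file or directory via its virtual query path.
    with open(path + "::hpfs.hmap.hash", "rb") as f:
        return f.read(HASH_LEN)


def parse_children(buf: bytes) -> list:
    # Decode a directory's children response into (is_file, name, hash) tuples.
    if len(buf) % RECORD_LEN != 0:
        raise ValueError("response is not a whole number of child records")
    entries = []
    for off in range(0, len(buf), RECORD_LEN):
        rec = buf[off:off + RECORD_LEN]
        # Assumption: a non-zero flag field marks a file, zero marks a directory.
        is_file = any(rec[:FLAG_LEN])
        # Assumption: the name field is NUL-padded to 256 bytes.
        raw_name = rec[FLAG_LEN:FLAG_LEN + NAME_LEN].split(b"\0", 1)[0]
        name = raw_name.decode("utf-8", "replace")
        entries.append((is_file, name, rec[FLAG_LEN + NAME_LEN:]))
    return entries


def parse_block_hashes(buf: bytes) -> list:
    # Decode a file's children response into a list of 32-byte block hashes.
    if len(buf) % HASH_LEN != 0:
        raise ValueError("response is not a whole number of block hashes")
    return [buf[i:i + HASH_LEN] for i in range(0, len(buf), HASH_LEN)]
```

Comparing the children lists of two filesystem replicas level by level is one way such a query interface can be used to locate where their contents diverge.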