Skip to content

undefined behaviour in foyer-storage buffer #1223

@jayvdb

Description

@jayvdb

I have a WIP to add miri testing. Prior to this, I had found one trivial UB which cant happen in optimised builds, and I am not sure if it can happen in debug builds - I suspect not.

But now I have found one which is more interesting (note this command wont work on main; it needs a patch I have been working on ..)

% cargo +nightly miri test -p foyer-storage --no-default-features --features test_utils --lib engine::block::buffer::tests -- --nocapture
..
test engine::block::buffer::tests::test_split_block_last_entry ... error: Undefined Behavior: constructing invalid value at [0]: encountered uninitialized memory, but expected an integer
   --> /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/twox-hash-2.1.2/src/xxhash64.rs:160:34
    |
160 |             let lanes = unsafe { chunk.as_ptr().cast::<Lanes>().read_unaligned() };
    |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE on thread `engine::block::`:
    = note: inside `twox_hash::xxhash64::Accumulators::write_many` at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/twox-hash-2.1.2/src/xxhash64.rs:160:34: 160:81
    = note: inside `twox_hash::XxHash64::oneshot` at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/twox-hash-2.1.2/src/xxhash64.rs:256:20: 256:49
note: inside `serde::Checksummer::checksum64`
   --> foyer-storage/src/serde.rs:30:9
    |
 30 |         XxHash64::oneshot(0, buf)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `engine::block::buffer::BlobIndex::seal`
   --> foyer-storage/src/engine/block/buffer.rs:123:24
    |
123 |         let checksum = Checksummer::checksum64(&self.bytes[BlobIndex::CHECKSUM_BYTES..]);
    |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `engine::block::buffer::Splitter::split_blob`
   --> foyer-storage/src/engine/block/buffer.rs:401:25
    |
401 |             let index = ctx.current_blob_index.seal();
    |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `engine::block::buffer::Splitter::split`
   --> foyer-storage/src/engine/block/buffer.rs:343:41
    |
343 |                     if let Some(part) = Self::split_blob(ctx, &mut indices, &mut part_size, &mut bytes) {
    |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `engine::block::buffer::tests::test_split_block_last_entry`
   --> foyer-storage/src/engine/block/buffer.rs:700:21
    |
700 |           let batch = Splitter::split(
    |  _____________________^
701 | |             &mut ctx,
702 | |             shared_io_slice.clone(),
703 | |             vec![
...   |
716 | |             ],
717 | |         );
    | |_________^
note: inside closure
   --> foyer-storage/src/engine/block/buffer.rs:686:37
    |
684 |     #[test_log::test]
    |     ----------------- in this attribute macro expansion
685 |     #[cfg_attr(all(miri, not(target_os = "linux")), ignore = "requires Linux for tokio+miri")]
686 |     fn test_split_block_last_entry() {

The problem here is that BlobIndex::seal() and BlobIndexReader::read() were checksumming the entire buffer, including uninitialized memory beyond the written entries. This caused undefined behavior when twox-hash read uninitialized bytes.

test_buffer fails in a similar way, but for a different UB reason. Here the derive of PartialEq reads the entire buffer, including any uninitialised part of it.

test_split_block_last_entry fails in a similar way, but in this case it is the test which is unnecessarily triggering UB.

One aspect of the problem is https://github.com/foyer-rs/foyer/blob/7985fd7/foyer-storage/src/io/bytes.rs#L70 where Raw::new() is allocating memory using Global.allocate() which returns uninitialized memory. It is called "Raw", and yhe variable name is nonnull so I guess it is intentional that it is uninitialised. The docs should be more clear.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions