Heed3: reading encrypted data in parallel threads leads to corruption #336

@emuellen

Hello, I recently upgraded to 0.22 and now want to use the encrypted environment feature. Unfortunately, after just replacing the necessary signatures and changing the transactions to &mut, I get a lot of segmentation faults when running my tests. I already tried cloning the values I get from LMDB into a vector, as I was afraid that the internal buffer would be too small, but that did not help. Do you know whether running multiple threads with read transactions in an encrypted environment is supposed to work? Thanks a lot in advance for your response!
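To make the copying attempt concrete, here is a minimal, std-only sketch of what cloning each value into a vector could look like. It is not the exact code from my tests: `collect_owned` is a hypothetical helper name, and a plain `Vec` of results stands in for the items that `db.iter(&mut txn)` yields in the example further down.

```rust
// Hypothetical helper (not part of heed3): copy every borrowed (key, value)
// pair into owned buffers, so nothing refers to the source buffers afterwards.
fn collect_owned<'a, I, E>(entries: I) -> Result<Vec<(String, Vec<u8>)>, E>
where
    I: IntoIterator<Item = Result<(&'a str, &'a [u8]), E>>,
{
    entries
        .into_iter()
        // Stop at the first error; otherwise copy key and value.
        .map(|entry| entry.map(|(key, value)| (key.to_owned(), value.to_vec())))
        .collect()
}

fn main() {
    // Stand-in for the entries read from the database (assumption for this sketch).
    let borrowed: Vec<Result<(&str, &[u8]), std::io::Error>> =
        vec![Ok(("0", &b"toto"[..])), Ok(("1", &b"toto"[..]))];

    let owned = collect_owned(borrowed).expect("stand-in entries never fail");
    for (_, value) in &owned {
        assert_eq!(value.as_slice(), b"toto");
    }
}
```

Copying like this only protects against the borrowed slices being overwritten after they have been read, so it would not be expected to help if the data is already corrupted at the time of the read.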

Here is an example that I use for my tests:

use std::{error::Error, sync::Arc, time::Instant};

use argon2::Argon2;

use chacha20poly1305::{ChaCha20Poly1305, Key};
use heed3::{
    types::{Bytes, Str},
    EnvOpenOptions,
};

use std::fs;

fn main() -> Result<(), Box<dyn Error + Send + Sync>> {
    let path_non_enc = "/tmp/lmdb_test_db_non_enc";
    let path = "/tmp/lmdb_test_db_enc";
    fs::create_dir_all(path_non_enc).expect("Failed to create path");
    fs::create_dir_all(path).expect("Failed to create path");

    let password = "thisisthepasswordasdfdasfasdf";
    let salt = "thisisthesalt123456789asdfasdffds";
    let mut key = Key::default();
    Argon2::default().hash_password_into(password.as_bytes(), salt.as_bytes(), &mut key)?;

    let start = Instant::now();

    //let env = unsafe { EnvOpenOptions::new().max_readers(20).map_size(1 << 32).max_dbs(5).open(path_non_enc)? };
    let env = unsafe { EnvOpenOptions::new().max_readers(20).map_size(1 << 32).max_dbs(5).open_encrypted::<ChaCha20Poly1305, _>(key, path)? };
    let env = Arc::new(env);

    let mut txn = env.write_txn()?;
    let db = env.create_database::<Str, Bytes>(&mut txn, Some("test_db"))?;
    for key in 0..1000 {
        db.put(&mut txn, &format!("{key}"), "toto".as_bytes()).unwrap();
    }
    txn.commit()?;

    let mut threads = Vec::new();
    for _ in 0..std::thread::available_parallelism().unwrap().get() {
        let env = env.clone();
        let thread = std::thread::spawn(move || {
            let mut txn = env.read_txn().unwrap();
            for (_, value) in db.iter(&mut txn).unwrap().flatten() {
                assert_eq!(value, "toto".as_bytes());
            }
        });
        threads.push(thread);
    }
    for thread in threads.into_iter() {
        thread.join().unwrap();
    }

    eprintln!("Elapsed time: {}ms", start.elapsed().as_millis());

    Ok(())
}

With the non-encrypted environment this works very well.
Could you please have a look?
