Skip to content

Help out for batched random key generation? #705

Open
@stevefan1999-personal

Description

@stevefan1999-personal

I'm trying to generate vanity wallet addresses, which are originally in ed25519, so I start by generating random bytes from a PRNG enough to cast into a secret key and then generate key pairs, but I ran into performance issue so I want to debug it:

for _ in 0..cli.threads {
    let thr = {
        let counter = counter.clone();
        move || {
            let mut csprng = fastrand::Rng::new();
            let mut pair = SecretKey::default();

            loop {
                let key: SigningKey = SigningKey::from_bytes(&pair);
                counter.fetch_add(1, Ordering::Relaxed);
            }
        }
    };

    thread::spawn(thr);
}

On my 3700X PC, and running in full 16 threads, it only runs 300K keypairs/second with a precise monotonic clock. However, the C vanity address generator on my same PC, with same parameters can generate 30M keypairs/second with random data, so I think there could be something wrong here

So I started digging, which means we start from here:

pub fn from_bytes(secret_key: &SecretKey) -> Self {
let verifying_key = VerifyingKey::from(&ExpandedSecretKey::from(secret_key));
Self {
secret_key: *secret_key,
verifying_key,
}
}

And then here:

impl From<&SecretKey> for ExpandedSecretKey {
#[allow(clippy::unwrap_used)]
fn from(secret_key: &SecretKey) -> ExpandedSecretKey {
let hash = Sha512::default().chain_update(secret_key).finalize();
ExpandedSecretKey::from_bytes(hash.as_ref())
}
}

And then here:

impl ExpandedSecretKey {
/// Construct an `ExpandedSecretKey` from an array of 64 bytes. In the spec, the bytes are the
/// output of a SHA-512 hash. This clamps the first 32 bytes and uses it as a scalar, and uses
/// the second 32 bytes as a domain separator for hashing.
pub fn from_bytes(bytes: &[u8; 64]) -> Self {
// TODO: Use bytes.split_array_ref once it’s in MSRV.
let mut scalar_bytes: [u8; 32] = [0u8; 32];
let mut hash_prefix: [u8; 32] = [0u8; 32];
scalar_bytes.copy_from_slice(&bytes[00..32]);
hash_prefix.copy_from_slice(&bytes[32..64]);
// For signing, we'll need the integer, clamped, and converted to a Scalar. See
// PureEdDSA.keygen in RFC 8032 Appendix A.
let scalar = Scalar::from_bytes_mod_order(clamp_integer(scalar_bytes));
ExpandedSecretKey {
scalar,
hash_prefix,
}
}

And then here:

pub fn from_bytes_mod_order(bytes: [u8; 32]) -> Scalar {
// Temporarily allow s_unreduced.bytes > 2^255 ...
let s_unreduced = Scalar { bytes };
// Then reduce mod the group order and return the reduced representative.
let s = s_unreduced.reduce();
debug_assert_eq!(0u8, s[31] >> 7);
s
}

And eventually here:

let xR = UnpackedScalar::mul_internal(&x, &constants::R);
let x_mod_l = UnpackedScalar::montgomery_reduce(&xR);

I noticed that despite I'm using AVX512, the flame graph shows that those UnpackedScalar::mul_internal and UnpackedScalar::montgomery_reduce are still using generic serial implementation and took a large group of execution time, I'm not sure if this can be sped-up and improved? Or maybe I should look for something else for address generation?

I noticed that the wallet vanity address generator in C uses SUPERCOP's version, and their ed25519 key generation is much faster as well.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions