Description
I'm trying to generate vanity wallet addresses, which are derived from ed25519 keys. For each candidate I fill a secret key with random bytes from a PRNG and derive the key pair from it. I ran into a performance issue, so I want to debug it:
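```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

use ed25519_dalek::{SecretKey, SigningKey};

// Shared counter; a separate reporting thread reads it to compute keypairs/second.
let counter = Arc::new(AtomicU64::new(0));
for _ in 0..cli.threads {
    let counter = Arc::clone(&counter);
    thread::spawn(move || {
        // fastrand is fast but NOT a cryptographically secure RNG.
        let mut rng = fastrand::Rng::new();
        let mut secret = SecretKey::default(); // SecretKey is [u8; 32]
        loop {
            // Fresh random bytes for every candidate; without this, every
            // iteration derives the same key from the all-zero secret.
            rng.fill(&mut secret);
            let key: SigningKey = SigningKey::from_bytes(&secret);
            // Stop the optimizer from discarding the otherwise-unused key.
            std::hint::black_box(&key);
            counter.fetch_add(1, Ordering::Relaxed);
        }
    });
}
```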
On my Ryzen 7 3700X, running all 16 threads, this only reaches 300K keypairs/second, measured with a precise monotonic clock. A C vanity address generator on the same PC, with the same parameters and random input, generates 30M keypairs/second, so I suspect something is wrong here.
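Assuming the work is spread evenly across all 16 hardware threads, the two throughputs translate into the following time per keypair on each thread, a 100× gap:

```math
\begin{aligned}
\text{Rust: } \frac{300\,000\ \text{keys/s}}{16} &= 18\,750\ \text{keys/s per thread} \;\Rightarrow\; \frac{1\ \text{s}}{18\,750} \approx 53.3\ \mu\text{s per keypair} \\
\text{C: } \frac{30\,000\,000\ \text{keys/s}}{16} &= 1\,875\,000\ \text{keys/s per thread} \;\Rightarrow\; \frac{1\ \text{s}}{1\,875\,000} \approx 0.53\ \mu\text{s per keypair}
\end{aligned}
```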
So I started digging. The call chain, with permalinks at commit `d5ef57a`, is:

1. `curve25519-dalek/ed25519-dalek/src/signing.rs`, lines 102 to 108
2. then `curve25519-dalek/ed25519-dalek/src/signing.rs`, lines 788 to 794
3. then `curve25519-dalek/ed25519-dalek/src/hazmat.rs`, lines 57 to 76
4. then `curve25519-dalek/curve25519-dalek/src/scalar.rs`, lines 237 to 246
5. and eventually `curve25519-dalek/curve25519-dalek/src/scalar.rs`, lines 1127 to 1128
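For context, the `hazmat.rs` step is where the 32-byte secret gets expanded. Per RFC 8032 §5.1.5, Ed25519 key generation hashes the secret with SHA-512, "clamps" the lower 32 bytes, and uses the result as the scalar for the base-point multiplication. The sketch below shows only the clamping step; `clamp` is a hypothetical helper name, and SHA-512 is left out because it needs an external crate.

```rust
/// Applies the Ed25519 clamping from RFC 8032 §5.1.5 to the lower
/// 32 bytes of SHA-512(secret). `clamp` is a hypothetical helper name.
fn clamp(mut bytes: [u8; 32]) -> [u8; 32] {
    bytes[0] &= 0b1111_1000; // clear the 3 lowest bits: the scalar becomes a multiple of 8 (the cofactor)
    bytes[31] &= 0b0111_1111; // clear bit 255
    bytes[31] |= 0b0100_0000; // set bit 254
    bytes
}
```

As far as I can tell from the permalinks, ed25519-dalek then turns the clamped bytes into a `Scalar` via `Scalar::from_bytes_mod_order`, which appears to be where the reduction showing up in the flame graph is called.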
I noticed that even though I'm using AVX-512, the flame graph shows that `UnpackedScalar::mul_internal` and `UnpackedScalar::montgomery_reduce` still use the generic serial implementation and take up a large share of the execution time. Can this be sped up, or should I look for something else for address generation?
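These two functions appear to be used together to reduce a scalar modulo the group order ℓ: multiply by a precomputed constant (R mod ℓ), then Montgomery-reduce, which divides by R again and leaves x mod ℓ. A toy, std-only sketch of the same trick follows; it assumes a 32-bit radix and a small modulus `M = 2^31 - 1` standing in for ℓ, all names are hypothetical, and unlike the real code it is not constant-time.

```rust
/// Toy odd modulus standing in for the group order ℓ (hypothetical choice).
/// Keeping it below 2^31 guarantees every intermediate fits in a u64.
const M: u64 = 2_147_483_647; // 2^31 - 1

/// Montgomery radix R = 2^32 for this toy (the real limbs are wider).
const R_BITS: u32 = 32;

/// Returns -m^{-1} mod 2^32 for odd m via Newton's iteration; each step
/// doubles the number of correct low bits: 3 -> 6 -> 12 -> 24 -> 48.
fn neg_inv_mod_r(m: u32) -> u32 {
    let mut inv = m; // m * m ≡ 1 (mod 8) for any odd m
    for _ in 0..4 {
        inv = inv.wrapping_mul(2u32.wrapping_sub(m.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// REDC: for t < M * R, returns t * R^{-1} mod M.
fn montgomery_reduce(t: u64, m_prime: u32) -> u64 {
    // Pick u so that t + u*M is divisible by R.
    let u = (t as u32).wrapping_mul(m_prime) as u64;
    // t + u*M < 2*M*R < 2^64, so this cannot overflow.
    let r = (t + u * M) >> R_BITS;
    // r < 2M, so one conditional subtraction suffices (branchy, NOT constant-time).
    if r >= M { r - M } else { r }
}

/// Reduces x mod M the way the scalar code reduces mod ℓ:
/// multiply by (R mod M), then Montgomery-reduce, which divides by R.
fn reduce(x: u32) -> u64 {
    let m_prime = neg_inv_mod_r(M as u32);
    let r_mod_m = (1u64 << R_BITS) % M; // a precomputed constant in real code
    let t = x as u64 * r_mod_m; // analogue of mul_internal(x, R)
    montgomery_reduce(t, m_prime) // analogue of montgomery_reduce(xR)
}
```

The appeal of this design is that reduction becomes one multiplication plus one Montgomery reduction, with no division; the cost is that both steps are wide multi-limb multiplications, which is consistent with them showing up in a profile.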
For comparison, the C vanity address generator uses SUPERCOP's ed25519 implementation, and its key generation is much faster as well.