Migrating to post-quantum cryptography impacts performance differently depending on hardware, firmware, software, and use case. In this post, we migrate a Rust system from the quantum-vulnerable EdDSA to the post-quantum ML-DSA and measure the performance on a 2020 M1 MacBook Air. While the results from this exact setup won’t translate to every machine, the story should help you spot edge cases in your own migrations.

Migrating a Rust system from Ed25519 to ML-DSA-44

The Rust system we were migrating verified Ed25519 signatures (a specific variant of the EdDSA signature scheme) on blockchain transactions. ML-DSA-44 is one of the most verification-efficient NIST-standardized post-quantum signature schemes, and reports show it can be faster than Ed25519 at verification. So, we chose ML-DSA-44 for the migration. We hoped the move would add post-quantum security and deliver a performance boost, with ML-DSA-44 taking about 23% less time than Ed25519 to verify.

The crate used for Ed25519 in this system was dalek cryptography’s ed25519-dalek crate. Unfortunately, their excellent crate will someday be exterminated by quantum computers, so given our previous findings on the state of post-quantum cryptography in Rust, we decided to replace it with RustCrypto’s ml-dsa.

We began by setting up two initial benchmarks to compare the performance of verifying_key.verify on signatures generated by each scheme.

pub fn bench_verify_ed25519(c: &mut Criterion) {
    let signing_key = Ed25519SigningKey::from_bytes(&[1u8; 32]);
    let verifying_key = signing_key.verifying_key();
    let message = b"Hello, signature!";
    let signature = signing_key.sign(message);
    c.bench_function("bench_verify_ed25519", |b| {
        b.iter(|| verifying_key.verify(message, &signature).unwrap())
    });
}

pub fn bench_verify_mldsa44(c: &mut Criterion) {
    let seed: [u8; 32] = rand::random();
    let keypair = MlDsa44::key_gen_internal(&seed.into());
    let message = b"Hello, signature!";
    let signature = keypair.signing_key().sign(message);
    let verifying_key = keypair.verifying_key();
    c.bench_function("bench_verify_mldsa44", |b| {
        // Unwrap so a failed verification panics, matching the Ed25519 benchmark
        b.iter(|| verifying_key.verify(message, &signature).unwrap())
    });
}

Results:

bench_verify_ed25519 mean time: 33.287 µs
bench_verify_mldsa44 mean time: 32.178 µs

We were a bit surprised that ML-DSA was only slightly faster, given our expectation that it would take about 23% less time than Ed25519, but we figured that was fine for now and planned to revisit optimizations later in development.
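
To put a number on "only slightly faster", here is a minimal sketch of the arithmetic behind "takes X% less time". The helper name percent_less_time is our own illustration and is not part of any crate used in this post.

```rust
/// Percentage by which `candidate_us` is lower than `baseline_us`.
/// A negative result means the candidate is slower than the baseline.
/// (Hypothetical helper, for illustration only.)
fn percent_less_time(baseline_us: f64, candidate_us: f64) -> f64 {
    (baseline_us - candidate_us) / baseline_us * 100.0
}
```

With the mean times above, percent_less_time(33.287, 32.178) comes out to roughly 3.3, well short of the 23% we had hoped for.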

We moved forward with the migration. In most blockchain transaction verifiers, there’s a function that takes the signature data from a transaction, decodes it into the relevant cryptographic datatypes, and runs the verification step. Something like:

pub fn verify_transaction_signature_ed25519(signature_bytes: &[u8], verifying_key_bytes: &[u8], message: &[u8]) -> bool {
    // decode signature bytes into a Signature type
    // decode verifying_key_bytes into a VerifyingKey type
    // decode message into a Message type
    // run the verification function
    todo!() // placeholder so the outline compiles
}

So we added a verify_transaction_signature_ml_dsa_44 function to replace the existing verify_transaction_signature_ed25519.

use ed25519_dalek::{Signature as Ed25519Signature, VerifyingKey as Ed25519VerifyingKey, Verifier as Ed25519Verifier};
use ml_dsa::{
    EncodedVerifyingKey as MlDsaEncodedVerifyingKey, MlDsa44, Signature as MlDsaSignature,
    VerifyingKey as MlDsaVerifyingKey,
};

// Existing Ed25519 verifier
pub fn verify_transaction_signature_ed25519(signature_bytes: &[u8], verifying_key_bytes: &[u8], message: &[u8]) -> bool {
    // Ed25519 signature is 64 bytes, public key is 32 bytes
    if signature_bytes.len() != 64 || verifying_key_bytes.len() != 32 {
        return false;
    }
    let signature = match Ed25519Signature::from_slice(signature_bytes) {
        Ok(sig) => sig,
        Err(_) => return false,
    };
    let verifying_key = match Ed25519VerifyingKey::from_bytes(verifying_key_bytes.try_into().unwrap()) {
        Ok(key) => key,
        Err(_) => return false,
    };
    verifying_key.verify(message, &signature).is_ok()
}

// New ML-DSA-44 verifier
pub fn verify_transaction_signature_ml_dsa_44(signature_bytes: &[u8], verifying_key_bytes: &[u8], message: &[u8]) -> bool {
    let Ok(encoded_verifying_key) = MlDsaEncodedVerifyingKey::<MlDsa44>::try_from(verifying_key_bytes) else {
        return false;
    };
    let verifying_key = MlDsaVerifyingKey::<MlDsa44>::decode(&encoded_verifying_key);
    let Ok(signature) = MlDsaSignature::<MlDsa44>::try_from(signature_bytes) else {
        return false;
    };
    verifying_key.verify(message, &signature).is_ok()
}

But when running the first batches of requests using this function, we noticed a slowdown. We then benchmarked the two high-level functions.

pub fn bench_verify_transaction_ed25519(c: &mut Criterion) {
    let signing_key = Ed25519SigningKey::from_bytes(&[1u8; 32]);
    let verifying_key = signing_key.verifying_key();
    let message = b"Hello, signature!";
    let signature = signing_key.sign(message);
    let verifying_key_bytes = verifying_key.to_bytes();
    let signature_bytes = signature.to_bytes();
    c.bench_function("bench_verify_transaction_ed25519", |b| {
        b.iter(|| verify_transaction_signature_ed25519(&signature_bytes, &verifying_key_bytes, message))
    });
}

pub fn bench_verify_transaction_mldsa44(c: &mut Criterion) {
    let seed: [u8; 32] = rand::random();
    let keypair = MlDsa44::key_gen_internal(&seed.into());
    let message = b"Hello, signature!";
    let signature = keypair.signing_key().sign(message);
    let verifying_key = keypair.verifying_key();
    let verifying_key_bytes = verifying_key.encode();
    let signature_bytes = signature.encode();
    c.bench_function("bench_verify_transaction_mldsa44", |b| {
        b.iter(|| verify_transaction_signature_ml_dsa_44(
            signature_bytes.as_ref(),
            verifying_key_bytes.as_ref(),
            message
        ))
    });
}

Results:

bench_verify_transaction_ed25519 mean time: 36.090 µs
bench_verify_transaction_mldsa44 mean time: 89.193 µs

Surprisingly, verify_transaction_signature_ml_dsa_44 was significantly slower than verify_transaction_signature_ed25519. Since we had already benchmarked the signature verification itself, this suggested that the slowdown came from decoding the datatypes. This was strange, as decoding speed isn’t usually the bottleneck for signature verification. In this instance, decoding introduced a negligible overhead of about 3 µs for Ed25519 (36.090 µs versus 33.287 µs), but a substantial overhead of about 57 µs for ML-DSA (89.193 µs versus 32.178 µs). This made verify_transaction_signature_ml_dsa_44 about 2.5 times as slow as verify_transaction_signature_ed25519.
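
The overhead figures come from subtracting each verify-only mean from the matching end-to-end mean. Here is a minimal sketch of that arithmetic, using the numbers from the two result listings above. The constant and function names are our own, chosen for illustration.

```rust
// Mean times in microseconds, copied from the benchmark results above.
const VERIFY_ONLY_ED25519_US: f64 = 33.287;
const VERIFY_ONLY_MLDSA44_US: f64 = 32.178;
const END_TO_END_ED25519_US: f64 = 36.090;
const END_TO_END_MLDSA44_US: f64 = 89.193;

/// Time spent outside the bare verify call (decoding plus length checks).
/// (Hypothetical helper, for illustration only.)
fn decode_overhead_us(end_to_end_us: f64, verify_only_us: f64) -> f64 {
    end_to_end_us - verify_only_us
}

/// How many times longer the candidate takes than the baseline.
/// (Hypothetical helper, for illustration only.)
fn slowdown_factor(baseline_us: f64, candidate_us: f64) -> f64 {
    candidate_us / baseline_us
}
```

Plugging in the constants gives an overhead of about 2.8 µs for Ed25519 and about 57.0 µs for ML-DSA-44, and a slowdown factor of about 2.47 between the two end-to-end functions.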

We ran more targeted benchmarks, which showed that the main slowdown came from decoding the verifying key.

pub fn bench_decode_mldsa44_verifying_key(c: &mut Criterion) {
    let seed: [u8; 32] = rand::random();
    let keypair = MlDsa44::key_gen_internal(&seed.into());
    let verifying_key_bytes = keypair.verifying_key().encode();
    c.bench_function("bench_decode_mldsa44_verifying_key", |b| b.iter(|| {
        let encoded_verifying_key = MlDsaEncodedVerifyingKey::<MlDsa44>::try_from(verifying_key_bytes.as_ref()).unwrap();
        MlDsaVerifyingKey::<MlDsa44>::decode(&encoded_verifying_key)
    }));
}

Results:

bench_decode_mldsa44_verifying_key mean time: 56.907 µs

There’s a lesson here. On paper, the initial benchmark comparing the two verify functions looked good, but it ultimately didn’t produce accurate results for the transaction verification use case, because it didn’t include the decoding steps. When benchmarking for transaction verification, we should include all decoding steps, since these are required in most blockchain systems. Interestingly, RustCrypto’s ml-dsa benchmarks already include these decoding steps.

However, decoding speed won’t matter in all post-quantum migrations. Some use cases may decode a verifying key once and then call verify millions of times. In those cases, the verify benchmark we ran at the beginning would be sufficient.
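
For that decode-once pattern, a small cache keyed by the encoded bytes is one way to pay the decoding cost only on first sight of a key. The sketch below is an assumption-laden illustration, not code from our system: DecodedKeyCache and get_or_decode are hypothetical names, the key type is left generic, and the map grows without bound.

```rust
use std::collections::HashMap;

/// Caches decoded keys by their encoded bytes so each distinct key is
/// decoded at most once. (Hypothetical type, for illustration only.)
struct DecodedKeyCache<K> {
    decoded: HashMap<Vec<u8>, K>,
}

impl<K> DecodedKeyCache<K> {
    fn new() -> Self {
        Self { decoded: HashMap::new() }
    }

    /// Returns the cached key for `encoded`, calling `decode` only on a miss.
    /// Returns `None` when `decode` rejects the bytes; failures are not cached.
    fn get_or_decode<F>(&mut self, encoded: &[u8], decode: F) -> Option<&K>
    where
        F: FnOnce(&[u8]) -> Option<K>,
    {
        if !self.decoded.contains_key(encoded) {
            let key = decode(encoded)?;
            self.decoded.insert(encoded.to_vec(), key);
        }
        self.decoded.get(encoded)
    }
}
```

In a setup like ours, the decode closure would wrap the try_from and decode calls shown earlier. Whether this helps depends on how often the same verifying key recurs, and a production version would likely need an eviction policy.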

Anyway, we went ahead and updated our benchmarks to include the decoding steps.

pub fn bench_decode_and_verify_ed25519(c: &mut Criterion) {
    // Generate test data first
    let signing_key = Ed25519SigningKey::from_bytes(&[1u8; 32]);
    let verifying_key = signing_key.verifying_key();
    let message = b"Hello, signature!";
    let signature = signing_key.sign(message);
    let verifying_key_bytes = verifying_key.to_bytes();
    let signature_bytes = signature.to_bytes();
    c.bench_function("bench_decode_and_verify_ed25519", |b| b.iter(|| {
        let signature = Ed25519Signature::from_slice(&signature_bytes).unwrap();
        let verifying_key = Ed25519VerifyingKey::from_bytes(&verifying_key_bytes).unwrap();
        verifying_key.verify(message, &signature).is_ok()
    }));
}

pub fn bench_decode_and_verify_mldsa44(c: &mut Criterion) {
    // Generate test data first
    let seed: [u8; 32] = rand::random();
    let keypair = MlDsa44::key_gen_internal(&seed.into());
    let message = b"Hello, signature!";
    let signature = keypair.signing_key().sign(message);
    let verifying_key_bytes = keypair.verifying_key().encode();
    let signature_bytes = signature.encode();
    c.bench_function("bench_decode_and_verify_mldsa44", |b| b.iter(|| {
        let encoded_verifying_key = MlDsaEncodedVerifyingKey::<MlDsa44>::try_from(verifying_key_bytes.as_ref()).unwrap();
        let verifying_key = MlDsaVerifyingKey::<MlDsa44>::decode(&encoded_verifying_key);
        let signature = MlDsaSignature::<MlDsa44>::try_from(signature_bytes.as_ref()).unwrap();
        verifying_key.verify(message, &signature).is_ok()
    }));
}

Results:

bench_decode_and_verify_ed25519 mean time: 36.428 µs
bench_decode_and_verify_mldsa44 mean time: 89.984 µs

If we had written the benchmarks like this from the start, they would have shown the slowdown earlier, and saved us some time.


Getting the speedup we hoped for

We took a quick look in the RustCrypto ml-dsa repo for optimization features but didn’t spot any. So we put that crate on the bench for a moment and subbed in a competitor, libcrux-ml-dsa, to see if it showed the same poor performance. Unfortunately, they weren’t ready to enter the game, as they hadn’t yet published their crate to crates.io. We pushed them to go ahead with it, and thankfully they obliged.

Smarter this time, we added another benchmark for ML-DSA verification using libcrux-ml-dsa.

pub fn bench_decode_and_verify_mldsa44_libcrux(c: &mut Criterion) {
    let mut rng = OsRng;
    // Generate random seed for key generation
    let mut randomness = [0u8; 32];
    rng.try_fill_bytes(&mut randomness).unwrap();
    // Generate test data first
    let keypair = crux_ml_dsa_44::generate_key_pair(randomness);
    let message = b"Hello, signature!";
    // Generate random seed for signing
    rng.try_fill_bytes(&mut randomness).unwrap();
    let signature = crux_ml_dsa_44::sign(&keypair.signing_key, message, b"", randomness).unwrap();
    // Get the raw bytes
    let verifying_key_bytes = keypair.verification_key.as_ref().to_vec();
    let signature_bytes = signature.as_ref().to_vec();
    c.bench_function("bench_decode_and_verify_mldsa44_libcrux", |b| b.iter(|| {
        // Convert slices to fixed-size array references
        let verifying_key_array: &[u8; 1312] = verifying_key_bytes.as_slice().try_into().unwrap();
        let verifying_key = CruxMLDSA44VerificationKey::new(*verifying_key_array);
        let signature_array: &[u8; 2420] = signature_bytes.as_slice().try_into().unwrap();
        let signature = CruxMLDSA44Signature::new(*signature_array);
        crux_ml_dsa_44::verify(&verifying_key, message, b"", &signature).is_ok()
    }));
}

Results:

bench_decode_and_verify_ed25519 mean time: 36.428 µs
bench_decode_and_verify_mldsa44_libcrux mean time: 40.238 µs

Better than RustCrypto’s ml-dsa, but still slightly slower than Ed25519…

We took a quick look to see if the crate had any built-in optimization features. We tried enabling the simd128 feature, but it didn’t make any difference. We then took a Hail Mary shot at calling their neon::verify function manually, even though we assumed it was already being used under the hood by default when simd128 was enabled.

pub fn bench_decode_and_verify_mldsa44_libcrux_with_neon(c: &mut Criterion) {
    let mut rng = OsRng;
    // Generate random seed for key generation
    let mut randomness = [0u8; 32];
    rng.try_fill_bytes(&mut randomness).unwrap();
    // Generate test data first
    let keypair = crux_ml_dsa_44::generate_key_pair(randomness);
    let message = b"Hello, signature!";
    // Generate random seed for signing
    rng.try_fill_bytes(&mut randomness).unwrap();
    let signature = crux_ml_dsa_44::sign(&keypair.signing_key, message, b"", randomness).unwrap();
    // Get the raw bytes
    let verifying_key_bytes = keypair.verification_key.as_ref().to_vec();
    let signature_bytes = signature.as_ref().to_vec();
    c.bench_function("bench_decode_and_verify_mldsa44_libcrux_with_neon", |b| b.iter(|| {
        // Convert slices to fixed-size array references
        let verifying_key_array: &[u8; 1312] = verifying_key_bytes.as_slice().try_into().unwrap();
        let verification_key = CruxMLDSA44VerificationKey::new(*verifying_key_array);
        let signature_array: &[u8; 2420] = signature_bytes.as_slice().try_into().unwrap();
        let signature = CruxMLDSA44Signature::new(*signature_array);
        crux_ml_dsa_44::neon::verify(&verification_key, message, b"", &signature).is_ok()
    }));
}

Results:

bench_decode_and_verify_ed25519 mean time: 36.428 µs
bench_decode_and_verify_mldsa44_libcrux_with_neon mean time: 33.380 µs

But it worked 🎉! Using neon::verify was ~8% faster than Ed25519. It wasn’t quite the 23% speedup we’d hoped for at the start, but 8% was still a win, and certainly better than a slowdown.

So, why did we need to use neon::verify? It turns out that neon is intended to be selected under the hood by default on ARM Macs, but this was failing on my machine. The libcrux-ml-dsa build script enables the simd128 feature flag at compile time, which worked as expected. However, at runtime it performs an additional simd128_support() check, which was returning false on my machine and preventing it from using the neon verify functions. Through some debugging, I found that although my machine does have simd128 support, it lacks the hw.optional.AdvSIMD flag that libcrux expects. Instead, it reports a hw.optional.neon flag. Apple did switch from hw.optional.neon to hw.optional.AdvSIMD at some point, but it’s unclear when. I opened an issue on libcrux about this.
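
One way a runtime check could tolerate both flag names is to try the newer name first and fall back to the older one. The sketch below is purely hypothetical and is not libcrux's code or its eventual fix: simd128_supported and the read_flag closure are names we made up, and read_flag stands in for whatever sysctl query the platform provides.

```rust
/// Returns true if either flag name reports SIMD support.
/// `read_flag` stands in for a sysctl lookup: it returns `Some(value)` when
/// the OS reports the flag and `None` when the flag does not exist.
/// (Hypothetical helper, for illustration only.)
fn simd128_supported<F>(read_flag: F) -> bool
where
    F: Fn(&str) -> Option<bool>,
{
    // Flag names as discussed in this post: the newer name, then the older one.
    const FLAGS: [&str; 2] = ["hw.optional.AdvSIMD", "hw.optional.neon"];
    FLAGS.iter().any(|&name| read_flag(name) == Some(true))
}
```

A machine like mine, which only reports hw.optional.neon, would pass this check, while a machine that reports neither flag would still fall back to the portable implementation.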

To finish off, we still had to write and benchmark the higher-level function:

use libcrux_ml_dsa::ml_dsa_44::{self as crux_ml_dsa_44, MLDSA44VerificationKey as CruxMLDSA44VerificationKey, MLDSA44Signature as CruxMLDSA44Signature};

pub fn verify_transaction_signature_ml_dsa_44_libcrux(signature_bytes: &[u8], verifying_key_bytes: &[u8], message: &[u8]) -> bool {
    // ML-DSA-44 verifying key is 1312 bytes, signature is 2420 bytes
    if verifying_key_bytes.len() != 1312 || signature_bytes.len() != 2420 {
        return false;
    }
    // Convert slices to fixed-size array references
    let verifying_key_bytes: &[u8; 1312] = verifying_key_bytes.try_into().unwrap();
    let verifying_key = CruxMLDSA44VerificationKey::new(*verifying_key_bytes);
    let signature_bytes: &[u8; 2420] = signature_bytes.try_into().unwrap();
    let signature = CruxMLDSA44Signature::new(*signature_bytes);
    // Verify using NEON optimized implementation
    crux_ml_dsa_44::neon::verify(&verifying_key, message, b"", &signature).is_ok()
}

pub fn bench_verify_transaction_mldsa44_libcrux_with_neon(c: &mut Criterion) {
    let mut rng = OsRng;
    // Generate random seed for key generation
    let mut randomness = [0u8; 32];
    rng.try_fill_bytes(&mut randomness).unwrap();
    // Generate test data first
    let keypair = crux_ml_dsa_44::generate_key_pair(randomness);
    let message = b"Hello, signature!";
    // Generate random seed for signing
    rng.try_fill_bytes(&mut randomness).unwrap();
    let signature = crux_ml_dsa_44::sign(&keypair.signing_key, message, b"", randomness).unwrap();
    // Get the raw bytes
    let verifying_key_bytes = keypair.verification_key.as_ref().to_vec();
    let signature_bytes = signature.as_ref().to_vec();
    c.bench_function("bench_verify_transaction_mldsa44_libcrux_with_neon", |b| b.iter(|| {
        verify_transaction_signature_ml_dsa_44_libcrux(&signature_bytes, &verifying_key_bytes, message)
    }));
}

Results:

bench_verify_transaction_ed25519 mean time: 36.090 µs
bench_verify_transaction_mldsa44_libcrux_with_neon mean time: 33.585 µs

Thankfully, there weren’t any surprises this time 😅.

Conclusion

The moral of the story is that on-paper predictions don’t show the full picture of post-quantum migrations. Some packages you plan to use will lack hardware acceleration, and others may have subtle bugs that suppress it at runtime. In this post, we haven’t even touched on hardware variations, or the impact of these big signatures on storage and network latency.
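
To give a rough sense of the size difference, here is a small sketch that adds up the byte lengths already used in the verifier functions above. It assumes, purely for illustration, that every transaction carries one signature and one verifying key. The function name is our own.

```rust
// Byte lengths taken from the length checks in the verifier functions above.
const ED25519_SIGNATURE_BYTES: usize = 64;
const ED25519_VERIFYING_KEY_BYTES: usize = 32;
const MLDSA44_SIGNATURE_BYTES: usize = 2420;
const MLDSA44_VERIFYING_KEY_BYTES: usize = 1312;

/// Bytes of signature material per transaction, assuming one signature and
/// one verifying key are attached. (Hypothetical helper, for illustration.)
fn auth_bytes_per_transaction(signature_bytes: usize, verifying_key_bytes: usize) -> usize {
    signature_bytes + verifying_key_bytes
}
```

Under that assumption, Ed25519 needs 96 bytes per transaction and ML-DSA-44 needs 3,732 bytes, which is close to 39 times as much.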

You don’t know until you try it, which is what we’re doing every day at Project Eleven. You can try the benchmarks on your own hardware by cloning the sample repo, installing Rust if you haven’t already 🦀, and running cargo bench in the repository.