Cryptography

QuantumLock

Post-quantum cryptographic infrastructure benchmarked under CPU-only constraints.

Rust, ring, pqcrypto, serde, Docker

<50ms

Keygen Latency

27%

Keygen Reduction

1M+

Operations Benchmarked

Kyber-768

Security Level

01

Problem

Problem statement, constraint shape, and the gap this project explores.

Problem Statement

Can post-quantum algorithms meet latency requirements on commodity CPU-only hardware?

Challenge

QuantumLock is cryptographic infrastructure that balances security guarantees against operational constraints. It is designed for environments where performance, memory usage, and algorithm choice directly determine whether a system is viable.

Why Existing Approaches Failed

Most post-quantum cryptography research benchmarks on high-end hardware with AVX-512 extensions. The harder problem is deploying PQ algorithms on constrained infrastructure: cloud VMs without AVX-512, edge devices, and environments where GPU acceleration is unavailable. QuantumLock explored whether Kyber-based key encapsulation, and the lattice arithmetic underneath it, could meet sub-50ms latency targets under these constraints.

02

Constraints

Non-negotiable boundaries that shaped the implementation.

Hardware

CPU-only, no AVX-512

Latency Target

<50ms per operation

Scale

1M+ operations benchmarked

Threads

8-core parallel execution

Memory

Bounded working set

03

Architecture

The primary design surface: flow, subsystem roles, and state boundaries.

Architecture Brief

QuantumLock implements a Kyber-768 key encapsulation pipeline in Rust using the pqcrypto crate family. The benchmark harness runs keygen, encapsulation, and decapsulation across thread pools of varying sizes to expose contention points.

Execution Flow
01

Keygen

02

Public/Private keypair

03

Encapsulate (shared secret + ciphertext)

04

Decapsulate (recover shared secret and check it matches)

05

Benchmark record
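The five-step flow can be sketched as a single timed pass that produces one benchmark record. This is a minimal illustration under stated assumptions, not the project's harness: the `Kem` trait, `BenchRecord`, `run_once`, and `ToyKem` are hypothetical names, and `ToyKem` is an insecure XOR placeholder that exists only so the sketch runs without external crates. In a real run its methods would delegate to a Kyber-768 implementation such as the one in the pqcrypto crate family.

```rust
use std::time::{Duration, Instant};

/// Hypothetical KEM interface; a real harness would back it with Kyber-768.
trait Kem {
    type Pk;
    type Sk;
    type Ct;
    fn keypair(&mut self) -> (Self::Pk, Self::Sk);
    fn encapsulate(&mut self, pk: &Self::Pk) -> ([u8; 32], Self::Ct);
    fn decapsulate(&self, ct: &Self::Ct, sk: &Self::Sk) -> [u8; 32];
}

/// Step 05: one benchmark record per pipeline pass.
#[derive(Debug)]
struct BenchRecord {
    keygen: Duration,
    encaps: Duration,
    decaps: Duration,
    secrets_match: bool,
}

fn run_once<K: Kem>(kem: &mut K) -> BenchRecord {
    let t = Instant::now();
    let (pk, sk) = kem.keypair(); // steps 01-02
    let keygen = t.elapsed();

    let t = Instant::now();
    let (ss_sender, ct) = kem.encapsulate(&pk); // step 03
    let encaps = t.elapsed();

    let t = Instant::now();
    let ss_receiver = kem.decapsulate(&ct, &sk); // step 04
    let decaps = t.elapsed();

    BenchRecord { keygen, encaps, decaps, secrets_match: ss_sender == ss_receiver }
}

/// INSECURE placeholder (pk == sk, XOR "encryption"). Not cryptography;
/// it only gives the harness something deterministic to time.
struct ToyKem {
    counter: u8,
}

impl Kem for ToyKem {
    type Pk = [u8; 32];
    type Sk = [u8; 32];
    type Ct = [u8; 32];

    fn keypair(&mut self) -> ([u8; 32], [u8; 32]) {
        self.counter = self.counter.wrapping_add(1);
        let k = [self.counter; 32];
        (k, k)
    }

    fn encapsulate(&mut self, pk: &[u8; 32]) -> ([u8; 32], [u8; 32]) {
        self.counter = self.counter.wrapping_add(1);
        let ss = [self.counter.wrapping_mul(31); 32];
        let ct: [u8; 32] = std::array::from_fn(|i| ss[i] ^ pk[i]);
        (ss, ct)
    }

    fn decapsulate(&self, ct: &[u8; 32], sk: &[u8; 32]) -> [u8; 32] {
        std::array::from_fn(|i| ct[i] ^ sk[i])
    }
}

fn main() {
    let mut kem = ToyKem { counter: 0 };
    let rec = run_once(&mut kem);
    println!("{rec:?}");
}
```

Keeping the KEM behind a trait lets the same timing loop run against different implementations without touching the record format.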

01

KEM Pipeline

Kyber-768 keygen → encapsulation → decapsulation chain.

02

Benchmark Harness

Criterion-based microbenchmarks with statistical analysis.

03

Thread Pool Manager

Rayon-based parallelism with configurable worker counts.

04

SIMD Dispatcher

Runtime detection of SIMD capabilities with scalar fallback.

05

Serialization Layer

serde-based key serialization for persistence benchmarks.
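The Thread Pool Manager's job of sweeping worker counts can be sketched as follows. This is a minimal, std-only sketch under assumptions: `std::thread::scope` stands in for Rayon so the example has no dependencies, and `placeholder_op` and `run_with_workers` are hypothetical names, with `placeholder_op` standing in for a keygen, encapsulation, or decapsulation call.

```rust
use std::hint::black_box;
use std::thread;
use std::time::Instant;

/// Hypothetical stand-in for one KEM operation (a few LCG steps of busy work).
fn placeholder_op(seed: u64) -> u64 {
    let mut x = seed;
    for _ in 0..64 {
        x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    }
    x
}

/// Splits `total_ops` across `workers` scoped threads and returns
/// (operations completed, throughput in ops/s).
fn run_with_workers(workers: usize, total_ops: usize) -> (usize, f64) {
    assert!(workers > 0, "need at least one worker");
    let start = Instant::now();
    let done = thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                // Spread the remainder so the shares sum exactly to total_ops.
                let share = total_ops / workers + usize::from(w < total_ops % workers);
                s.spawn(move || {
                    for i in 0..share {
                        black_box(placeholder_op(i as u64));
                    }
                    share
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    });
    let secs = start.elapsed().as_secs_f64().max(f64::MIN_POSITIVE);
    (done, done as f64 / secs)
}

fn main() {
    for workers in [1, 2, 4, 8] {
        let (ops, tput) = run_with_workers(workers, 100_000);
        println!("{workers} workers: {ops} ops, {tput:.0} ops/s");
    }
}
```

Holding the total operation count fixed while varying only the worker count is what makes contention visible: throughput that stops rising as workers are added points to a shared bottleneck.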

04

Engineering Tradeoffs

Design review notes: what was optimized and what was deliberately left behind.

EDR-01
Decision

SIMD vectorization vs naive parallelism

Why Chosen

Naive thread parallelism caused lock contention on shared RNG state. SIMD speeds up work within a single core, so its gains do not depend on how many threads contend for shared state.

Alternative Rejected

Portability across all CPU targets

Impact

SIMD vectorization for polynomial arithmetic
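Kyber's polynomial arithmetic operates on 256 coefficients modulo q = 3329, which is why it suits SIMD: the same operation runs on every lane with no data-dependent branches. The sketch below shows coefficient-wise addition with a branch-free conditional subtraction. It relies on LLVM auto-vectorization rather than explicit intrinsics, and `poly_add` is an illustrative name, not the pqcrypto code path.

```rust
const N: usize = 256; // coefficients per Kyber polynomial
const Q: i16 = 3329; // Kyber modulus

/// Coefficient-wise (a + b) mod q for inputs already reduced to [0, q).
/// No branches on coefficient values, so the loop is a good auto-vectorization candidate.
fn poly_add(a: &[i16; N], b: &[i16; N]) -> [i16; N] {
    let mut r = [0i16; N];
    for ((out, &x), &y) in r.iter_mut().zip(a).zip(b) {
        let t = x + y - Q; // in [-q, q - 2]; max x + y = 6656 fits in i16
        *out = t + ((t >> 15) & Q); // arithmetic shift yields -1 when t < 0, adding q back
    }
    r
}

fn main() {
    let a = [3000i16; N];
    let b = [500i16; N];
    println!("{}", poly_add(&a, &b)[0]); // 3500 - 3329 = 171
}
```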

EDR-02
Decision

Kyber vs NTRU vs McEliece

Why Chosen

Kyber was selected by NIST for standardization (now ML-KEM, FIPS 203), has mature Rust implementations, and offers the best latency/security tradeoff at NIST security level 3, the level targeted by the Kyber-768 parameter set.

Alternative Rejected

Smaller ciphertexts (Classic McEliece ciphertexts are smaller, but its public keys run to hundreds of kilobytes)

Impact

Kyber-768
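Kyber-768's object sizes follow directly from its parameters (k = 3, n = 256, 12-bit coefficients, ciphertext compression d_u = 10 and d_v = 4, 32-byte seeds); the "768" is the module dimension k·n, not a bit-security figure. The constants below reproduce the published ML-KEM-768 sizes as a worked calculation. They describe the parameter set, not anything specific to this project's code.

```rust
// Kyber-768 / ML-KEM-768 parameters.
const N: usize = 256; // coefficients per polynomial
const K: usize = 3; // module rank; K * N = 768
const DU: usize = 10; // bits per compressed coefficient of u
const DV: usize = 4; // bits per compressed coefficient of v
const SYM: usize = 32; // seed, hash, and shared-secret length in bytes

/// Public key: vector t (K polynomials at 12 bits/coeff) plus the 32-byte seed rho.
const PK_BYTES: usize = 12 * K * N / 8 + SYM;
/// Ciphertext: compressed u (K polynomials at DU bits) and v (one polynomial at DV bits).
const CT_BYTES: usize = (DU * K * N + DV * N) / 8;
/// Decapsulation key: s (12 bits/coeff), the public key, H(pk), and the implicit-rejection value z.
const SK_BYTES: usize = 12 * K * N / 8 + PK_BYTES + 2 * SYM;

fn main() {
    // prints: pk = 1184 B, sk = 2400 B, ct = 1088 B, ss = 32 B
    println!("pk = {PK_BYTES} B, sk = {SK_BYTES} B, ct = {CT_BYTES} B, ss = {SYM} B");
}
```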

EDR-03
Decision

pqcrypto vs custom implementation

Why Chosen

Cryptographic correctness is non-negotiable. Audited reference implementations reduce the risk of subtle timing side channels in polynomial arithmetic.

Alternative Rejected

Full control over inner arithmetic

Impact

pqcrypto crate
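Secret comparison illustrates the kind of subtle timing issue this decision guards against: comparing secrets with `==` on slices may exit at the first differing byte, leaking where the mismatch is. A hand-rolled constant-time comparison looks like the sketch below, but the optimizer gives no guarantee it stays branch-free. That uncertainty supports relying on vetted code (for instance the `subtle` crate's `ConstantTimeEq`) over custom implementations. The name `ct_eq` is illustrative.

```rust
use std::hint::black_box;

/// Compares two byte strings without exiting early on the first mismatch.
/// Lengths are treated as public; only contents are compared in a fixed pattern.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y; // accumulate every differing bit, never branch on it
    }
    // black_box discourages (but does not guarantee against) the optimizer
    // reintroducing an early exit.
    black_box(diff) == 0
}

fn main() {
    let ss_a = [7u8; 32];
    let ss_b = [7u8; 32];
    println!("{}", ct_eq(&ss_a, &ss_b)); // true
}
```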

05

Failure Modes

Incident-style notes for the ways the design can break.

Thread contention on RNG

FM-01
Impact

Multiple threads sharing a single OsRng caused throughput degradation.

Mitigation

Replaced the shared RNG with per-thread RNG instances.
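The per-thread mitigation amounts to giving each worker its own generator so no RNG state is shared between threads. This sketch uses a tiny xorshift generator only to stay dependency-free; xorshift is NOT cryptographically secure, and real key generation would need a per-thread cryptographically secure generator (for example `rand`'s `ThreadRng`). `XorShift64` and `parallel_draws` are hypothetical names.

```rust
use std::thread;

/// Minimal xorshift64 generator. NOT cryptographically secure; used here only
/// to show the ownership pattern (one generator per worker, nothing shared).
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        XorShift64(seed.max(1)) // xorshift state must be nonzero
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Each worker builds its own generator inside its thread, so draws never
/// touch a shared lock. Returns one folded value per worker.
fn parallel_draws(workers: usize, draws_per_worker: usize) -> Vec<u64> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                s.spawn(move || {
                    let mut rng = XorShift64::new(0x9E37_79B9_7F4A_7C15 ^ w as u64);
                    let mut acc = 0u64;
                    for _ in 0..draws_per_worker {
                        acc ^= rng.next_u64();
                    }
                    acc
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect::<Vec<u64>>()
    })
}

fn main() {
    let per_worker = parallel_draws(8, 1_000_000);
    println!("{} workers finished", per_worker.len());
}
```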

Serialization overhead masking crypto latency

FM-02
Impact

serde serialization dominated benchmark numbers.

Mitigation

Separated crypto benchmarks from serialization benchmarks.
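Keeping the two costs apart comes down to never timing crypto and encoding under the same clock. In this minimal sketch, hex encoding is a hypothetical stand-in for the serde-based serialization layer, and `to_hex` and `timed_stages` are illustrative names.

```rust
use std::fmt::Write;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for serde-based key serialization: lowercase hex.
fn to_hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        write!(s, "{b:02x}").expect("writing to a String cannot fail");
    }
    s
}

/// Times the crypto stage and the serialization stage on separate clocks,
/// so encoding cost is never reported as crypto cost.
fn timed_stages(keygen: impl FnOnce() -> Vec<u8>) -> (Duration, Duration, String) {
    let t = Instant::now();
    let key = keygen();
    let crypto = t.elapsed();

    let t = Instant::now();
    let encoded = to_hex(&key);
    let serialize = t.elapsed();

    (crypto, serialize, encoded)
}

fn main() {
    // The closure stands in for a real keygen call (hypothetical).
    let (crypto, serialize, hex) = timed_stages(|| vec![0xde, 0xad, 0xbe, 0xef]);
    println!("crypto: {crypto:?}, serialize: {serialize:?}, key: {hex}");
}
```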

Compiler optimization masking work

FM-03
Impact

Without optimization barriers, the compiler could eliminate unused results as dead code, so benchmarks measured less work than intended.

Mitigation

Wrapped benchmark inputs and outputs in black_box() hints to prevent dead-code elimination from skewing results.
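A hand-written timing loop shows what Criterion handles here: without `black_box`, a result that is never used can be optimized away, leaving the loop to measure nothing. This minimal sketch uses `std::hint::black_box` (stable since Rust 1.66) plus a warm-up phase; `mean_time` is a hypothetical helper, and it omits the statistical analysis Criterion adds.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `warmup` times untimed, then returns the mean time over `iters` runs.
/// Every result goes through black_box so the compiler must assume it is used.
fn mean_time<T>(warmup: u32, iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "need at least one timed iteration");
    for _ in 0..warmup {
        black_box(f()); // warm caches and branch predictors before timing
    }
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    let input = vec![1u64; 4096];
    let t = mean_time(100, 1_000, || {
        // black_box on the input stops the sum from being constant-folded.
        black_box(&input).iter().sum::<u64>()
    });
    println!("mean: {t:?}");
}
```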

SIMD detection false positives

FM-04
Impact

cpuid queries on some VMs returned incorrect capability flags.

Mitigation

Added runtime verification with fallback to the scalar path.
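A dispatcher in the spirit of this mitigation treats `is_x86_feature_detected!` only as a first filter, then runs a known-answer test comparing the accelerated path against the scalar path before selecting it. This sketch rests on several assumptions and is not the project's SIMD Dispatcher: the AVX2 function is an ordinary loop compiled with `#[target_feature(enable = "avx2")]` so the compiler may vectorize it, and `Backend`, `select_backend`, and `add` are hypothetical names. A known-answer test catches wrong results; if a flag is wrong and the instruction is genuinely unsupported, the CPU faults instead (SIGILL on Linux), so a fully robust probe would run that check in a child process.

```rust
/// Which implementation the dispatcher picked.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Scalar,
    Avx2,
}

/// Portable reference path, always available.
fn add_scalar(a: &[i32], b: &[i32], out: &mut [i32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(*y);
    }
}

/// Same loop compiled with AVX2 enabled so LLVM may emit 256-bit vector code.
/// Caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: &[i32], b: &[i32], out: &mut [i32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(*y);
    }
}

/// Detect first, then verify with a known-answer test before trusting the flag.
fn select_backend() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            let a: Vec<i32> = (0..64).collect();
            let b: Vec<i32> = (0..64).map(|i| i * 7 - 100).collect();
            let mut want = vec![0; 64];
            let mut got = vec![0; 64];
            add_scalar(&a, &b, &mut want);
            // SAFETY: the CPU reported AVX2 support just above.
            unsafe { add_avx2(&a, &b, &mut got) };
            if got == want {
                return Backend::Avx2;
            }
        }
    }
    Backend::Scalar // fallback when detection or verification fails
}

fn add(backend: Backend, a: &[i32], b: &[i32], out: &mut [i32]) {
    match backend {
        // SAFETY: Avx2 is only returned by select_backend after detection.
        #[cfg(target_arch = "x86_64")]
        Backend::Avx2 => unsafe { add_avx2(a, b, out) },
        _ => add_scalar(a, b, out),
    }
}

fn main() {
    let backend = select_backend();
    let mut out = [0; 4];
    add(backend, &[1, 2, 3, 4], &[10, 20, 30, 40], &mut out);
    println!("{backend:?}: {out:?}");
}
```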

06

Benchmarks

Environment first, numbers second. Metrics should be inspectable, not ornamental.

Test Environment
Runtime

Rust / ring

Workload

Kyber-768 keygen, encapsulation, and decapsulation against a <50ms CPU-only latency target

Stack

Rust, ring, pqcrypto, serde

Scope

Cryptography

Evidence

Project-level benchmark notes

Performance Results
Keygen Latency

<50ms

Keygen Reduction

27%

Operations Benchmarked

1M+ ops

Security Level

Kyber-768 (NIST Level 3)

Threads

8 cores

07

Lessons Learned

Engineering takeaways from the implementation, including remaining work.

01

RNG contention is the silent killer of parallel cryptographic benchmarks: always use per-thread RNGs.

02

SIMD gains in polynomial arithmetic are more significant than thread-level parallelism for this workload.

03

Benchmarking crypto correctly requires careful use of black_box() to avoid compiler artifacts and warm-up iterations to avoid cold-start effects.

04

Future: explore hardware acceleration via dedicated crypto coprocessors and measure the gap vs pure-Rust implementations.

Akshat