cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs


  • Tao Lu, Zhejiang University, Hangzhou, China
  • Chengkun Wei, Zhejiang University, Hangzhou, China
  • Ruijing Yu, Zhejiang University, Hangzhou, China
  • Chaochao Chen, Zhejiang University, Hangzhou, China
  • Wenjing Fang, Ant Group, Hangzhou, China
  • Lei Wang, Ant Group, Hangzhou, China
  • Zeke Wang, Zhejiang University, Hangzhou, China
  • Wenzhi Chen, Zhejiang University, Hangzhou, China



Keywords: Zero-knowledge Proof, Multi-scalar Multiplication, Parallel Algorithm, Graphics Processing Unit


Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, the zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARKs such as Groth16 incur high overhead in the proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and multi-scalar multiplication (MSM). This paper therefore presents cuZK, an efficient GPU implementation of zkSNARK that uses three techniques to achieve high performance. First, we propose a new parallel MSM algorithm, which achieves nearly perfect linear speedup over the Pippenger algorithm, a well-known serial MSM algorithm. Second, we parallelize the MUL operation. Together with our self-designed MSM scheme and a well-studied NTT scheme, this lets cuZK parallelize all operations in the proof generation step. Third, cuZK reduces the latency overhead caused by CPU-GPU data transfer by 1) eliminating redundant data transfer and 2) overlapping data transfer with device computation. The evaluation results show that our MSM module provides over 2.08× (up to 2.94×) speedup versus the state-of-the-art GPU implementation. cuZK achieves over 2.65× (up to 4.86×) speedup on standard benchmarks and 2.18× speedup on a GPU-accelerated cryptocurrency application, Filecoin.
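For readers unfamiliar with MSM, the following minimal sketch illustrates the serial Pippenger (bucket) method that the abstract names as the baseline. It is not cuZK's parallel algorithm. To stay self-contained it assumes a toy group, the integers under addition modulo a prime, in place of elliptic-curve points; the function names, `modulus`, `scalar_bits`, and `window_bits` are all hypothetical choices made for illustration.

```python
def msm_naive(scalars, points, modulus):
    """Reference MSM: sum of k_i * P_i, computed directly in the toy group."""
    return sum(k * p for k, p in zip(scalars, points)) % modulus


def msm_pippenger(scalars, points, modulus, scalar_bits, window_bits):
    """Serial bucket-method MSM over the toy additive group Z_modulus.

    Only group additions are used on the points, as would be the case
    for elliptic-curve points (assumption: the toy group stands in for them).
    """
    mask = (1 << window_bits) - 1
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    result = 0
    # Process windows from most significant to least significant.
    for w in reversed(range(num_windows)):
        # Multiply the accumulated result by 2^window_bits via doublings.
        for _ in range(window_bits):
            result = (result + result) % modulus
        # Bucket accumulation: add each point to the bucket indexed by
        # its scalar's digit in this window (digit 0 contributes nothing).
        buckets = [0] * (mask + 1)
        shift = w * window_bits
        for k, p in zip(scalars, points):
            digit = (k >> shift) & mask
            if digit:
                buckets[digit] = (buckets[digit] + p) % modulus
        # Bucket reduction: compute sum(b * buckets[b]) with a running sum,
        # so that bucket b is added exactly b times using additions only.
        running = 0
        window_sum = 0
        for b in range(mask, 0, -1):
            running = (running + buckets[b]) % modulus
            window_sum = (window_sum + running) % modulus
        result = (result + window_sum) % modulus
    return result
```

Calling `msm_pippenger([5, 12, 255], [3, 10, 7], 1000003, scalar_bits=8, window_bits=3)` returns the same value as `msm_naive` on the same inputs. The outer loop over windows and the inner loop over points are the serial structure that a parallel MSM algorithm has to restructure.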




How to Cite

Lu, T., Wei, C., Yu, R., Chen, C., Fang, W., Wang, L., … Chen, W. (2023). cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2023(3), 194–220.