The Security Challenges in Building zkEVM


Aug 17, 2023

Critical Issues Found
Missing constraints in PIL state machines
Missing constraints in PIL storage state machine
Exploitation of these issues could lead to potential loss of funds
Other Important Findings
The Importance of Continuous Security Review for Novel Technologies
Takeaways for Securing Crypto's Next Generation

Polygon's zkEVM has the potential to revolutionize Ethereum scaling and bring increased adoption of zero-knowledge technology. However, as with any new complex system, thorough security validation is required before a full production launch. This was the goal of a comprehensive audit conducted by Hexens on the core components of zkEVM, including the smart contracts, the prover, and the ROM.

For the uninitiated, zkEVM is Polygon's novel solution for scaling Ethereum using zero-knowledge proofs while maintaining compatibility with the EVM execution layer. This allows existing Ethereum smart contracts, developer tools, and wallets to work without modification while benefiting from the speed and cost advantages of zk-SNARKs.

The audited codebase mostly covered the zkProver, which consists of the following components:

PIL (Polynomial Identity Language) – the "hardware" level, used to define the state machines of the VM as constraints.
zkASM – the assembly language that runs on top of the state machines built specifically for the zkEVM architecture.
ROM – the zkEVM ROM, which implements the EVM ruleset in zero-knowledge-friendly zkASM.

[Figure: zkEVM architecture diagram]

[Figure: zkEVM components overview]

With such a complex system, the potential attack surface is significant, requiring expertise across smart contract security, zero knowledge proofs, and EVM specifics. Hexens brought these skills to the table in the audit, the results of which serve as an informative case study for the due diligence required when building novel blockchain systems.

Critical Issues Found

ERC777 reentrancy attack vector in bridge contract

One of the most serious issues uncovered was an ERC777 token reentrancy vulnerability in the zkEVM bridge contract, which is responsible for asset deposits and withdrawals. For background, ERC777 is a token standard that remains backward compatible with ERC20 but adds hooks, callbacks that are triggered on token transfers.

Hexens discovered that the bridge contract did not properly account for these hooks when handling ERC777 deposits, opening the door to reentrancy attacks. Specifically, the ERC777 "tokensToSend" hook is called on the sender before the tokens are moved, so it runs before the bridge contract's token balance has been updated.

By recursively calling back into the deposit function from this hook, an attacker could artificially inflate the deposited amounts. For example, with 3 levels of reentrancy, depositing 1 token could result in a recorded deposit of 3 tokens.

The recursion can continue for as many levels as the transaction's gas limit allows, with each level creating more fake deposit value. When withdrawals are later enabled, this could allow the attacker to instantly drain far more from the bridge than they actually deposited.
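The following Python sketch models the inflation under one stated assumption: that the bridge records a deposit as the increase in its token balance measured just before and just after the transfer (a pattern commonly used to support fee-on-transfer tokens). `ToyBridge` and `ToyERC777` are hypothetical stand-ins, not the audited contracts; the toy token calls the sender's hook before moving any balance, as ERC777's tokensToSend does.

```python
class ToyERC777:
    """Minimal token whose sender hook runs before balances move (like tokensToSend)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.send_hooks = {}  # holder -> callback invoked before their tokens move

    def balance_of(self, who):
        return self.balances.get(who, 0)

    def transfer_from(self, sender, recipient, amount):
        hook = self.send_hooks.get(sender)
        if hook is not None:
            hook()  # control passes to the sender before any balance changes
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount


class ToyBridge:
    """Hypothetical bridge that records the measured balance increase per deposit."""

    def __init__(self, token):
        self.token = token
        self.recorded = []  # amount credited for each deposit

    def deposit(self, sender, amount):
        before = self.token.balance_of("bridge")
        self.token.transfer_from(sender, "bridge", amount)
        after = self.token.balance_of("bridge")
        self.recorded.append(after - before)


def run_attack(levels):
    token = ToyERC777({"attacker": levels})
    bridge = ToyBridge(token)
    nested_left = [levels - 1]  # nested deposits still to start from the hook

    def reenter():
        if nested_left[0] > 0:
            nested_left[0] -= 1
            bridge.deposit("attacker", 1)  # re-enter before the outer transfer moves tokens

    token.send_hooks["attacker"] = reenter
    bridge.deposit("attacker", 1)
    return bridge.recorded, token.balance_of("bridge")


recorded, received = run_attack(3)
print(recorded, sum(recorded), received)  # [1, 2, 3] 6 3
```

Each call's balance measurement spans the transfers made by the calls nested inside it. The outermost 1-token deposit (the last entry) is recorded as 3 tokens, and the three deposits together record 6 tokens although only 3 reached the bridge.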

The report notes that this attack requires only a single transaction, meaning funds could be jeopardized immediately.

Incorrect CTX handling leading to enormous ether rewards for sequencer

Another critical finding involved incorrect handling of CTX, the register that selects the virtual address space zkEVM uses to manage each contract call context. Specifically, the security engineers found issues in the context-switching logic of the identity precompile contract.

The identity contract is essentially an echo precompile: it returns its input unchanged. If called directly by a transaction, it should charge little beyond the intrinsic gas cost. However, a bug was identified where the context was incorrectly set to the global context instead of the specific call context.
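For reference, here is a sketch of what a direct call to the identity precompile costs under Ethereum mainnet gas rules: 21,000 base transaction gas, 4 or 16 gas per zero or non-zero calldata byte (EIP-2028), and 15 + 3 gas per 32-byte input word for the identity precompile. That zkEVM mirrors this schedule is an assumption for illustration; the point is that the expected total is small and dominated by intrinsic gas.

```python
G_TRANSACTION = 21000    # base cost of a non-contract-creation transaction
G_TXDATA_ZERO = 4        # per zero byte of calldata
G_TXDATA_NONZERO = 16    # per non-zero byte of calldata (EIP-2028)
G_IDENTITY_BASE = 15     # identity precompile: base cost
G_IDENTITY_WORD = 3      # identity precompile: cost per 32-byte word of input


def intrinsic_gas(calldata: bytes) -> int:
    zeros = calldata.count(0)
    nonzeros = len(calldata) - zeros
    return G_TRANSACTION + zeros * G_TXDATA_ZERO + nonzeros * G_TXDATA_NONZERO


def identity_gas(calldata: bytes) -> int:
    words = (len(calldata) + 31) // 32  # round up to whole words
    return G_IDENTITY_BASE + words * G_IDENTITY_WORD


def direct_identity_call_gas(calldata: bytes) -> int:
    """Total gas for a transaction sent straight to the identity precompile."""
    return intrinsic_gas(calldata) + identity_gas(calldata)


data = bytes(range(1, 33))  # 32 non-zero bytes
print(intrinsic_gas(data), identity_gas(data), direct_identity_call_gas(data))  # 21512 18 21530
```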

[Figure: CTX handling issue]

[Figure: context switching bug]

Because variable offsets in the global and call-context namespaces overlap, reading from the wrong context mixed up the loaded state. Most alarmingly, the transaction gas limit variable, which should be relatively small, was instead loaded with the global state root hash, which, read as an integer, is almost certain to be enormous.
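A toy Python model of the mix-up, under hypothetical assumptions: memory is addressed by (namespace, offset), the global namespace stores the state root at offset 0, and each call context stores its transaction gas limit at that same offset. The offsets, names, and state-root constant are illustrative only, not zkEVM's actual memory layout.

```python
GLOBAL = "global"  # hypothetical label for the global namespace

# Hypothetical variable layouts that happen to reuse the same offsets.
GLOBAL_VARS = {"stateRoot": 0, "batchNumber": 1}
CTX_VARS = {"txGasLimit": 0, "txGasPrice": 1}

memory = {}  # (namespace, offset) -> value


def store(namespace, offset, value):
    memory[(namespace, offset)] = value


def load(namespace, offset):
    return memory.get((namespace, offset), 0)


# Illustrative 256-bit value standing in for a state root hash.
STATE_ROOT = 0x9E3779B97F4A7C15F39CC0605CEDC8341082276BF3A27251F86C6A11D0C18E95
CALL_CTX = 1

store(GLOBAL, GLOBAL_VARS["stateRoot"], STATE_ROOT)
store(CALL_CTX, CTX_VARS["txGasLimit"], 30_000)

# Correct: read the gas limit from the call's own context.
gas_limit_ok = load(CALL_CTX, CTX_VARS["txGasLimit"])
# Buggy: the context points at the global namespace, so the same offset yields the state root.
gas_limit_bug = load(GLOBAL, CTX_VARS["txGasLimit"])

print(gas_limit_ok, gas_limit_bug.bit_length())  # 30000 256
```

A uniformly random 256-bit value is below 2**64 with probability 2**-192, which is why a hash loaded in place of the gas limit is practically guaranteed to be huge.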

With the huge state root value and some additional manipulation, this could allow the sequencer to massively inflate any caller's (or their own) ether balance in the verified state. Given the sequencer's special status, this is a serious vulnerability that could lead to loss of bridge funds.

Missing constraints in PIL state machines

One of the most interesting attack vectors involved missing constraints in the PIL state machines, which define the constraint system that zkEVM proofs must satisfy. Specifically, the audit revealed insufficient constraints on jump instructions.

This opened the door for attackers to craft malicious proofs that hijack the execution flow and redirect it to arbitrary code segments. In one example exploit, attackers could artificially credit themselves with huge ether balances by jumping into a chain of existing code gadgets that ends in balance-updating logic.

The root cause lies in the lack of constraints to validate the special selector of jump opcodes used throughout the zkEVM PIL codebase. Without proper validation, attackers can supply crafted inputs that lead execution to unintended code areas.

Introducing additional constraints on this selector removes the attack vector.
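To see why an unconstrained selector is dangerous, consider a simplified, hypothetical transition constraint (not the actual zkEVM PIL): the next program counter is `sel * addr + (1 - sel) * (pc + 1)` over the Goldilocks prime field that Polygon's zkProver works in. With `sel` restricted to 0 or 1, this either jumps or falls through. Without the binary constraint `sel * (sel - 1) = 0`, a prover can solve for a field element `sel` that lands on any target (the modular inverse via `pow(x, -1, P)` needs Python 3.8+).

```python
P = 2**64 - 2**32 + 1  # Goldilocks prime


def next_pc(pc, sel, addr):
    """Simplified transition: sel == 1 jumps to addr, sel == 0 falls through to pc + 1."""
    return (sel * addr + (1 - sel) * (pc + 1)) % P


def selector_is_binary(sel):
    """The missing check: sel * (sel - 1) == 0 holds only for sel in {0, 1}."""
    return sel * (sel - 1) % P == 0


def forge_selector(pc, addr, target):
    """Solve next_pc(pc, sel, addr) == target for sel (requires addr != pc + 1)."""
    fallthrough = pc + 1
    return (target - fallthrough) * pow(addr - fallthrough, -1, P) % P


pc, addr = 100, 200
target = 4242  # hypothetical address of a code gadget
sel = forge_selector(pc, addr, target)
print(next_pc(pc, sel, addr) == target, selector_is_binary(sel))  # True False
```

Enforcing the binary constraint makes the forged selector fail verification, leaving only the two honest outcomes.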

In addition to the jump instruction issues, the audit also revealed missing constraints around free input handling in the PIL state machines.

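Free inputs are values supplied to the prover rather than computed within the constraints, either because they are user-supplied data or to avoid expensive in-circuit calculations; the constraints must then check that each such value is valid.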

Tightening the constraints on the allowable range of free inputs is therefore crucial, because any value the constraints fail to rule out is one a malicious prover can choose. However, the finding demonstrates how difficult it is to identify and cover every edge case.
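A generic illustration of the problem, not the specific constraint from the report: a quotient and remainder supplied as free inputs are cheap to verify with `a == q * b + r`, but that equation alone does not pin them down. Without the range check `0 <= r < b`, a prover can move a multiple of `b` from the quotient into the remainder. `division_ok` is a hypothetical helper.

```python
def division_ok(a, b, q, r, check_range=True):
    """Check prover-supplied quotient q and remainder r for a divided by b."""
    if a != q * b + r:
        return False
    if check_range and not (0 <= r < b):
        return False
    return True


a, b = 17, 5
honest_q, honest_r = divmod(a, b)                # 3, 2
forged_q, forged_r = honest_q - 1, honest_r + b  # 2, 7 still satisfy a == q * b + r

print(division_ok(a, b, honest_q, honest_r))                     # True
print(division_ok(a, b, forged_q, forged_r, check_range=False))  # True: wrong quotient accepted
print(division_ok(a, b, forged_q, forged_r))                     # False: range check rejects it
```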

Missing constraints in PIL storage state machine

Another critical issue stems from a missing constraint in the PIL code that handles key inclusion checks when retrieving data from the sparse Merkle tree (SMT). In the SMT, the path to a leaf is derived from the least significant bits of the key, which are used to traverse down to the relevant leaf node.

To verify the correct key-value binding, the PIL code reconstructs the key path by prepending successive key bits to the remaining key value (rkey). However, the polynomial representing the next key bit lacks a binary constraint restricting it to 0 or 1.
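A simplified, single-register Python model of that reconstruction, assuming each step back up the tree computes `key = 2 * rkey + bit` over the Goldilocks field. The real storage state machine splits the key across several rkey registers and also uses the bits to order sibling hashes, which this sketch omits. It shows only that once `bit` is not forced to be 0 or 1, the reconstructed key can be made to equal any chosen value.

```python
P = 2**64 - 2**32 + 1  # Goldilocks prime


def reconstruct_key(rkey, path_bits):
    """Climb from the leaf to the root, prepending one path bit per level.

    path_bits[i] is the bit used at depth i (bit i of the key), so the
    deepest bit is prepended first and path_bits[0] last.
    """
    key = rkey
    for bit in reversed(path_bits):
        key = (2 * key + bit) % P
    return key


def is_binary(bit):
    return bit in (0, 1)


# Honest case: key 45 = 0b101101 stored at depth 3.
key = 45
path = [(key >> i) & 1 for i in range(3)]  # [1, 0, 1]
rkey = key >> 3                            # 5
assert reconstruct_key(rkey, path) == key

# Attack: keep the deeper bits, choose the top-level "bit" freely.
target_key = 1000  # hypothetical key the attacker wants bound to this leaf
partial = reconstruct_key(rkey, path[1:])    # 22
forged_bit = (target_key - 2 * partial) % P  # 956
forged_path = [forged_bit] + path[1:]
print(reconstruct_key(rkey, forged_path) == target_key, all(map(is_binary, forged_path)))  # True False
```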

By manipulating this next key bit value, an attacker could traverse an incorrect path down the tree and fake the binding between a key of their choosing and a leaf value. If the attacker can also modify all of the rkey registers, the root check in the proof of value inclusion can be bypassed as well.

To exploit this, the attacker needs to insert a leaf at a suitable depth with 1111 as the least significant bits of the key. This allows modifying all rkey registers when traversing back up the tree later. The main impact would be faking the balance value bound to an attacker's account address.

This vulnerability exposed the integrity of the SMT data structure to potential corruption. Adding the missing constraint and modifying the ROM to validate next bit values resolves the issue.

Exploitation of these issues could lead to potential loss of funds

All of the vulnerabilities described above were rated critical severity, meaning they could lead to outcomes such as loss of funds, broken consensus, or denial of service.

The implications are stark. Reentrancy could instantly drain bridge assets. Incorrect CTX handling enables the sequencer to credit a huge amount of ether to any address. Insufficient PIL constraints allow an attacker to bypass the soundness of the system and prove incorrect statements.

Although Polygon addressed the findings, they highlight the risks inherent in pioneering blockchain innovations. Security is essential for novel systems.

Other Important Findings

In addition to the critical vulnerabilities, the audit revealed other concerns that could disrupt system operations:

A maxmem handling bug allows attackers to halt batch verification by pushing this memory register out of its allowed range, effectively freezing the rollup and preventing it from progressing.
Subtle differences in how zkEVM and the EVM handle RLP decoding of transactions create room for "poison transactions" that would fail EVM validity checks but pass in zkEVM. At scale, this could prevent sequencers from finalizing batches.
Differences from the EVM in the allowed sizes of the gas limit and chain ID enable crafted transactions that would only be valid on zkEVM. Repeated exploitation could degrade network stability.
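The report's exact RLP and field-size discrepancies are not reproduced here. As a hedged illustration of the kind of strictness at stake, this Python sketch decodes one RLP-encoded integer field and rejects forms that canonical RLP forbids: a length prefix on a single byte below 0x80, leading zero bytes, and values wider than a caller-chosen `max_bytes` (for example 8 for a 64-bit field). `decode_rlp_uint` is a hypothetical helper, not code from zkEVM or any client. Two decoders that disagree on any one of these checks can disagree on whether the same transaction is valid.

```python
def decode_rlp_uint(data: bytes, max_bytes: int):
    """Strictly decode one RLP-encoded unsigned integer from the start of data.

    Returns (value, rest). Raises ValueError on non-canonical or oversized input.
    Assumes max_bytes <= 55, so only the short-string form is accepted.
    """
    if not data:
        raise ValueError("empty input")
    prefix = data[0]
    if prefix < 0x80:
        # A single byte below 0x80 is its own encoding.
        payload, rest = data[:1], data[1:]
    elif prefix <= 0xB7:
        # Short string: prefix 0x80 + length, then `length` bytes.
        length = prefix - 0x80
        payload, rest = data[1:1 + length], data[1 + length:]
        if len(payload) != length:
            raise ValueError("truncated input")
        if length == 1 and payload[0] < 0x80:
            raise ValueError("single byte below 0x80 must not carry a length prefix")
    elif prefix <= 0xBF:
        raise ValueError("long-form string is too wide for an integer field")
    else:
        raise ValueError("expected a string, found a list")
    if payload and payload[0] == 0:
        raise ValueError("non-canonical integer: leading zero byte")
    if len(payload) > max_bytes:
        raise ValueError("integer wider than the field allows")
    return int.from_bytes(payload, "big"), rest


print(decode_rlp_uint(bytes([0x82, 0x04, 0x00]), max_bytes=8))  # (1024, b'')
```

A decoder that skips the leading-zero check, or accepts wider integers than the EVM does for a given field, would admit transactions a strict EVM client rejects.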

While rated lower severity individually, these findings still warrant priority. They erode consistency with the EVM, which is critical for smooth cross-compatibility. Even where they cannot be used to steal funds directly, these issues can undermine core functions and require diligent attention.

The Importance of Continuous Security Review for Novel Technologies

The range of issues uncovered by this audit highlights the tremendous ongoing effort required to build and maintain complex blockchain solutions like zkEVM. Being among the first to implement a technology like zero-knowledge rollups with EVM compatibility brings an array of challenges.

The findings run the gamut from subtle compiler bugs to convoluted exploit chains spanning contract logic. Identifying and resolving these low level intricacies is non-trivial. It is a testament to the skill and dedication of the Polygon zkEVM team that all discovered issues have been addressed responsibly.

However, continued internal reviews, external audits, and bug bounties are recommended to validate zkEVM as it moves to production. Ongoing maintenance and performance tuning will be required as usage increases and new edge cases emerge.

The process of rigorously battle-testing and incrementally strengthening new blockchain innovations is challenging but rewarding. Polygon is advancing the state of zero-knowledge technology and providing valuable research for the community. But as this audit clearly demonstrates, realizing the full vision of zkEVM will demand extensive continued effort.

Takeaways for Securing Crypto's Next Generation

Hexens' audit of zkEVM was invaluable, revealing vulnerabilities ranging from consensus risks to subtle compatibility gaps. Polygon's swift resolution demonstrates its commitment to security.

However, this is just one step towards hardening zkEVM for production. Expanded testing and incremental deployments should continue as usage grows. Securing novel systems is challenging but essential work.

The path forward demands extensive ongoing diligence against emergent threats. With coordinated effort, zkEVM can provide scalability and enable wider zero knowledge adoption. But pioneering new cryptographic frontiers requires substantial maintenance.

The full report detailing all the findings from Hexens' audit of the zkEVM codebase can be found on GitHub.
