Merkle Trees

Cryptography

Merkle Trees

A Merkle tree is a hash-based data structure where every leaf node contains the hash of a data block, and every internal node contains the hash of its children. The single root hash at the top acts as a compact fingerprint of the entire dataset. Merkle trees enable efficient, trustless verification — you can prove any element belongs to the dataset by providing only O(log n) hashes, without downloading the full dataset. They are the backbone of blockchain state management, airdrop eligibility proofs, certificate transparency, and git’s content-addressed storage.


Structure and Visual Diagram

A Merkle tree for four data elements D₀, D₁, D₂, D₃:

                    ┌──────────┐
                    │   Root   │
                    │ H(H₀₁ ‖ │
                    │   H₂₃)  │
                    └────┬─────┘
                   ╱            ╲
          ┌───────┴──┐      ┌───┴───────┐
          │   H₀₁    │      │   H₂₃    │
          │H(H₀ ‖ H₁)│      │H(H₂ ‖ H₃)│
          └─────┬─────┘      └─────┬─────┘
          ╱         ╲          ╱         ╲
     ┌───┴──┐  ┌───┴──┐  ┌───┴──┐  ┌───┴──┐
     │  H₀  │  │  H₁  │  │  H₂  │  │  H₃  │
     │H(D₀) │  │H(D₁) │  │H(D₂) │  │H(D₃) │
     └──────┘  └──────┘  └──────┘  └──────┘
        ↑          ↑          ↑          ↑
       D₀         D₁         D₂         D₃

Properties:

  • Leaves: H₀ = H(D₀), H₁ = H(D₁), etc.
  • Internal nodes: H₀₁ = H(H₀ ‖ H₁)
  • Root: Root = H(H₀₁ ‖ H₂₃) — changes if any data element changes
  • Depth: log₂(n) for n leaves

Merkle Proof Walkthrough

Suppose we want to prove that D₂ is in the tree, given only the root hash. The prover provides a Merkle proof (also called an authentication path): the sibling hashes along the path from the leaf to the root.

Step-by-Step Example

Goal: Prove D₂ is at index 2 in a tree with root R.

Proof: [H₃, H₀₁] — the sibling at each level.

VERIFICATION:

Step 1: Compute leaf hash
    H₂ = H(D₂)

Step 2: Combine with sibling H₃ (D₂ is a LEFT child)
    H₂₃ = H(H₂ ‖ H₃)

Step 3: Combine with sibling H₀₁ (H₂₃ is a RIGHT child)
    Root' = H(H₀₁ ‖ H₂₃)

Step 4: Check
    Root' == R?  → ✓ Accept  /  ✗ Reject
                    ┌──────────┐
                    │   Root   │ ← Step 3: compute & compare
                    └────┬─────┘
                   ╱            ╲
          ┌───────┴──┐      ┌───┴───────┐
          │  *H₀₁*   │      │   H₂₃    │ ← Step 2: compute
          │(provided) │      │(computed) │
          └──────────┘      └─────┬─────┘
                            ╱         ╲
                       ┌───┴──┐  ┌───┴──┐
                       │  H₂  │  │ *H₃* │
                       │(comp)│  │(prov)│
                       └──────┘  └──────┘

                         D₂ (the element we're proving)

Elements marked with * are provided in the proof. Everything else is computed by the verifier.

Pseudocode

def verify_merkle_proof(leaf_data, index, proof, root, hash_fn):
    """
    leaf_data: the data element to verify
    index:     position in the tree (determines left/right at each level)
    proof:     list of sibling hashes [sibling_0, sibling_1, ...]
    root:      the known Merkle root
    """
    current = hash_fn(leaf_data)

    for i, sibling in enumerate(proof):
        if index & 1 == 0:                          # current is left child
            current = hash_fn(current + sibling)
        else:                                        # current is right child
            current = hash_fn(sibling + current)
        index >>= 1                                  # move to parent level

    return current == root

Complexity Analysis

OperationTimeSpace
Build treeO(n)O(n)
Generate proofO(log n)O(log n)
Verify proofO(log n)O(log n)
Update one leafO(log n)O(1)
Proof sizeO(log n) hashes

For a tree with 1 million leaves (depth 20), a proof is just 20 hashes — 640 bytes with SHA-256, or 640 bytes with Poseidon (32-byte field elements). This logarithmic scaling is what makes Merkle trees practical for massive datasets.


Sparse Merkle Trees

A standard Merkle tree is dense — leaves are at consecutive positions. A sparse Merkle tree (SMT) has a fixed depth (typically 256) where most leaves are empty (a default “zero” value). Each possible key maps to a specific leaf position.

Sparse Merkle Tree (depth 256):
─────────────────────────────────
                 Root
                /    \
              H₀      H₁
             / \      / \
           ...  ... ...  ...

    Leaf at position 0x00...00: empty (default hash)
    Leaf at position 0x3a...f7: value V₁
    Leaf at position 0xa1...2c: value V₂
    ... (2²⁵⁶ - 2 leaves are empty)

Why Sparse?

The tree has 2²⁵⁶ possible positions, but only a tiny fraction contain actual data. The trick: precompute the hash of every “all-empty subtree” at each depth. An empty subtree of depth d has a fixed, known hash Z_d:

Z₀ = H(0)                    // empty leaf
Z₁ = H(Z₀ ‖ Z₀)            // subtree of two empty leaves
Z₂ = H(Z₁ ‖ Z₁)            // subtree of four empty leaves
...
Z₂₅₆ = H(Z₂₅₅ ‖ Z₂₅₅)    // root of entirely empty tree

Storage is proportional to the number of non-empty leaves, not the tree size.

Non-Inclusion Proofs

A critical feature of sparse Merkle trees: you can prove a key is NOT in the tree by providing a Merkle proof to its position showing the leaf is empty (the default value). This is impossible with standard dense Merkle trees.

Prove key K is NOT in the tree:
    1. Look up leaf at position K
    2. Leaf value = default (empty)
    3. Provide Merkle proof for this empty leaf
    4. Verifier confirms: the leaf at position K is empty → K not in tree

Use in State Trees

Sparse Merkle trees are used in rollups and privacy protocols to represent:

  • Account balances: Key = address, Value = balance
  • Nullifier sets: Key = nullifier hash, Value = 0 (unused) or 1 (spent)
  • Commitment membership: Key = commitment, Value = 1 if present

Merkle Patricia Tries

Ethereum uses a Merkle Patricia Trie (MPT) — a combination of a Merkle tree and a Patricia trie (radix tree) — to store its world state.

Why Not a Simple Merkle Tree?

Ethereum’s state is a key-value store (address → account data). A simple Merkle tree would require:

  • A fixed ordering of keys (addresses)
  • Rebuilding when keys are added
  • No efficient way to handle sparse key spaces

Structure

The MPT uses the hex nibbles of the key as a path through the trie:

Key: 0xABCD...

Path through trie:
    Root → [A] → [B] → [C] → [D] → ... → Value

Node types:
    Branch:    16 children (one per hex digit) + value
    Extension: shared prefix + child
    Leaf:      remaining path + value

Three Merkle Patricia Tries are in every Ethereum block header:

  1. State trie: Maps address → (nonce, balance, storageRoot, codeHash)
  2. Transaction trie: Maps index → transaction
  3. Receipt trie: Maps index → receipt (logs, gas used, status)

Trade-offs

PropertySparse Merkle TreeMerkle Patricia Trie
Proof sizeO(depth) = O(256)O(key length) ≈ O(64 nibbles)
ImplementationSimpleComplex (3 node types)
Proof of non-inclusionNativeRequires proof that path terminates
Update efficiencyO(log n)O(key length)
ZK-friendlinessGoodPoor (complex branching logic)

Verkle Trees

Verkle trees are the planned successor to Merkle Patricia Tries in Ethereum (EIP-6800). They replace hash-based internal nodes with vector commitments.

The Key Insight

In a Merkle tree, proving one leaf requires O(log n) sibling hashes. If the branching factor is k, each proof includes k - 1 siblings per level. For a binary tree (k = 2), that’s 1 sibling per level — manageable. But for a wide tree (k = 256), it’s 255 siblings per level — massive proofs.

Verkle trees solve this by using vector commitments (specifically, IPA or KZG) at each node. A vector commitment allows you to prove the value at a specific position without revealing any other values:

MERKLE TREE (branching factor 256):
    Proof per level: 255 sibling hashes = 255 × 32 bytes = 8,160 bytes
    10 levels: ~80 KB per proof

VERKLE TREE (branching factor 256):
    Proof per level: 1 vector commitment opening = ~32 bytes
    10 levels: ~320 bytes per proof

Structure

Verkle Tree (branching factor 256):
─────────────────────────────────────

              ┌──────────┐
              │ Root VC   │  ← Vector commitment to 256 children
              └─────┬─────┘
        ┌───────────┼───────────┐
        ↓           ↓           ↓
    ┌───────┐  ┌───────┐  ┌───────┐
    │ VC₀   │  │ VC₁   │  │VC₂₅₅ │  ← Each is a vector commitment
    └───┬───┘  └───┬───┘  └───┬───┘
      ╱ | ╲     ╱ | ╲      ╱ | ╲
    v₀ v₁...  v₀ v₁...   v₀ v₁...   ← Leaf values

    VC = VectorCommit([child₀, child₁, ..., child₂₅₅])
    Proof for child_i: VectorOpen(VC, i) → π (constant size)

Ethereum’s Migration Plan

Ethereum currently stores ~1 billion state entries in Merkle Patricia Tries. The migration to Verkle trees would:

  • Reduce witness sizes from ~1-3 MB to ~150 KB per block
  • Enable stateless clients: validators don’t need the full state tree
  • Simplify proofs: no complex branching logic
  • Use IPA commitments (not KZG) for quantum-resistance considerations

Hash Function Choices for ZK

When Merkle trees are used inside zero-knowledge circuits (e.g., proving you know a leaf in a commitment tree), the hash function choice dramatically affects performance.

Performance Comparison in Circuits

Hash function constraint counts (for one compression):
──────────────────────────────────────────────────────

SHA-256:     ████████████████████████████ 25,000 R1CS constraints
Keccak-256:  ████████████████████████████████████████████████████ 150,000
Blake2s:     ██████████████████████ 10,000
MiMC:        █ 730
Poseidon:    ▌ 250
Rescue:      █ 600

For a Merkle proof of depth 20:
    SHA-256:  20 × 25,000  = 500,000 constraints
    Poseidon: 20 × 250     =   5,000 constraints  (100x fewer!)

Why Poseidon is ZK-Friendly

Traditional hash functions (SHA-256, Keccak) are designed for bitwise operations — XOR, rotations, shifts. These are cheap on CPUs but extremely expensive to express as arithmetic circuit constraints (each bit operation becomes many multiplications in a finite field).

Poseidon uses only field-native operations:

  • Addition in F_p
  • Multiplication in F_p
  • Exponentiation: x^5 (or x^α)

Every step is a natural arithmetic constraint. See Commitment Schemes for more on Poseidon.

Practical Approach

Most protocols use a hybrid strategy:

INSIDE ZK CIRCUIT:
    Merkle tree hash: Poseidon  (minimize constraints)
    Nullifier hash:   Poseidon
    Commitment hash:  Poseidon

OUTSIDE CIRCUIT (on-chain):
    State root verification: Keccak-256  (Ethereum-native opcode)
    Address derivation:      Keccak-256
    Block hashing:           Keccak-256

Real-World Uses

Blockchain State Roots

Every Bitcoin block header contains a Merkle root of all transactions in the block. A light client (SPV) can verify a specific transaction is included by requesting a Merkle proof (~320 bytes for a block with 1000 transactions) instead of downloading the entire block.

Ethereum extends this with three state roots per block, enabling light clients to verify any account balance, storage slot, or transaction receipt.

Airdrop Eligibility (Merkle Drops)

A common DeFi pattern: distribute tokens to thousands of eligible addresses without storing the full list on-chain.

# Off-chain: build Merkle tree of eligible addresses
leaves = [keccak256(address, amount) for address, amount in eligible]
tree = MerkleTree(leaves)
publish(tree.root)   # store one hash on-chain

# On-chain: each user claims with a Merkle proof
def claim(address, amount, proof):
    leaf = keccak256(address, amount)
    assert verify_proof(leaf, proof, merkle_root)
    assert not claimed[address]
    claimed[address] = True
    token.transfer(address, amount)

Storing 10,000 eligible addresses on-chain would cost millions of gas. A Merkle root costs ~20,000 gas, and each claim provides its own proof.

Certificate Transparency

Google’s Certificate Transparency (CT) logs use Merkle trees to maintain an append-only log of all TLS certificates. Auditors can:

  • Verify a certificate exists in the log (inclusion proof)
  • Verify the log is append-only and hasn’t been tampered with (consistency proof)

Git Internals

Git is fundamentally a content-addressed Merkle DAG (directed acyclic graph). Every commit, tree, and blob is identified by its SHA-1 hash. A commit references a tree hash, which references blob hashes. Changing any file changes every hash up to the root — exactly a Merkle tree property.

Privacy Protocol Commitment Trees

In Tornado Cash, Zcash, and similar protocols, deposits create commitments that are inserted as leaves in a Merkle tree. To withdraw, a user proves in zero knowledge that their commitment is a leaf in the tree (via a Merkle proof inside a ZK circuit), without revealing which leaf. See Nullifiers for the double-spend prevention mechanism.


Accumulator Alternatives

Merkle trees are the most widely deployed accumulator, but other constructions exist with different trade-offs.

RSA Accumulators

Based on the RSA assumption (difficulty of factoring large numbers):

Accumulator value: A = g^(p₁ · p₂ · ... · pₙ) mod N

To prove element pᵢ is in the set:
    Witness: wᵢ = g^(∏ⱼ≠ᵢ pⱼ) mod N
    Verify:  wᵢ^pᵢ mod N == A

Advantages: Constant-size proofs (one group element), constant-size accumulator, supports efficient batch proofs and non-membership proofs.

Disadvantages: Requires a trusted setup (RSA modulus N must have unknown factorization), operations are slow (large modular exponentiations), not quantum-safe.

Bilinear Map Accumulators

Based on bilinear pairings over elliptic curves:

Setup: trusted setup produces (g, g^τ, g^τ², ..., g^τⁿ)

Accumulator for set S = {s₁, ..., sₙ}:
    A = g^∏(τ - sᵢ)

Membership proof for sⱼ:
    W = g^∏ᵢ≠ⱼ(τ - sᵢ)
    Verify: e(W, g^τ · g^(-sⱼ)) == e(A, g)

Advantages: Constant-size proofs, supports non-membership proofs, efficient batch operations.

Disadvantages: Trusted setup, pairing computations are expensive, not quantum-safe.

Comparison

PropertyMerkle TreeRSA AccumulatorBilinear Accumulator
Proof sizeO(log n)O(1)O(1)
Accumulator sizeO(1) — just rootO(1)O(1)
Trusted setupNoYes (RSA modulus)Yes (SRS)
Update (add element)O(log n)O(n) naivelyO(n) naively
Non-membership proofOnly in sparse variantYesYes
Quantum-safeYes (hash-based)NoNo
ZK-friendlinessGood (with Poseidon)PoorModerate

In practice, Merkle trees dominate due to their simplicity, transparency (no trusted setup), and ZK-friendliness when paired with algebraic hash functions.


References

  • Merkle, R. “A Digital Signature Based on a Conventional Encryption Function.” CRYPTO ‘87.
  • Laurie, B., Langley, A., Kasper, E. “Certificate Transparency.” RFC 6962, 2013.
  • Dahlberg, R., et al. “Efficient Sparse Merkle Trees.” NordSec 2016.
  • Kuszmaul, J. “Verkle Trees.” 2019. (Based on work by John Kuszmaul.)
  • Ethereum Foundation. “EIP-6800: Ethereum state using a unified verkle tree.” 2023.
  • Grassi, L., et al. “Poseidon: A New Hash Function for Zero-Knowledge Proof Systems.” USENIX Security 2021.
  • Commitment Schemes — leaves in privacy-protocol Merkle trees are typically commitments
  • Nullifiers — used alongside Merkle trees to prevent double-spending in private systems
  • Privacy Patterns — Merkle proofs inside ZK circuits enable private set membership
  • STARKs vs SNARKs — proof systems that use Merkle trees for FRI commitments and polynomial IOPs