Compact Encoding

📖 What is Compact Encoding?

Branch Nodes:

Branch nodes inherently have a different structure than leaf and extension nodes. They always contain 17 elements: 16 slots for each possible nibble (0-F) and a 17th slot for storing a value. Because of this distinct structure, they are easily distinguishable from leaf and extension nodes.

When we retrieve a branch node from storage, its size and structure immediately indicate its type. There's no need for an added prefix because no other node in the trie has this specific 17-element structure.

Hash Nodes:

Hash nodes are essentially placeholders pointing to actual data. They contain a hash that can be used to retrieve a node from storage. The actual data retrieved using this hash will then clearly be one of the four node types (leaf, extension, branch, or another hash). The hash node itself doesn't require differentiation since it's just a pointer, and when you follow that pointer, the data you retrieve inherently defines its type.

Leaf and Extension Nodes

Unlike branch nodes, both leaf and extension nodes contain paths. The problem arises because their structures can look remarkably similar, especially if the paths they contain are of equal length or if their stored data is of a similar shape.

This structural similarity between leaf and extension nodes is where the challenge lies. Without some form of differentiation, reading a node from storage would leave us guessing: "Is this a path to a leaf node, or is this an intermediary extension path?"

How we fix this?

This is where compact encoding becomes a lifesaver. By adding a minimal prefix to leaf and extension nodes, we can immediately identify their types when reading from storage. This mechanism ensures that:

  1. No Ambiguity: There's clear differentiation between leaf and extension nodes.

  2. Efficient Storage: We're using just a couple of bits to convey vital information, preventing the need for larger metadata storage.

  3. Consistent Hashing: This prefix ensures the integrity of our trie, making sure that every hash computation is consistent.

In summary, while branch and hash nodes are distinguishable by their inherent structure and function, leaf and extension nodes, due to their similarity, require the compact encoding mechanism for immediate and clear differentiation.

Here's a simple table to show how it works:

luaCopy codehex char    bits    |    node type        |     path length
----------------------------------------------------------
   0        0000    |       extension     |          even
   1        0001    |       extension     |          odd
   2        0010    |       leaf          |          even
   3        0011    |       leaf          |          odd

🛠️ How it's Done in Go:

Adding the Prefix:

The CompactEncoding function adds the right prefix to our data, depending on its type and path length.

func CompactEncoding(ns []Nibble, isLeafNode bool) []Nibble {
	var prefix []Nibble

	if isLeafNode {
		if len(ns)%2 == 0 {
			prefix = []Nibble{2, 0} // leaf node with even path length
		} else {
			prefix = []Nibble{3} // leaf node with odd path length
		}
	} else {
		if len(ns)%2 == 0 {
			prefix = []Nibble{0, 0} // extension node with even path length
		} else {
			prefix = []Nibble{1} // extension node with odd path length
		}
	}

	// Append prefix to the nibbles slice
	return append(prefix, ns...)
}

Removing the Prefix:

The RemoveCompactEncoding function takes off the prefix when we don't need it anymore.

func RemoveCompactEncoding(encodedPath []Nibble) []Nibble {
	if len(encodedPath) == 0 {
		return encodedPath
	}

	switch encodedPath[0] {
	case 0, 2:
		return encodedPath[2:]
	case 1, 3:
		return encodedPath[1:]
	default:
		return encodedPath // Invalid case, but just return as-is for this example
	}
}

Last updated