Rooted Tree Isomorphism with Prime Numbers: A Practical Python Solution

Introduction

Graph isomorphism asks whether two graphs have the same structure. This is the same as asking if it’s possible to relabel vertices of a graph to make it equal another graph. Determining whether two graphs are isomorphic is a computationally challenging problem. It’s one of two problems known to be in NP but it is not yet known whether it lies in P or is NP-Complete.

The most naive (and computationally expensive) algorithm for determining if two graphs are isomorphic is a brute-force algorithm. This works by exhaustively testing all possible mappings of nodes from one graph to the other until a match is found. This would run in factorial time. Not ideal!

The state-of-the-art algorithms run in 2^O(n) time for all graphs. These algorithms are hard-to-understand and don’t have simple software implementations.

Restricting graphs to be only rooted trees (which I’ll refer to simply as “trees” from here on) gives us an algorithm which is both easy to understand and has an acceptable runtime.

Tree optimisations

Firstly, one requirement for two graphs to be isomorphic is that they must have the same number of nodes. That means we can reject any graphs which don’t have the same number of nodes. For trees, we can further strengthen the check: we can instantly reject trees of different heights. This will reduce the time taken to compare trees of different heights, but for trees of the same height we will have to inspect the graphs further. Therefore this only improves some cases, not the worst case.

We can use the fact that both trees are rooted to our advantage. Rather than comparing the comparing the entirety of one tree with another at the same time, we use a much simpler iterative algorithm.

We start by comparing the leaves of the tree and then work our way up to the root of the tree. This takes advantage of the following property: if for subtree from a given node in G_1, G_2 has a node a subtree which is isomorphic to the subtree from the node in G_1, then the trees are the same.

That means that once we have determined that a node’s subtree is equal to an equivalent node in the other graph, we can forget about the structure of the subtree and we no longer need to consider that in our analysis. This helps to speed up our algorithm compared to the brute-force approach.

In code

Aho, Hopcroft and Ullman first described this iterative tree isomorphism algorithm in The Design and Analysis of Computer Algorithms. It uses lexicographically-sorted tuples to store information about the structure of a subtree. From the way the pseudocode is written in their paper, the algorithm appears cryptic.

Florian Ingels gives some alternative ways to store the information about a subtree in Revisiting Tree Isomorphism, the most interesting variant uses prime numbers in place of the original algorithm’s tuples. I’ve written a Python implementation of Ingels’ prime-multiplication variant of Ullman’s algorithm. This snippet determines if two nx.Graph instances are isomorphic. The variable names match his pseudocode in “§2.2 AHU with primes” to make it easy to compare my implementation with his code listing.

from math import prod
import networkx as nx

def rooted_tree_isomorphism(tree1: nx.Graph, tree2: nx.Graph) -> bool:
    trees = [tree1, tree2]
    layers = [list(reversed(list(nx.bfs_layers(t, t.graph["root"])))) for t in trees]

    if len(layers[0]) != len(layers[1]):
        return False

    c_values = {0: {}, 1: {}}
    f = lambda n: get_n_prime_number(n + 1)

    for layer in range(len(layers[0])):
        current_layer_c_values = {0: set(), 1: set()}

        for tree_index in range(2):
            current_layer = layers[tree_index][layer]
            for u in current_layer:
                neighbours = trees[tree_index].neighbors(u)
                neighbour_c_values = [
                    c_values[tree_index].get(n, None) for n in neighbours
                ]

                n_u = prod(c for c in neighbour_c_values if c != None)
                c_u = f(n_u)

                c_values[tree_index][u] = f(c_u)
                current_layer_c_values[tree_index].add(c_u)

        if current_layer_c_values[0] != current_layer_c_values[1]:
            return False

    return True

Using this helper function to get the nth prime number:

from sympy import sieve


def get_n_prime_number(n: int) -> int:
    sieve.extend_to_no(n)

    return sieve[n - 1]

Conclusion

This short blog post provides a modern implementation of a simple algorithm to decide tree isomorphism. I couldn’t find many examples of this online, so I hope this post helps someone looking for a clear and practical implementation!

Introduction#

Tree optimisations#

In code#

Conclusion#

Introduction

Tree optimisations

In code

Conclusion