We will here analyze the time complexity of the famous Union-Find algorithm which uses **Path Compression** And **Union by Rank**. We are going to assume that there are n vertices and m find operations . (Union operation is in fact two find operations and O(1) merging). We will give here one after one lemmas and its intuitive proofs and slowly build up the final time complexity proof.

Lemma (1): If x has a parent that is x is a non-root node, then rank(x) < rank(parent(x))

**Proof** : We can prove this by induction. Initially all nodes are root nodes. So there are no non-root nodes. Hence it is true initially. Now a new link is created either by the union algorithm or using path compression during find algorithm. We will prove that any new link created by either algorithm will maintain the invariance that rank(x) < rank(parent(x)).

**Union algorithm** : Whenever a new link is created by union algorithm, there are two cases.

(1) Union of two nodes having different ranks : The lower rank node is connected to higher rank node. So in this case the new link maintains the invariance.

(2) Union of two nodes having same ranks: In this case we connect the any one node to other node. Suppose we connect node1 to node2. But then we increase the rank of node 2 by 1. So for this new link rank(parent(x)) = rank(x) + 1. So clearly rank(x) < rank(parent(x)). So this case also maintains the invariance.

**Find algorithm : **The path compression technique might introduce some new links. Note that path compression algorithm joins the node with its root . As rank(x) < rank(parent(x)) < rank(parent(parent(x))) ….. So eventually rank(x) < rank(root(x)). So the path compression algorithm also maintains the invariance.

Hence our lemma is proved.

Lemma (2): Rank Of A Non Root node never changes

Rank change can only be done by union algorithm. But by algorithm of union,it only changes the rank of the root nodes. Remember that in union algorithm, two vertices are replaced by their respective roots before merging.

Hence our lemma is proved.

Lemma (3): There are at least 2^{r}nodes in a tree with root having rank r.

We will prove this using induction. Initially all nodes have rank 0 and also the have exactly 1 element (that is itself) which is greater than or equals to 2^{0 }= 1. Now we will prove that our union-find algorithm will maintain this invariance. We are never going to remove any node from the tree containing the root. However it might seem like this condition may be violated when root changes its rank. So lets analyze this situation.

**Change of rank of root by Union Algorithm** : Rank of a node changes only when 2 nodes having same rank are combined. So before this operation both nodes have at least 2^{r} elements. So after union, the root’s rank will be (r+1) and it will have at least 2^{r} + 2^{r} = 2.2^{r} = 2^{r+1 }elements. Thus this operation maintains the invariance.

Hence our lemma is proved.

Lemma (4): Rank(x) <= log_{2}n for any x

This lemma is just consequence of **lemma (3) **. A node with rank log_{2}n have at least 2^{log2n} = n nodes. And our tree cannot contain more than n nodes.

Hence our lemma is proved.

Lemma (5): The maximum number of nodes of rankris at mostn/2^{r}.

This lemma is just consequence of **lemma (3)**.

Lemma (6): Rank(parent(x)) is strictly increases after each path compression operated on them for each node x such that x is neither a root nor an immediate child of the root.

As stated in the proof of **lemma (1)** that ,rank(x) < rank(parent(x)) < rank(root(x)) given x satisfies the constraints of our lemma. Now after path compression operated on x , x is joined to root(x). So now parent(x) = root(x) after path compression. As before the operation rank(parent(x)) < rank(root(x)) the rank(parent(x)) is strictly increasing for any node x which satisfies the lemma constraints.

Hence our lemma proved.

Now to prove our time complexity,

**Divide the ranks into buckets** such that,

bucket 1 contains nodes with rank 1

bucket 2 contains nodes with rank 2,3

bucket 3 contains nodes with rank 4,5,6,7,…… 2^4 – 1 (15)

bucket 4 contains 16,17,18,……2^16 – 1(65535)

bucket 5 contains 65536,……2^65536 – 1

and so on

Rank 0 is also in bucket 0 as an exception.

Lemma (7): There are at most log*n buckets.

It is intuitive from the way we have divided buckets. To know what is log*n visit this Wikipedia link.

Hence our lemma is proved.

Lemma (8): The maximum number of elements in bucket [B, 2^{B}– 1] is at most 2n/2^{B}

Maximum number of elements

= n/2^B + n/2^(B+1) + …. n/2^(2^B – 1) (**From Lemma (5)**)

= n/2^B(1 + 1/2 + …..)

<= n/2^B (2) (GP formula for infinite GP)

<= 2n/2^B

Hence our lemma is proved.

Lemma (9): Time complexity T can be written as summation of number of 3 types of links traversed.

- Link which points to the root.
- Link in which the ranks of node and its parent falls in two different buckets.
- Link in which the ranks of node and is parent falls in the same buckets.

The proof is intuitive to understand. Time complexity is nothing but the number of links traversed by the find algorithm over m finds. **The second and third types of links does not cover links which point to the root.**

Lemma (10): There at O(m) link traversals of type 1

Each time find algorithm is called, there is only one link which points to the root. And as there are m finds, there will be O(m) link traversals of type 1.

Lemma (11): There are O(m log*n ) link traversals of type 2.

From **Lemma (1)** and **Lemma (7)** we can say that there at most log*n link traversals per operation. So in total there are O(m log*n ) operations.

Lemma (12): There are O(n log*n) link traversals of type 3

Number of links of type 3 = Σ (number of links of type 3 starting from u.) (where u goes from 1…n)

Now for fixed u, suppose u belongs in the bucket [*B*, 2^{B} – 1] then there will at most (2^B -1 – B) links which is at most 2^B links from **Lemma (6)** **. **

Number of links of type 3 = Σ [2n/(2^B)]*(2^B) (summation is over all B) (From **Lemma (6)** and **Lemma (8)** ) What we have basically done is converted summation over u to summation over B and multiplied inner term with max number of nodes present in bucket.

Number of links of type 3 = 2n log*n = O(n log*n)

Hence our lemma is proved.

Theorem : The Time complexity T = O((n+m) log*n )

Intuitive from Lemma (10), Lemma (11) and Lemma (12).

T = O(m) + O(m log*n) + O(n log*n)

= O((n+m) log*n)

Hence our lemma is proved

Reference Materials :

(1) Wikipedia link

(2) Princeton University Slides