0% completed
Union Find, also known as Disjoint Set Union (DSU), is a data structure that keeps track of a partition of a set into disjoint subsets (meaning no set overlaps with another). It provides two primary operations: find, which determines which subset a particular element is in, and union, which merges two subsets into a single subset. This pattern is particularly useful for problems where we need to find whether 2 elements belong to the same group or need to solve connectivity-related problems in a graph or tree.
Core Operations of Union-Find (Disjoint Set Union - DSU):
- Find: Determine which set a particular element belongs to. This can be used for determining if two elements are in the same set.
- Union: Merge two sets together. This operation is used when there is a relationship between two elements, indicating they should be in the same set.
Union-Find: A Story of Connections
Imagine a social network where everyone is initially a single user with no friends. The moment two users become friends, they form a group. If a user becomes a friend of someone from another group, the two groups merge into a larger friend circle. Union-Find helps track these friendships and circles.
How DSU Work?
- Initial State: Everyone is in their own separate group. Think of each person as their own little island.
- Making Connections (Union): When two people shake hands, they now belong to the same group. If they were already in groups, their entire groups merge.
- Checking Connections (Find): Want to know if two people are connected? Union-Find can check if they're in the same group.
How DSU is Implemented?
- Parents Array: Each element has a parent. In the beginning, each element is their own parent.
- Find Operation: Follows the chain of parents until it reaches the root parent, which represents the group.
- Union Operation: Connects two groups by setting the parent of one group’s root to the other group’s root.
Naive Implementation of Disjoint Set
1. Creating Disjoint Sets
- Each element points to a parent.
- Initially, each element is its own parent, forming
n
disjoint sets. - A "root" element is one that points to itself, identifying the set's representative.
# Initialize 'n' elements where each element is its own parent. parent = [i for i in range(n)] # 'n' is the number of elements
2. Find Operation
The Find
operation determines which set a particular element belongs to. This can be used to determine if two elements are in the same set.
In the below code:
parent
is an array whereparent[i]
is the parent ofi
.- If
parent[i] == i
, theni
is the root of the set and hence the representative. - The
find
function follows the chain of parents fori
until it reaches the root.
Above function recursively traverses the parent array until it finds a node that is the parent of itself.
Complexity Analysis
- Time Complexity: In the naive implementation, the time complexity of the
find
operation is O(N), where (N) is the number of elements. This is because, in the worst case, when all elements are in the same set, we may have to traverse up to (N) elements to find the representative. - Space Complexity: The space complexity of the
find
operation is O(1) since it uses a constant amount of extra space.
3. Union Operation
The Union
operation merges two disjoint sets. It takes two elements as input, finds the representatives of their sets using the Find
operation, and finally merges the two sets.
In the code below:
-
Find the Representatives:
int irep = find(i);
– Finds the representative (root) of the set containing elementi
.int jrep = find(j);
– Finds the representative (root) of the set containing elementj
.
-
Check if They Are in the Same Set:
if (irep == jrep) { return; }
– Ifi
andj
have the same representative, they are already in the same set, so no union is needed.
-
Merge the Sets:
parent[irep] = jrep;
– Makes the representative ofi
's set point to the representative ofj
's set, effectively merging the two sets.
The above function merges two sets by making the representative of one set the parent of the other.
Complexity Analysis
- Time Complexity: The time complexity of the
union
operation is dominated by the time complexity of thefind
operation, making it O(N) in the worst case. - Space Complexity: The space complexity of the
union
operation is O(1) as it uses a constant amount of extra space.
Optimizations
Here are various methods for optimally performing operations on disjoint sets.
1. Path Compression
Path compression is a technique used in the Union-Find data structure to flatten the structure of the tree whenever the find
operation is called. This optimization significantly reduces the time complexity of the find
operation, making subsequent operations faster.
The idea is to make each node on the path from a node to the root point directly to the root. This way, whenever we perform a find
operation, we not only find the root but also make the structure flatter, which helps in reducing the depth of the trees.
Here’s how path compression works:
- When we call
find
on a node, we recursively traverse its parent until we reach the root. - During this traversal, we make each node on the path point directly to the root.
In the above diagram, we see a tree structure before and after path compression. Let's explain the process with an example.
-
Before Path Compression:
- Consider we have elements connected as a tree structure.
- Suppose we need to find the root of element F. The path is F -> E -> C -> B -> A.
-
During Path Compression:
- As we perform the
find
operation on element F, we recursively move up to the root (element A). - During this traversal, we make each node (F, E, C, and B) point directly to the root (A).
- As we perform the
-
After Path Compression:
- The tree structure is flattened. Now, F, E, C, and B all directly point to the root (A).
- Future
find
operations on any of these nodes will be much faster as they directly point to the root.
Here is the code implementation for the find()
method for the path compression.
This modified Find
operation ensures that every node directly points to the representative of the set, and it makes it easy to find the representative of any two elements.
Complexity Analysis
-
Time Complexity: With path compression, the time complexity of the
find
operation is reduced to O(\log N). This is because path compression flattens the structure of the tree, reducing the depth of the tree significantly. In eachfind
operation, we make each node on the path point directly to the root, which decreases the depth of the tree and ensures that subsequentfind
operations are faster. This effectively reduces the maximum height of the tree to O(\log N). -
Space Complexity: The space complexity remains (O(1)) as
find()
operation doesn't use the extra space.
2. Union by Rank
Union by rank is an optimization technique used in the Union-Find data structure to keep the tree flat and balanced. In the naive approach, the trees can become skewed, resulting in a high time complexity for the find
and union
operations. Union by rank addresses this issue by always attaching the smaller tree under the root of the larger tree, thereby minimizing the height of the resulting tree.
Why Union by Rank is Important:
- Prevents Skewed Trees: In the naive approach, repeated union operations can result in highly unbalanced trees. This can degrade the time complexity of the
find
operation to O(N). - Maintains Balanced Trees: By attaching the smaller tree under the larger tree, union by rank ensures that the resulting tree remains balanced, keeping the operations efficient.
How Rank is Maintained:
- Rank Array: We maintain a
Rank
array whereRank[i]
denotes the rank (or height) of the tree rooted at elementi
. - Union Operation: When performing the union of two sets, we compare the ranks of their roots:
- If the ranks are different, the root with the higher rank becomes the parent.
- If the ranks are the same, we arbitrarily choose one root to be the parent and increment its rank by 1.
In the above diagram, the rank of the s1
tree is 3
, and the s2
tree is 2
. So, the s2
tree is connected with the s1
tree.
This function uses the rank array to decide which tree gets attached under which tree.
Complexity Analysis
- Time Complexity: The time complexity of the union operation with union by rank is O(\log N). This is because union by rank keeps the trees balanced, ensuring that the maximum height of the tree remains logarithmic with respect to the number of elements.
- Space Complexity: O(n), as we use the rank array.
3. Union by Size
Union by size is another optimization technique used in the Union-Find data structure to keep the tree flat and balanced. Similar to union by rank, union by size helps to avoid the creation of skewed trees by always attaching the smaller tree under the larger tree. This keeps the height of the trees small, ensuring that the find
and union
operations remain efficient.
How Size is Maintained:
- Size Array: We maintain a
size
array wheresize[i]
denotes the size (number of elements) of the tree rooted at elementi
. - Union Operation: When performing the union of two sets, we compare the sizes of their roots:
- If the sizes are different, the root of the smaller set becomes a child of the root of the larger set.
- The size of the new root is updated to reflect the total number of elements in the merged set.
This function uses the size array to ensure that the tree with fewer nodes is added under the tree with more nodes.
Complexity Analysis
- Time Complexity: The time complexity of the union operation with union by size is O(\log N). This is because union by size ensures that the trees remain balanced by attaching the smaller tree to the larger one, keeping the tree height logarithmic with respect to the number of elements.
- Space Complexity: O(n), as we use the rank array.
Most Optimized Approach: Combining Path Compression and Union by Rank
In the previous sections, we discussed two important optimizations for the Union-Find data structure: Path Compression and Union by Rank. Each of these optimizations addresses specific issues related to tree depth and efficiency. By combining these two optimizations, we can achieve the most efficient Union-Find implementation.
Combining Path Compression and Union by Rank
-
Path Compression: This technique flattens the tree structure whenever the
find
operation is called. It does so by making each node on the path from a node to the root point directly to the root. This significantly reduces the depth of the tree, ensuring that subsequentfind
operations are faster. -
Union by Rank: This technique ensures that the tree remains balanced during union operations. By always attaching the smaller tree under the root of the larger tree (based on rank), we prevent the tree from becoming too deep. This optimization helps maintain a logarithmic tree height, keeping the operations efficient.
Combining these two optimizations, we get a highly efficient Union-Find structure where both the find
and union
operations are optimized.
Example Code
To implement this combined approach, we maintain two arrays: parent
and rank
. The parent
array keeps track of the parent or representative of each node, while the rank
array helps manage the depth of the trees.
Complexity Analysis of the Most Optimized Approach
By combining Path Compression and Union by Rank, we achieve a highly efficient Union-Find data structure. The amortized time complexity for both find
and union
operations is O(\alpha(n)), where \alpha(n) is the inverse Ackermann function.
Inverse Ackermann Function
The Ackermann function grows extremely fast, and its inverse, \alpha(n), grows very slowly. For practical purposes, \alpha(n) remains nearly constant. Even for very large inputs (e.g., n \leq 10^{600}), (\alpha(n)) does not exceed 5. This means:
\{\alpha(n) = \min \{ k : A_k(1) \geq n \}
Some values of the Ackermann function:
\begin{align*} A_0(1) & = 1 \\ A_1(1) & = 3 \\ A_2(1) & = 7 \\ A_3(1) & = 2047 \\ A_4(1) & \gg 10^{80} \end{align*}
Given this slow growth, the time complexity of the combined approach is effectively constant.
- Time Complexity: The amortized time complexity for
find
andunion
operations is O(\alpha(n)), which is nearly constant. - Space Complexity: The space complexity is O(n) due to the
parent
andrank
arrays.
Combining these optimizations ensures that the Union-Find data structure operates efficiently, making it suitable for handling large datasets with minimal overhead.
Pros and Cons of DSU
- Pros:
- Efficiency: Provides near constant time operations for union and find operations, making it highly efficient.
- Simplicity: Once set up, it’s straightforward to use for solving problems related to disjoint sets.
- Cons:
- Space Overhead: Requires additional space to store the parent and rank of each element.
- Initial Setup: Requires initial setup to create and initialize the data structure.
Why Choose Union-Find Over BFS/DFS?
You can solve similar problems using the Breadth-First Search (BFS) and Depth-First Search (DFS) algorithms, but the Union Find algorithm provides the optimal solution over them. For problems related to connectivity checks and component identification, Union-Find is often more efficient than BFS/DFS.
- Time Complexity:
- BFS/DFS: For BFS and DFS, the time complexity is O(V + E), where V is the number of vertices and E is the number of edges. This can be quite slow for large graphs.
- Union-Find: With path compression and union by rank, the amortized time complexity for union and find operations can be approximated as O(α(n)), where α(n) is the inverse Ackermann function. This function grows very slowly, making Union-Find operations almost constant time in practice.
Let's jump onto our first problem and apply the Union Find pattern.
.....
.....
.....
Table of Contents
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible