Disjoint Sets
- The disjoint set data structure is used to solve the “Dynamic Connectivity” problem
- The problem that we are working with is that we have various items. With a disjoint set, we can do the following:
connect(x, y)
: Connects item x and yisConnected(x,y)
: Returns true if x and y are connected). Connections can be transitive, they do not need to share a direct connection, but they have have a path from one item to another.
- Disjoint Sets are used for kruskals algorithm and also percolation problems
Disjoint Sets on Integers
- We force all items to be integers
- We declare the number of items in advance, everything is initially disconnected.
- Disjoint Sets Interface:
1
2
3
4
5
6
7
public interface DisjointSets {
/* Connects two items P and Q */
void connect(int p, int q);
/* Checks to see if two items are connected */
boolean isConnected(int p, int q);
}
The Naive Approach
- Connecting two things: Record every connection in some data structure.
- Checking connectedness: Do iteration over the lines to see if one item can be reached from the other.
The Better Approach
- We focus on connected components.
- We just record the sets that each item belongs to
- We just care if items are connected, not how they are connected.
Idea 1
- We use a
List<Set<Integer>>
- Intuitive but terrible because this is complicated and slow. We would have to iterate through all the sets to find anything.
Idea 2
- We store a list of integers where the ith entry gives a set number (“id”) of item i.
connect(p, q)
: Change entires that equalid[p]
toid[q]
- This is still too slow, as connecting two items takes O(N) time.
Idea :3
- We need to be able to change our set representation so that combining two sets into their union only requires changing one value.
- We assign each item a parent rather than an id. This creates a tree structure where we may follow connections until we reach a root. All elements in the same set will share the same root.
- Thus, we just need to make sure that items in the same set share the same root.
- Steps, ex: connect(5,2):
- we set root(5)’s value equal to root(2).
- To check if two items are connected, we simply check if the root of one item is equal to the root of another item.
- The issue with this is that we havea worst case tree height of N for N items.
- We can make this more efficient by choosing how to connect two items.
- In fact, we put the item with the smaller height under the root of the item with the bigger height.
- However, tracking with height is tedious, instead we track with the number of elements in a tree, the “weight” of the tree.
- Connecting based on height versus weight have different asymptotic efficiency.
- To store the weight of the tree, we simply initialize another array that keeps track of the size of the tree that the item belongs to.
Weighted Quick Union Performance.
- The only way that we may create a tree of height 2 is if we connect a tree of height 1 with another tree of height 1. Each tree of height 1 has at least 2 items. Thus, the smallest tree of height 2 has 4 items. The smallest tree of height 3 will have 8 items, and so on.
- The height of our tree scales with the log of the input size N.
- Using height minimizes the worst case depth of a node using weight minimizes the average depth of a nodex`