Grokking Graph Algorithms for Coding Interviews
Ask Author
Back to course home

0% completed

Vote For New Content
Solution: Similar String Groups
Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Problem Statement

Two strings, X and Y, are considered similar if they are identical or can be made identical by swapping at most two distinct positions.

For example, "abcd" and "bacd" are similar (swapping at positions 0 and 1), "abcd" and "acbd" are also similar, but "dabc" is not similar to "abcd", "bacd", and "acbd".

Together, these form two connected groups by similarity: {"abcd", "bacd", "acbd"} and {"dabc"}. Notice that "bacd" and "acbd" are in the same group even though they are not similar. Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.

Given a list of strings, where each string is an anagram of every other string in the list, find the total number of groups of strings.

Examples

Example 1

  • Input: strs = ["rstuv", "uvrst", "rsutv", "urvst", "vutsr"]
  • Expected Output: 3
  • Justification: The groups are ["rstuv", "rsutv"], ["uvrst", "urvst"] and ["vutsr"]. All strings within each group can be made identical by swapping at most two characters.

Example 2

  • Input: strs = ["abcd", "acbd", "bacd", "dabc"]
  • Expected Output: 2
  • Justification: The groups are ["abcd", "acbd", "bacd", ] and ["dabc"]. "abcd", "acbd", and "bacd" can be made identical by swapping at most two characters. "dabc" form another group.

Example 3

  • Input: strs = ["xyz", "xzy", "yxz", "yzx", "zxy", "zyx"]
  • Expected Output: 1
  • Justification: All strings are belongs to the same group.

Constraints:

  • 1 <= strs.length <= 300
  • 1 <= strs[i].length <= 300
  • strs[i] consists of lowercase letters only.
  • All words in strs have the same length and are anagrams of each other.

Solution

To solve this problem, we use the Union-Find algorithm, also known as Disjoint Set Union (DSU). This algorithm helps us efficiently group and find connected components. The main idea is to treat each string as a node in a graph. If two strings are similar, we connect their nodes. By the end, the number of connected components in the graph will be the number of similar string groups.

This approach works well because it efficiently manages the merging and finding of groups using path compression and union by rank, which keeps the operations nearly constant time.

Step-by-step Algorithm

  1. Initialize: Create a parent array where each element is its own parent, and a rank array initialized to zero.

    • parent[i] = i for all i
    • rank[i] = 0 for all i
  2. Find Function: Implement a find function with path compression to find the root parent of a node.

    • If parent[x] is not x, set parent[x] to the result of find(parent[x])
    • Return parent[x]
  3. Union Function: Implement a union function with union by rank to merge two sets.

    • Find the root parents of x and y
    • If the root parents are different, compare their ranks
      • If rank[rootX] is greater than rank[rootY], set parent[rootY] to rootX
      • If rank[rootX] is less than rank[rootY], set parent[rootX] to rootY
      • If ranks are equal, set parent[rootY] to rootX and increment rank[rootX]
  4. Similarity Check: Create a helper function isSimilar to check if two strings are similar by counting character differences.

    • Initialize a counter diff
    • Iterate through the characters of both strings
      • Increment diff for each mismatch
      • If diff exceeds 2, return false
    • Return true if diff is 2 or 0
  5. Group Formation: Iterate over the list of strings, and for each pair (i, j), if they are similar, union their sets.

    • For each i from 0 to n-1
      • For each j from i+1 to n
        • If isSimilar(strs[i], strs[j]), call union(i, j)
  6. Count Groups: Count the number of unique parents to determine the number of groups.

    • Create a set to store unique root parents
    • For each i from 0 to n-1, add find(i) to the set
    • The size of the set is the number of groups

Algorithm Walkthrough

Let's walk through the algorithm using the input strs = ["rstuv", "uvrst", "rsutv", "urvst", "vutsr"].

  1. Initialization:

    • parent = [0, 1, 2, 3, 4]
    • rank = [0, 0, 0, 0, 0]
  2. Union-Find Operations:

    • Compare "rstuv" with each other string:

      • "rstuv" and "uvrst" are not similar.
      • "rstuv" and "rsutv" are similar (swap 'u' with 't') -> union(0, 2)
        • find(0) = 0, find(2) = 2
        • parent[2] = 0
        • parent = [0, 1, 0, 3, 4], rank = [1, 0, 0, 0, 0]
      • "rstuv" and "urvst" are not similar.
      • "rstuv" and "vutsr" are not similar.
    • Compare "uvrst" with each other string:

      • "uvrst" and "rsutv" are not similar.
      • "uvrst" and "urvst" are similar (swap 'v' with 'r') -> union(1, 3)
        • find(1) = 1, find(3) = 3
        • parent[3] = 1
        • parent = [0, 1, 0, 1, 4], rank = [1, 1, 0, 0, 0]
      • "uvrst" and "vutsr" are not similar.
    • Compare "rsutv" with each other string:

      • "rsutv" and "urvst" are not similar.
      • "rsutv" and "vutsr" are not similar.
    • Compare "urvst" with "vutsr":

      • "urvst" and "vutsr" are not similar.
  3. Count Groups:

    • Create a set to store unique root parents.
    • For each i from 0 to 4, add find(i) to the set.
      • find(0) = 0
      • find(1) = 1
      • find(2) = find(0) = 0
      • find(3) = find(1) = 1
      • find(4) = 4
    • The set contains {0, 1, 4}.
    • The number of groups is 3.

Code

Python3
Python3

. . . .

Complexity Analysis

Time Complexity

  1. Initialization: Setting up the parent and rank arrays takes O(n) time.
  2. String Comparisons: Comparing every pair of strings takes O(n^2 \cdot L) time, where (n) is the number of strings and (L) is the length of each string.
  3. Union-Find Operations: Each find and union operation is O(\alpha(n)), nearly constant time. Since there are (O(n^2)) comparisons, the total time is O(n^2 \cdot \alpha(n)).
  4. Counting Groups: Counting unique parents takes O(n) time.

Overall, the dominant factor is string comparisons, resulting in: O(n^2 \cdot L)

Space Complexity

  1. Union-Find Arrays: The parent and rank arrays use O(n) space.

Overall, the space complexity is: O(n)

.....

.....

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible