Solution: Problem Challenge 4: Words Concatenation

Grokking the Coding Interview: Patterns for Coding Questions

Ask Author

Back to course home

0% completed

Vote For New Content

Solution: Problem Challenge 4: Words Concatenation

Table of Contents

Contents are not accessible

Problem Statement

You’re given a string s and a list of words words, where all words have the same length.

A concatenated substring is formed by joining all the words from any permutation of words — each used exactly once, without any extra characters in between.

For example, if words = ["ab", "cd", "ef"], then valid concatenated strings include "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab". A string like "acdbef" is not valid because it doesn't match any complete permutation of the given words.

Return all starting indices in s where such concatenated substrings appear. You can return the indices in any order.

Example 1:

Input: String="catfoxcat", Words=["cat", "fox"]  
Output: [0, 3]  
Explanation: The two substring containing both the words are "catfox" & "foxcat".

Example 2:

Input: String="catcatfoxfox", Words=["cat", "fox"]  
Output: [3]
Explanation: The only substring containing both the words is "catfox".

Constraints:

1 <= words.length <= 10<sup>4</sup>
1 <= words[i].length <= 30
words[i] consists of only lowercase English letters.
All the strings of words are unique.
1 <= sum(words[i].length) <= 10<sup>5</sup>

Solution

To solve this problem, we need to check every possible starting position in the string where the concatenation of all given words might begin. We will use a sliding window approach. This approach involves creating a window of the size equal to the total length of all words combined and sliding it across the string, one character at a time. For each position of the window, we will check if it contains a valid concatenation of the given words. This method is effective because it reduces the number of unnecessary checks by focusing only on the relevant parts of the string.

This approach is effective because it systematically checks all possible positions in the string, ensuring no potential concatenation is missed. By using a hashmap to store the frequency of each word in the list and another hashmap to track the words seen in the current window, we can efficiently validate the concatenations.

Step-by-step Algorithm

Initialize Data Structures:
- Create a hashmap to store the frequency of each word in the list.
- Create a list to store the starting indices of valid concatenations.
Set Up Variables:
- Calculate the total number of words (wordsCount).
- Determine the length of each word (wordLength).
Fill Frequency Map:
- Iterate through the list of words and populate the frequency hashmap with the count of each word.
Slide the Window:
- Iterate over the string, adjusting the starting position of the window from 0 to str.length() - wordsCount * wordLength.
Check Each Window:
- For each starting position i, create a hashmap (wordsSeen) to track the words seen in the current window.
- Within the window, iterate through the words:
  - Calculate the starting index of the next word in the window (nextWordIndex).
  - Extract the word from the string using substring.
  - Check if the word exists in the frequency hashmap:
    - If it does not exist, break out of the inner loop.
    - If it exists, update the wordsSeen hashmap with the count of the word seen.
    - If the count of the word in wordsSeen exceeds the count in the frequency hashmap, break out of the inner loop.
  - If all words are matched perfectly, add the starting index i to the result list.
Return Results:
- Return the list of starting indices where valid concatenations are found.

Algorithm Walkthrough

Using the example:

Input:
- String = "catfoxcat"
- Words = ["cat", "fox"]
Steps:

Initialize:
- wordFrequencyMap: { "cat": 1, "fox": 1 }
- resultIndices: []
- wordsCount = 2
- wordLength = 3
- totalLength = 2 * 3 = 6
Sliding Window:
- Iterate from i = 0 to i = str.length() - totalLength = 3
- i = 0:
  - Window: "catfox"
  - wordsSeen = {}
  - j = 0:
    - nextWordIndex = 0
    - Extracted Word: "cat"
    - Update wordsSeen = { "cat": 1 }
  - j = 1:
    - nextWordIndex = 3
    - Extracted Word: "fox"
    - Update wordsSeen = { "cat": 1, "fox": 1 }
  - All words matched, add index 0 to result list.
  - resultIndices: [0]
- i = 1:
  - Window: "atfoxc"
  - wordsSeen = {}
  - j = 0:
    - nextWordIndex = 1
    - Extracted Word: "atf"
    - "atf" is not in the wordFrequencyMap, break.
  - resultIndices remains [0]
- i = 2:
  - Window: "tfoxca"
  - wordsSeen = {}
  - j = 0:
    - nextWordIndex = 2
    - Extracted Word: "tfo"
    - "tfo" is not in the wordFrequencyMapp, break.
  - resultIndices remains [0]
- i = 3:
  - Window: "foxcat"
  - wordsSeen = {}
  - j = 0:
    - nextWordIndex = 3
    - Extracted Word: "fox"
    - Update wordsSeen = { "fox": 1 }
  - j = 1:
    - nextWordIndex = 6
    - Extracted Word: "cat"
    - Update wordsSeen = { "fox": 1, "cat": 1 }
  - All words matched, add index 3 to result list.
  - resultIndices: [0, 3]

Output:
- [0, 3]

Code

Here is what our algorithm will look like:

Python3

. . . .

Complexity Analysis

Time Complexity

Building the frequency map: The algorithm first builds a frequency map for the given words. This step takes $O(M)$ time, where M is the total number of words in the words array.
Sliding window traversal: The outer loop iterates through the input string str up to the point where there are enough characters left to accommodate the concatenation of all the words. This loop runs for $O(N - M \times L)$ iterations, where:
- N is the length of the string str.
- M is the number of words.
- L is the length of each word (all words are assumed to have the same length).
Word matching: For each starting index in the string, the inner loop checks for a valid sequence of words. This loop runs for $M$ iterations, where each iteration takes $O(L)$ time to extract a word from the string and update the wordsSeen map. So the total cost of word matching for each starting index is $O(M \times L)$ .
Therefore, the overall time complexity is:
- Outer loop: $O(N - M \times L)$
- Inner loop: $O(M \times L)$
- Total time complexity: $O((N - M \times L) \times M \times L) = O(N \times M \times L)$ .

Overall time complexity: $O(N \times M \times L)$ , where N is the length of the string, M is the number of words, and L is the length of each word.

Space Complexity

HashMaps: The algorithm uses two hash maps:
1. wordFrequencyMap: This map stores the frequency of each word from the words array. It contains at most M entries, so the space complexity for this map is $O(M)$ .
2. wordsSeen: This map stores the frequency of the words as they are found in the current window of the string. The space complexity for this map is also $O(M)$ .
Result list: The result list stores the starting indices of the valid concatenations. In the worst case, all possible starting positions in the string could lead to valid concatenations. Hence, this list could have up to $O(N)$ entries in the worst case.

Overall space complexity: $O(M + N)$ .

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible