Solution: Design Add and Search Words Data Structure

Blind 75

Problem Statement

Design a data structure that supports the addition of new words and the ability to check if a string matches any previously added word.

Implement the Solution class:

Solution() Initializes the object.
void addWord(word) Inserts word into the data structure, making it available for future searches.
bool search(word) Checks if there is any word in the data structure that matches word. The method returns true if such a match exists, otherwise returns false.

Note: In the search query word, the character '.' can represent any single letter, effectively serving as a wildcard character.

Examples

Example 1:

Input:

["Solution", "addWord", "addWord", "search", "search"]
[[], ["apple"], ["banana"], ["apple"], ["......"]]

Expected Output:
```
[-1, -1, -1, 1, 1]
```
Justification: After adding the words "apple" and "banana", searching for "apple" will return true since "apple" is in the data structure. Searching for "......" will also return true as "banana" match the pattern.

Example 2:

Input:

["Solution", "addWord", "addWord", "search", "search"]
[[], ["cat"], ["dog"], ["c.t"], ["d..g"]]

Expected Output:
```
[-1, -1, -1, 1, 0]
```
Justification: "c.t" matches "cat" and "d..g" doesn't matches "dog".

Example 3:

Input:

["Solution", "addWord", "search", "search"]
[[], ["hello"], ["h.llo"], ["h...o"]]

Expected Output:
```
[-1, -1, 1, 1]
```
Justification: "h.llo" and "h...o" both match "hello".

Constraints:

1 <= word.length <= 25
word in addWord consists of lowercase English letters.
word in search consist of '.' or lowercase English letters.
There will be at most 2 dots in word for search queries.
At most 10<sup>4</sup> calls will be made to addWord and search.

Solution

The crux of the problem lies in efficiently inserting words and then searching for them, even if the query includes wildcards. To solve this, we utilize the trie (prefix tree) data structure. A trie is a tree-like structure that's useful for storing a dynamic set of strings, especially when the dataset involves large numbers of queries on prefixes of strings. Each node of the trie can represent a character of a word, and the path from the root node to any node represents the word stored up to that point. The key operation for the wildcard is a recursive search, which allows us to explore multiple paths in the trie when we encounter the wildcard character.

1. Trie Data Structure: Every node of the trie contains multiple child nodes (one for each character of the alphabet). We start with a root node that represents an empty string. Each level of the trie represents the next character of a word.

2. Adding a Word: To insert a word into our trie, we begin at the root and traverse down the trie based on the characters in the word. If a particular character doesn't have a corresponding child node in the current node, we create a new child node for that character. Once we've processed every character of the word, we mark the final node as the end of a valid word.

3. Searching: Searching for a word is similar to inserting, but with an additional consideration for the wildcard character ('.'). If we encounter a '.', we must consider all child nodes of the current node and recursively continue our search from each of them. If any of the paths result in a match, we return true. If we reach the end of a word without encountering any mismatches or premature ends, we've found a valid word in our trie.

This trie-based approach ensures efficient operations for both inserting and searching for words. In cases without wildcards, the search operation can be performed in linear time relative to the word's length. However, with wildcards, the time complexity might increase, but the trie structure still ensures that we do this efficiently.

Algorithm Walkthrough

Given the word "apple" to insert and then search for ".....":

Start at the root node.
For inserting "apple":
- At 'a', move down or create a node if it doesn't exist.
- At 'p', move down or create.
- Do the same for the next 'p', 'l', and 'e'.
- Mark the last node (for 'e') as the end of a word.
For searching ".....":
- At the first '.', check all child nodes and continue.
- Repeat for each '.'.
- If any path leads to a node that represents the end of a word, return true.

Code

Python3

Complexity Analysis

Time Complexity:
- Insertion (addWord): O(n), where n is the length of the word. This is because each insertion operation can end up either visiting or creating a new node.
- Search: O(n) if the word does not contain any '.' otherwise, $O(m * 26^n)$ ) in the worst case, where n is the length of the word and m is the total number of nodes in the Trie. The worst case happens when the search word contains all dots (e.g., '.....'), which requires us to search the whole Trie.
Space Complexity:
- Insertion (addWord): O(n), where n is the length of the word. The worst case happens when the word being inserted does not share a prefix in the Trie resulting in creating new nodes.
- Search: O(1) for the words without any '.' and O(n) for words with dots to store the recursion stack.