Design a data structure that allows you to:
- Add a word.
- Search for a word that can include dots ('.') as wildcards, where the dot represents any single character.
In simpler words, this data structure should support two operations: adding a whole word and searching for a word (with or without wildcards).
["Solution", "addWord", "addWord", "search", "search"] [, ["apple"], ["banana"], ["apple"], ["....."]]
- Expected Output:
[-1, -1, -1, 1, 1]
- Justification: After adding the words "apple" and "banana", searching for "apple" will return
truesince "apple" is in the data structure. Searching for "....." will also return
trueas both "apple" and "banana" match the pattern.
["Solution", "addWord", "addWord", "search", "search"] [, ["cat"], ["dog"], ["c.t"], ["d..g"]]
- Expected Output:
[-1, -1, -1, 1, 0]
- Justification: "c.t" matches "cat" and "d..g" doesn't matches "dog".
["Solution", "addWord", "search", "search"] [, ["hello"], ["h.llo"], ["h...o"]]
- Expected Output:
[-1, -1, 1, 0]
- Justification: "h.llo" and "h...o" both match "hello".
The crux of the problem lies in efficiently inserting words and then searching for them, even if the query includes wildcards. To solve this, we utilize the trie (prefix tree) data structure. A trie is a tree-like structure that's useful for storing a dynamic set of strings, especially when the dataset involves large numbers of queries on prefixes of strings. Each node of the trie can represent a character of a word, and the path from the root node to any node represents the word stored up to that point. The key operation for the wildcard is a recursive search, which allows us to explore multiple paths in the trie when we encounter the wildcard character.
1. Trie Data Structure: Every node of the trie contains multiple child nodes (one for each character of the alphabet). We start with a root node that represents an empty string. Each level of the trie represents the next character of a word.
2. Adding a Word: To insert a word into our trie, we begin at the root and traverse down the trie based on the characters in the word. If a particular character doesn't have a corresponding child node in the current node, we create a new child node for that character. Once we've processed every character of the word, we mark the final node as the end of a valid word.
3. Searching: Searching for a word is similar to inserting, but with an additional consideration for the wildcard character ('.'). If we encounter a '.', we must consider all child nodes of the current node and recursively continue our search from each of them. If any of the paths result in a match, we return
true. If we reach the end of a word without encountering any mismatches or premature ends, we've found a valid word in our trie.
This trie-based approach ensures efficient operations for both inserting and searching for words. In cases without wildcards, the search operation can be performed in linear time relative to the word's length. However, with wildcards, the time complexity might increase, but the trie structure still ensures that we do this efficiently.
Given the word "apple" to insert and then search for ".....":
- Start at the root node.
- For inserting "apple":
- At 'a', move down or create a node if it doesn't exist.
- At 'p', move down or create.
- Do the same for the next 'p', 'l', and 'e'.
- Mark the last node (for 'e') as the end of a word.
- For searching ".....":
- At the first '.', check all child nodes and continue.
- Repeat for each '.'.
- If any path leads to a node that represents the end of a word, return
- Time Complexity:
- Insertion (addWord): O(n), where n is the length of the word. This is because each insertion operation runs in linear time with respect to the length of the word.
- Search: O(n * m) in the worst case, where n is the length of the word and m is the total number of nodes in the Trie. This happens when the search word contains dots ('.'). However, for words without dots, the search is O(n).
- Space Complexity: O(m * n), where m is the total number of Trie nodes and n is the average number of characters in the words. Each Trie node has up to 26 children (for each letter of the alphabet). In the worst case, when no nodes are shared, the space complexity is O(m * n).