Design Add and Search Words Data Structure (Medium)

Problem Statement

Design a data structure that supports adding new words and finding if a string matches any previously added string.

Implement the WordDictionary class:

  • WordDictionary() Initializes the object.
  • void addWord(word) Adds word to the data structure.
  • bool search(word) Returns true if any string in the data structure matches word, and false otherwise. word may contain dots '.', and each dot can match any single letter.

Example 1

Input
["WordDictionary","addWord","addWord","addWord","search","search","search","search"]
[[],["bad"],["dad"],["mad"],["pad"],["bad"],[".ad"],["b.."]]
Output
[null,null,null,null,false,true,true,true]
Explanation
WordDictionary wordDictionary = new WordDictionary();
wordDictionary.addWord("bad");
wordDictionary.addWord("dad");
wordDictionary.addWord("mad");
wordDictionary.search("pad"); // return False
wordDictionary.search("bad"); // return True
wordDictionary.search(".ad"); // return True
wordDictionary.search("b.."); // return True

Example 2

Input
["WordDictionary","addWord","search"]
[[],["a"],[".a"]]
Output
[null,null,false]
Explanation
WordDictionary wordDictionary = new WordDictionary();
wordDictionary.addWord("a");
wordDictionary.search(".a"); // return False, only "a" is in the dictionary
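To pin down the matching rule used in both examples (a '.' stands for exactly one letter, so ".a" cannot match the one-letter word "a"), here is a brute-force reference sketch. The names NaiveWordDictionary and checkExamples are hypothetical, introduced only for illustration. The class stores every word in a vector and compares each one against the pattern, which makes it a convenient baseline for testing a faster version.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical brute-force reference: keeps all added words in a vector and
// scans every one of them on each search. Correct but not efficient.
class NaiveWordDictionary {
public:
    void addWord(const std::string& word) { words_.push_back(word); }

    bool search(const std::string& pattern) const {
        for (const std::string& w : words_) {
            if (matches(pattern, w)) return true;
        }
        return false;
    }

private:
    std::vector<std::string> words_;

    // A '.' matches exactly one letter, so the lengths must be equal and every
    // non-dot character must agree position by position.
    static bool matches(const std::string& pattern, const std::string& word) {
        if (pattern.size() != word.size()) return false;
        for (std::size_t i = 0; i < pattern.size(); ++i) {
            if (pattern[i] != '.' && pattern[i] != word[i]) return false;
        }
        return true;
    }
};

// Hypothetical helper: replays Example 1 and Example 2 with assertions.
void checkExamples() {
    NaiveWordDictionary d;
    d.addWord("bad");
    d.addWord("dad");
    d.addWord("mad");
    assert(!d.search("pad"));
    assert(d.search("bad"));
    assert(d.search(".ad"));
    assert(d.search("b.."));

    NaiveWordDictionary single;
    single.addWord("a");
    assert(!single.search(".a"));  // '.' consumes the only letter, nothing is left for 'a'
}
```

Each search here costs O(N · m) for N stored words and a pattern of length m, because every word is compared. The Trie approach described next avoids looking at words that do not share the pattern's fixed letters.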

Steps

  1. Data Structure: We'll use a Trie (prefix tree) to store the words. Each Trie node holds a boolean isWord that marks the end of a word and a vector of 26 child pointers (children), one per lowercase letter, where children[c - 'a'] leads to the node for the next character c.

  2. addWord(word): Traverse the Trie, adding nodes as needed for each character in the word. Mark isWord as true for the last node.

  3. search(word): This is the more involved part. We traverse the Trie recursively, tracking the current node and the current index in word.

    • If the current character is a dot (.), we recursively search every non-null child, and the search succeeds if any of those branches succeeds.
    • Otherwise, we continue only if the current node has a child for that character (children[c - 'a'] is non-null); if it does not, this branch fails.
    • When the index reaches the end of word, we return isWord for the current node, so a pattern that matches only a prefix of a stored word is not reported as a match.
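The steps above can also be carried out without recursion. The sketch below is an alternative, not the approach used in the Code section: it keeps a "frontier" of every Trie node reachable by the part of the pattern processed so far and advances it one character at a time, expanding to every child on a dot and to a single child on a letter. The class name FrontierWordDictionary is hypothetical, the sketch assumes words use only 'a'-'z' (patterns may also contain '.'), and it uses std::unique_ptr so nodes are freed automatically.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical iterative variant: same Trie layout, level-by-level search.
// Assumes lowercase 'a'-'z' letters only (plus '.' in search patterns).
class FrontierWordDictionary {
    struct Node {
        bool isWord = false;                              // a word ends here
        std::array<std::unique_ptr<Node>, 26> children;   // one slot per letter
    };
    Node root_;

    static int slot(char c) {
        assert(c >= 'a' && c <= 'z');  // lowercase-only assumption
        return c - 'a';
    }

public:
    void addWord(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[slot(c)];
            if (!child) child = std::make_unique<Node>();  // create missing node
            node = child.get();
        }
        node->isWord = true;
    }

    bool search(const std::string& pattern) const {
        std::vector<const Node*> frontier{&root_};
        for (char c : pattern) {
            std::vector<const Node*> next;
            for (const Node* node : frontier) {
                if (c == '.') {
                    // A dot may consume any letter: follow every existing child.
                    for (const auto& child : node->children) {
                        if (child) next.push_back(child.get());
                    }
                } else {
                    // A letter follows only the child for that letter, if present.
                    const auto& child = node->children[slot(c)];
                    if (child) next.push_back(child.get());
                }
            }
            if (next.empty()) return false;  // no stored word has this prefix
            frontier = std::move(next);
        }
        // Pattern fully consumed: it matches if any reached node ends a word.
        for (const Node* node : frontier) {
            if (node->isWord) return true;
        }
        return false;
    }
};
```

Because the frontier at each depth holds at most the Trie nodes at that depth, the total work is bounded by the size of the Trie. The recursive version, by contrast, can stop as soon as it finds the first match instead of expanding a whole level.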

Explanation

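The Trie stores shared prefixes only once, so a search without dots simply follows a single path of length m. The recursion handles the wildcard character (.) naturally: at a dot it branches into every existing child, so every stored word whose letters agree with the pattern's fixed positions is eventually considered, and the search stops as soon as one full match is found.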

Code

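#include <cstddef>
#include <string>
#include <vector>
using namespace std;

// Assumes words contain only lowercase letters 'a'-'z' (search patterns may also contain '.').
class WordDictionary {
private:
    struct TrieNode {
        bool isWord;                  // true if a stored word ends at this node
        vector<TrieNode*> children;   // children[i] is the child for letter 'a' + i
        TrieNode() : isWord(false), children(26, nullptr) {}
    };
    TrieNode* root;

    // Recursively frees a node and all of its descendants.
    static void freeNode(TrieNode* node) {
        if (node == nullptr) return;
        for (TrieNode* child : node->children) {
            freeNode(child);
        }
        delete node;
    }

    // Returns true if word[index..] can be matched starting at node.
    // word is passed by const reference to avoid copying it on every call.
    bool searchHelper(const string& word, size_t index, TrieNode* node) const {
        if (index == word.length()) {
            return node->isWord;      // whole pattern consumed
        }
        char c = word[index];
        if (c == '.') {
            // A dot matches any letter: try every existing child.
            for (TrieNode* child : node->children) {
                if (child != nullptr && searchHelper(word, index + 1, child)) {
                    return true;
                }
            }
            return false;
        }
        // A letter can only continue through the child for that letter.
        TrieNode* next = node->children[c - 'a'];
        return next != nullptr && searchHelper(word, index + 1, next);
    }

public:
    WordDictionary() : root(new TrieNode()) {}

    ~WordDictionary() { freeNode(root); }

    // Copying would make two objects free the same nodes, so disallow it.
    WordDictionary(const WordDictionary&) = delete;
    WordDictionary& operator=(const WordDictionary&) = delete;

    void addWord(const string& word) {
        TrieNode* node = root;
        for (char c : word) {
            int index = c - 'a';
            if (node->children[index] == nullptr) {
                node->children[index] = new TrieNode();   // create missing node
            }
            node = node->children[index];
        }
        node->isWord = true;          // mark the end of the word
    }

    bool search(const string& word) const {
        return searchHelper(word, 0, root);
    }
};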

Complexity

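  • Time Complexity:

    • addWord: O(m), where m is the length of the word.
    • search: O(m) when word contains no dots. With dots, each dot can branch into up to 26 children, so an all-dot pattern has a worst case of O(26^m). However, a node at depth d is only ever reached while matching index d, so each Trie node is visited at most once per search, and the work is also bounded by the number of nodes in the Trie (at most T + 1, where T is the total number of characters across all added words). The worst case is therefore O(min(26^m, T)).
  • Space Complexity: O(26 · T), where T is the total number of characters across all added words (at most N · L for N words of maximum length L), because each Trie node stores 26 child pointers. search additionally uses O(m) stack space for the recursion.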
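To check the search bound empirically, the sketch below instruments the same recursive dot search with a counter. CountingTrie, nodeCount, and the visits out-parameter are hypothetical names introduced for this illustration, and the sketch assumes lowercase-only words. Since each recursive call handles one (node, index) pair and a node's depth always equals the index it is reached with, the visit count of a single search cannot exceed the node count.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical instrumented Trie: the same recursive dot search as the main
// solution, plus counters for nodes created and nodes visited by a search.
// Assumes lowercase 'a'-'z' in words, plus '.' in patterns.
class CountingTrie {
    struct Node {
        bool isWord = false;
        std::array<std::unique_ptr<Node>, 26> children;
    };
    Node root_;
    std::size_t nodeCount_ = 1;  // the root counts as a node

    bool searchFrom(const std::string& p, std::size_t i, const Node* node,
                    std::size_t& visits) const {
        ++visits;  // one call per (node, index) pair
        if (i == p.size()) return node->isWord;
        if (p[i] == '.') {
            for (const auto& child : node->children) {
                if (child && searchFrom(p, i + 1, child.get(), visits)) return true;
            }
            return false;
        }
        assert(p[i] >= 'a' && p[i] <= 'z');  // lowercase-only assumption
        const auto& child = node->children[p[i] - 'a'];
        return child && searchFrom(p, i + 1, child.get(), visits);
    }

public:
    void addWord(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            assert(c >= 'a' && c <= 'z');
            auto& child = node->children[c - 'a'];
            if (!child) {
                child = std::make_unique<Node>();
                ++nodeCount_;  // count every node we create
            }
            node = child.get();
        }
        node->isWord = true;
    }

    std::size_t nodeCount() const { return nodeCount_; }

    // Returns whether pattern matches; stores the number of visited nodes in visits.
    bool search(const std::string& pattern, std::size_t& visits) const {
        visits = 0;
        return searchFrom(pattern, 0, &root_, visits);
    }
};
```

For example, after adding all eight three-letter words over {a, b}, the Trie has 15 nodes. Searching "...." visits all 15 and fails, far fewer than the 26^4 branches the pattern length alone would suggest, while "..." succeeds after 4 visits because the search stops at the first match.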