Find All Anagrams in a String medium

Problem Statement

Given two strings s and p, return an array of all the start indices of p's anagrams in s. You may return the answer in any order.

An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Example 1:

Input: s = "cbaebabacd", p = "abc" Output: [0,6] Explanation: The substring with start index = 0 is "cba", which is an anagram of "abc". The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input: s = "abab", p = "ab" Output: [0,1,2] Explanation: The substring with start index = 0 is "ab", which is an anagram of "ab". The substring with start index = 1 is "ba", which is an anagram of "ab". The substring with start index = 2 is "ab", which is an anagram of "ab".

Steps:

  1. Create Frequency Maps: Create frequency maps (using unordered_map) for string p to store the frequency of each character.
  2. Sliding Window: Use a sliding window of the same size as p to traverse s.
  3. Compare Frequencies: For each window, create a frequency map for the current window in s. Compare this window's frequency map with p's frequency map. If they are identical, it's an anagram.
  4. Store Indices: If an anagram is found, store its starting index in the result vector.
  5. Optimize with Sliding Window: Instead of creating a new frequency map for each window, efficiently update the frequency map by subtracting the leftmost character and adding the rightmost character as the window slides.

Explanation:

The solution uses a sliding window technique to efficiently check for anagrams. Instead of repeatedly creating frequency maps for every substring, we update a single frequency map as the window moves. This significantly reduces the time complexity. We use unordered_map for efficient character counting because its average lookup time is O(1).

Code:

#include <iostream>
#include <vector>
#include <unordered_map>

using namespace std;

vector<int> findAnagrams(string s, string p) {
    vector<int> result;
    if (s.length() < p.length()) return result; //Handle cases where p is longer than s

    unordered_map<char, int> p_freq;
    for (char c : p) {
        p_freq[c]++;
    }

    unordered_map<char, int> window_freq;
    int window_start = 0;
    int window_end = 0;
    int matched = 0;

    while (window_end < s.length()) {
        char right_char = s[window_end];
        window_freq[right_char]++;
        if (p_freq.count(right_char) && window_freq[right_char] == p_freq[right_char]) {
            matched++;
        }

        while (matched == p_freq.size()) {
            if (window_end - window_start + 1 == p.length()) {
                result.push_back(window_start);
            }
            char left_char = s[window_start];
            window_freq[left_char]--;
            if (p_freq.count(left_char) && window_freq[left_char] < p_freq[left_char]) {
                matched--;
            }
            window_start++;
        }
        window_end++;
    }

    return result;
}


int main() {
    string s = "cbaebabacd";
    string p = "abc";
    vector<int> indices = findAnagrams(s, p);
    cout << "[";
    for (int i = 0; i < indices.size(); ++i) {
        cout << indices[i] << (i == indices.size() - 1 ? "" : ",");
    }
    cout << "]" << endl; // Output: [0,6]

    s = "abab";
    p = "ab";
    indices = findAnagrams(s,p);
    cout << "[";
    for (int i = 0; i < indices.size(); ++i) {
        cout << indices[i] << (i == indices.size() - 1 ? "" : ",");
    }
    cout << "]" << endl; //Output: [0,1,2]

    return 0;
}

Complexity:

  • Time Complexity: O(n), where n is the length of string s. The sliding window traverses s only once.
  • Space Complexity: O(1). The space used by the frequency maps is constant because the number of unique characters in the alphabet is fixed. (Technically, it could be O(k) where k is the size of the character set, but this is usually considered constant in the context of character strings).