Find All Anagrams in a String medium

Problem Statement

Given two strings s and p, return an array of all the start indices of p's anagrams in s. You may return the answer in any order.

An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Example 1:

Input: s = "cbaebabacd", p = "abc" Output: [0,6] Explanation: The substring with start index = 0 is "cba", which is an anagram of "abc". The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input: s = "abab", p = "ab" Output: [0,1,2] Explanation: The substring with start index = 0 is "ab", which is an anagram of "ab". The substring with start index = 1 is "ba", which is an anagram of "ab". The substring with start index = 2 is "ab", which is an anagram of "ab".

Steps:

  1. Character Count for p: Create a dictionary (or a similar data structure) to store the frequency of each character in the string p.

  2. Sliding Window: Use a sliding window of size equal to the length of p to traverse s.

  3. Character Count for Window: For each window, create a dictionary to store the frequency of characters within that window.

  4. Comparison: Compare the frequency dictionaries of p and the current window. If they are identical, it means the window contains an anagram of p. Add the start index of the window to the result list.

  5. Sliding: Move the window one position to the right. Efficiently update the character counts in the window dictionary instead of recalculating for each window.

Explanation

The core idea is to use a sliding window technique coupled with efficient character counting. Instead of repeatedly counting characters for every substring, we update the counts incrementally as the window slides. This drastically reduces the time complexity. We use dictionaries to store character frequencies because dictionaries provide efficient O(1) average-case lookup, insertion, and deletion.

Code

from collections import Counter

def findAnagrams(s: str, p: str) -> list[int]:
    """
    Finds all anagrams of p in s using a sliding window and character counting.
    """
    ns, np = len(s), len(p)
    if ns < np:
        return []

    p_count = Counter(p)  # Count character frequencies in p
    s_count = Counter()   # Count character frequencies in the sliding window

    output = []
    # sliding window on the string s
    for i in range(ns):
        s_count[s[i]] += 1  # Add character to the window

        # remove the leftmost character if the window size exceeds that of p
        if i >= np:
            if s_count[s[i - np]] == 1:
                del s_count[s[i - np]]
            else:
                s_count[s[i - np]] -= 1

        # compare character counts of window and p
        if p_count == s_count:
            output.append(i - np + 1)

    return output

Complexity

  • Time Complexity: O(N), where N is the length of string s. We iterate through s only once. Dictionary operations (lookup, insertion, deletion) are on average O(1).

  • Space Complexity: O(1). The space used by the Counter dictionaries is bounded by the size of the alphabet (26 for lowercase English letters), which is constant. The output list can have at most O(N) elements in the worst case (all substrings are anagrams), but this is still considered O(1) relative to the input size.