Regular Expression Matching hard

Problem Statement

Given an input string s and a pattern p, implement regular expression matching with support for '.' and '*'.

'.' Matches any single character. '*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Example 1

Input: s = "aa", p = "a" Output: false Explanation: "a" does not match the entire string "aa".

Example 2

Input: s = "aa", p = "a*" Output: true Explanation: '' means zero or more of the preceding element, 'a'. Therefore, by taking 'a' as zero 'a', it matches the entire string "aa".

Steps

The problem can be solved using dynamic programming. We create a DP table dp where dp[i][j] is True if the first i characters of s match the first j characters of p, and False otherwise.

We iterate through the table, filling it based on the following rules:

  1. Base Case: dp[0][0] = True (empty string matches empty pattern).
  2. Pattern starts with '*':
    • If the previous character in the pattern matches the current character in the string, or the previous character is '.', then dp[i][j] is True if either the previous pattern character matches the string (ignoring the '*') or the current pattern matches the string. This represents zero or more occurrences of the preceding element.
    • Otherwise, dp[i][j] is True only if dp[i][j-2] is True (meaning zero occurrences of the preceding character).
  3. Pattern does not start with '*':
    • If the current characters of s and p match (or p is '.'), then dp[i][j] is True if dp[i-1][j-1] is True.
    • Otherwise, dp[i][j] is False.

Explanation

The dynamic programming approach builds up a solution from smaller subproblems. Each cell in the dp table represents a subproblem: does a prefix of s match a prefix of p? By systematically filling the table based on the rules above, we can efficiently determine whether the entire string s matches the entire pattern p. The rules handle the special cases of '.' and '*' correctly.

Code

def isMatch(s, p):
    m, n = len(s), len(p)
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True

    for j in range(1, n + 1):
        if p[j - 1] == '*':
            dp[0][j] = dp[0][j - 2]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j - 1] == '*':
                dp[i][j] = dp[i][j - 2]  # Zero occurrences
                if p[j - 2] == s[i - 1] or p[j - 2] == '.':
                    dp[i][j] = dp[i][j] or dp[i - 1][j]  # One or more occurrences
            elif p[j - 1] == '.' or p[j - 1] == s[i - 1]:
                dp[i][j] = dp[i - 1][j - 1]

    return dp[m][n]

Complexity

  • Time Complexity: O(mn), where m is the length of s and n is the length of p. We iterate through the entire DP table.
  • Space Complexity: O(mn), due to the size of the DP table. We can optimize space to O(min(m,n)) by using only two rows of the DP table.