Regular Expression Matching hard

Problem Statement

Given a string s and a string p which represents a regular expression, implement a function to check if s matches p. The following are the matching rules:

  • . matches any single character.
  • * matches zero or more occurrences of the preceding element.

The matching should cover the entire input string (s).

Example 1

Input: s = "aa", p = "a*" Output: true Explanation: '' matches zero occurrences of 'a', thus "aa" matches "a".

Example 2

Input: s = "aab", p = "cab" Output: true Explanation: 'c*' matches zero occurrences of 'c', 'a*' matches two occurrences of 'a', and 'b' matches 'b'.

Steps

The problem can be solved efficiently using dynamic programming. We'll create a DP table dp where dp[i][j] represents whether the first i characters of s match the first j characters of p.

  1. Initialization:

    • dp[0][0] = true (empty string matches empty pattern)
    • dp[i][0] = false for i > 0 (non-empty string can't match empty pattern)
    • dp[0][j] needs special handling for the pattern p. If p starts with a*, b*, etc., it might match the empty string. We handle this recursively.
  2. Iteration: For each i and j, we consider several cases:

    • If p[j-1] is a literal character and it matches s[i-1], then dp[i][j] = dp[i-1][j-1] (matching the previous characters).
    • If p[j-1] is '.', it matches any character, so dp[i][j] = dp[i-1][j-1].
    • If p[j-1] is '*', we have two possibilities:
      • The '' matches zero occurrences of the preceding element. In this case, dp[i][j] = dp[i][j-2] (ignore the '' and the preceding element).
      • The '*' matches one or more occurrences of the preceding element. In this case, we need p[j-2] to match s[i-1] (or '.' to match anything), and dp[i][j] = dp[i-1][j].
  3. Result: dp[s.length()][p.length()] will indicate whether the entire string s matches the entire pattern p.

Explanation

The dynamic programming approach breaks down the problem into smaller subproblems. By building the dp table bottom-up, we avoid redundant calculations and find the solution efficiently. The different cases within the iteration handle the various scenarios presented by literal characters, '.', and ''. The key is understanding how the '' operator affects the matching process—it can either match zero occurrences or one or more occurrences, requiring us to explore both possibilities.

Code

class Solution {
    public boolean isMatch(String s, String p) {
        int m = s.length();
        int n = p.length();

        boolean[][] dp = new boolean[m + 1][n + 1];
        dp[0][0] = true;

        for (int j = 1; j <= n; j++) {
            if (p.charAt(j - 1) == '*') {
                dp[0][j] = dp[0][j - 2];
            }
        }

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (p.charAt(j - 1) == '.' || p.charAt(j - 1) == s.charAt(i - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else if (p.charAt(j - 1) == '*') {
                    dp[i][j] = dp[i][j - 2]; // Match zero occurrences
                    if (p.charAt(j - 2) == '.' || p.charAt(j - 2) == s.charAt(i - 1)) {
                        dp[i][j] = dp[i][j] || dp[i - 1][j]; // Match one or more occurrences
                    }
                }
            }
        }

        return dp[m][n];
    }
}

Complexity

  • Time Complexity: O(mn), where m is the length of string s and n is the length of pattern p. This is because we iterate through the DP table of size (m+1) x (n+1).
  • Space Complexity: O(mn), due to the DP table we use. We could potentially optimize this to O(min(m,n)) by only storing the current and previous rows of the DP table.