Regular Expression Matching hard

Problem Statement

Given an input string s and a pattern p, implement regular expression matching with support for '.' and '*'.

'.' Matches any single character. '*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Example 1

Input: s = "aa", p = "a*" Output: true Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' zero times, it matches the input string "aa".

Example 2

Input: s = "aab", p = "cab" Output: true Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore it matches "aab".

Steps

The problem can be solved efficiently using dynamic programming. We create a DP table dp[m+1][n+1], where m is the length of s and n is the length of p. dp[i][j] is true if the first i characters of s match the first j characters of p, and false otherwise.

We iterate through the DP table, filling each cell based on the following rules:

  1. Base Case: dp[0][0] = true (empty string matches empty pattern). dp[i][0] = false for i > 0 (non-empty string can't match an empty pattern). dp[0][j] depends on the pattern: if p[j-1] is '*', then it can match zero occurrences of the preceding character.

  2. Matching a single character: If s[i-1] == p[j-1] or p[j-1] == '.', then dp[i][j] = dp[i-1][j-1]. The current characters match, so we check if the previous substrings matched.

  3. Matching zero or more characters using '*': If p[j-1] == '*', there are two possibilities:

    • The '*' matches zero occurrences of the preceding element: dp[i][j] = dp[i][j-2]
    • The '*' matches one or more occurrences of the preceding element: dp[i][j] = dp[i-1][j] (only if s[i-1] == p[j-2] or p[j-2] == '.')

Explanation

The dynamic programming approach systematically explores all possible matches. By building up the solution from smaller subproblems, we avoid redundant calculations and ensure correctness. The base cases handle the trivial scenarios of empty strings and patterns. The recursive relationships capture the logic of character matching and the special handling of the '*' character.

Code

#include <iostream>
#include <string>
#include <vector>

using namespace std;

bool isMatch(string s, string p) {
    int m = s.length();
    int n = p.length();

    vector<vector<bool>> dp(m + 1, vector<bool>(n + 1, false));
    dp[0][0] = true;

    for (int j = 1; j <= n; j++) {
        if (p[j - 1] == '*') {
            dp[0][j] = dp[0][j - 2];
        }
    }

    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (s[i - 1] == p[j - 1] || p[j - 1] == '.') {
                dp[i][j] = dp[i - 1][j - 1];
            } else if (p[j - 1] == '*') {
                dp[i][j] = dp[i][j - 2]; // Match zero occurrences
                if (s[i - 1] == p[j - 2] || p[j - 2] == '.') {
                    dp[i][j] = dp[i][j] || dp[i - 1][j]; // Match one or more occurrences
                }
            }
        }
    }

    return dp[m][n];
}

int main() {
    string s1 = "aa", p1 = "a*";
    string s2 = "aab", p2 = "c*a*b";
    string s3 = "mississippi", p3 = "mis*is*p*.";

    cout << "Example 1: " << isMatch(s1, p1) << endl; // true
    cout << "Example 2: " << isMatch(s2, p2) << endl; // true
    cout << "Example 3: " << isMatch(s3, p3) << endl; //false

    return 0;
}

Complexity

  • Time Complexity: O(mn), where m is the length of s and n is the length of p. We iterate through the DP table once.
  • Space Complexity: O(mn), due to the size of the DP table. This can be optimized to O(n) if we only store the previous row of the DP table.