Longest Common Subsequence medium

Problem Statement

Given two strings text1 and text2, return the length of their longest common subsequence. A subsequence of a string is a new string generated from the original string with some characters(can be none) deleted without changing the relative order of the remaining characters. (eg., "ace" is a subsequence of "abcde"). A common subsequence of two strings is a subsequence that is common to both strings.

If there is no common subsequence, return 0.

Example 1

Input: text1 = "abcde", text2 = "ace" Output: 3
Explanation: The longest common subsequence is "ace" and its length is 3.

Example 2

Input: text1 = "abc", text2 = "abc" Output: 3 Explanation: The longest common subsequence is "abc" and its length is 3.

Steps

The problem can be solved efficiently using dynamic programming. We'll create a 2D array dp where dp[i][j] represents the length of the longest common subsequence of text1[0...i-1] and text2[0...j-1].

  1. Initialization: Create a dp array of size (m+1) x (n+1), where 'm' is the length of text1 and 'n' is the length of text2. Initialize the first row and column to 0.

  2. Iteration: Iterate through the dp array. For each cell dp[i][j]:

    • If text1[i-1] == text2[j-1], it means we found a common character. Therefore, dp[i][j] = dp[i-1][j-1] + 1.
    • Otherwise, dp[i][j] = max(dp[i-1][j], dp[i][j-1]). We take the maximum length from either excluding the current character from text1 or text2.
  3. Result: The value at dp[m][n] will be the length of the longest common subsequence.

Explanation

The dynamic programming approach builds up the solution incrementally. Each cell dp[i][j] represents a subproblem: finding the LCS of prefixes of the input strings. By solving these smaller subproblems and storing the results, we avoid redundant computations. The recurrence relation dp[i][j] = max(dp[i-1][j], dp[i][j-1]) handles the case where the characters don't match; we choose the maximum LCS from the possibilities of excluding the current character from either string. If they do match (text1[i-1] == text2[j-1]), we extend the LCS by 1.

Code

#include <iostream>
#include <string>
#include <algorithm>
#include <vector>

using namespace std;

int longestCommonSubsequence(string text1, string text2) {
    int m = text1.length();
    int n = text2.length();

    // Create a DP table to store lengths of LCS of substrings.
    vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));

    // Fill the DP table bottom-up
    for (int i = 1; i <= m; ++i) {
        for (int j = 1; j <= n; ++j) {
            if (text1[i - 1] == text2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            } else {
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }

    return dp[m][n];
}

int main() {
    string text1 = "abcde";
    string text2 = "ace";
    cout << "Length of LCS: " << longestCommonSubsequence(text1, text2) << endl; // Output: 3

    text1 = "abc";
    text2 = "abc";
    cout << "Length of LCS: " << longestCommonSubsequence(text1, text2) << endl; // Output: 3

    text1 = "AGGTAB";
    text2 = "GXTXAYB";
    cout << "Length of LCS: " << longestCommonSubsequence(text1, text2) << endl; //Output 4

    return 0;
}

Complexity

  • Time Complexity: O(m*n), where 'm' and 'n' are the lengths of the input strings. This is because we iterate through the entire DP table.
  • Space Complexity: O(m*n) to store the DP table. We could optimize this to O(min(m,n)) space by only storing the previous row or column during the iteration, but this would make the code slightly more complex.