Longest Common Subsequence medium

Problem Statement

Given two strings text1 and text2, return the length of their longest common subsequence. A subsequence of a string is a new string generated from the original string with some characters(can be none) deleted without changing the relative order of the remaining characters. (eg., "ace" is a subsequence of "abcde").

A common subsequence of two strings is a subsequence that is common to both strings.

If there is no common subsequence, return 0.

Example 1

  • Input: text1 = "abcde", text2 = "ace"
  • Output: 3
  • Explanation: The longest common subsequence is "ace" and its length is 3.

Example 2

  • Input: text1 = "abc", text2 = "abc"
  • Output: 3
  • Explanation: The longest common subsequence is "abc" and its length is 3.

Example 3

  • Input: text1 = "abc", text2 = "def"
  • Output: 0
  • Explanation: There is no such common subsequence, so the result is 0.

Steps

The problem can be solved efficiently using dynamic programming. We'll create a 2D array dp where dp[i][j] represents the length of the longest common subsequence of text1[0...i-1] and text2[0...j-1].

  1. Initialization: Create a dp array of size (m+1) x (n+1), where m and n are the lengths of text1 and text2 respectively. Initialize the first row and first column to 0.

  2. Iteration: Iterate through the dp array. For each cell dp[i][j]:

    • If text1[i-1] equals text2[j-1], it means we've found a common character. In this case, dp[i][j] will be dp[i-1][j-1] + 1.
    • Otherwise, dp[i][j] will be the maximum of dp[i-1][j] and dp[i][j-1], representing either taking the LCS from the previous row or the previous column.
  3. Result: The value in dp[m][n] will be the length of the longest common subsequence of text1 and text2.

Explanation

The dynamic programming approach builds up the solution from smaller subproblems. Each cell dp[i][j] depends only on the cells to its left (dp[i][j-1]), above (dp[i-1][j]), and diagonally above (dp[i-1][j-1]). This overlapping subproblem property makes dynamic programming suitable. The algorithm efficiently avoids redundant computations.

Code (C#)

public class Solution {
    public int LongestCommonSubsequence(string text1, string text2) {
        int m = text1.Length;
        int n = text2.Length;

        // Create a DP table to store lengths of LCS of substrings.
        // Note: We add an extra row and column to handle base cases (empty substrings).
        int[,] dp = new int[m + 1, n + 1];

        // Initialize the first row and column to 0 (LCS of empty substring with any other substring is 0).
        for (int i = 0; i <= m; i++) {
            dp[i, 0] = 0;
        }
        for (int j = 0; j <= n; j++) {
            dp[0, j] = 0;
        }

        // Iterate through the strings and populate the DP table.
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (text1[i - 1] == text2[j - 1]) {
                    // If characters match, extend the LCS.
                    dp[i, j] = dp[i - 1, j - 1] + 1;
                } else {
                    // If characters don't match, take the maximum from the previous row or column.
                    dp[i, j] = Math.Max(dp[i - 1, j], dp[i, j - 1]);
                }
            }
        }

        // The bottom-right cell contains the length of the LCS.
        return dp[m, n];
    }
}

Complexity

  • Time Complexity: O(m*n), where m and n are the lengths of the input strings. We iterate through the entire DP table once.
  • Space Complexity: O(m*n), due to the size of the DP table. We could optimize this to O(min(m,n)) using space optimization techniques (keeping only the current and previous row).