Longest Common Subsequence medium

Problem Statement

Given two strings text1 and text2, return the length of their longest common subsequence. If there is no common subsequence, return 0.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters. For example, "ace" is a subsequence of "abcde".

A common subsequence of two strings is a subsequence that is common to both strings.

Example 1:

Input: text1 = "abcde", text2 = "ace" Output: 3
Explanation: The longest common subsequence is "ace" and its length is 3.

Example 2:

Input: text1 = "abc", text2 = "abc" Output: 3 Explanation: The longest common subsequence is "abc" and its length is 3.

Steps to Solve

The problem can be efficiently solved using dynamic programming. We'll create a 2D array (matrix) dp where dp[i][j] represents the length of the longest common subsequence of text1[:i+1] and text2[:j+1].

  1. Initialization: Create a dp matrix of size (len(text1)+1) x (len(text2)+1) and initialize all elements to 0. The extra row and column are for handling base cases (empty subsequences).

  2. Iteration: Iterate through the dp matrix. For each cell dp[i][j]:

    • If text1[i-1] == text2[j-1], it means we found a common character. In this case, dp[i][j] = dp[i-1][j-1] + 1. We extend the LCS by one.
    • Otherwise, text1[i-1] and text2[j-1] are different. The LCS length is the maximum of the LCS lengths ending at dp[i-1][j] (excluding text1[i-1]) and dp[i][j-1] (excluding text2[j-1]). So, dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
  3. Result: The bottom-right cell dp[len(text1)][len(text2)] will contain the length of the longest common subsequence of text1 and text2.

Explanation

The dynamic programming approach systematically builds up the solution from smaller subproblems. Each cell dp[i][j] depends only on the values of its top, left, and top-left neighbors. This overlapping subproblems property makes dynamic programming highly efficient. The base cases (first row and column of dp) represent the LCS lengths when one of the strings is empty (which is always 0).

Code

def longestCommonSubsequence(text1, text2):
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    return dp[m][n]

Complexity

  • Time Complexity: O(m*n), where 'm' and 'n' are the lengths of text1 and text2 respectively. This is because we iterate through the entire dp matrix.
  • Space Complexity: O(m*n) to store the dp matrix. We could optimize this to O(min(m,n)) space by using only two rows of the dp matrix.