Edit Distance medium

Problem Statement

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

  • Insert a character
  • Delete a character
  • Replace a character

Example 1:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation: 
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')

Example 2:

Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

Steps and Explanation

This problem can be solved efficiently using dynamic programming. We create a DP table dp where dp[i][j] represents the minimum edit distance between the first i characters of word1 and the first j characters of word2.

  1. Initialization:

    • dp[i][0] = i for all i, because to convert a string of length i to an empty string, we need i deletions.
    • dp[0][j] = j for all j, because to convert an empty string to a string of length j, we need j insertions.
    • dp[0][0] = 0, because the edit distance between two empty strings is 0.
  2. Iteration:

    • We iterate through the DP table, calculating dp[i][j] for each cell.
    • There are three possibilities to consider:
      • Replace: If word1[i-1] == word2[j-1], no operation is needed, so dp[i][j] = dp[i-1][j-1].
      • Replace: If word1[i-1] != word2[j-1], we need a replacement, so dp[i][j] = dp[i-1][j-1] + 1.
      • Insert: We can insert word2[j-1] into word1, so dp[i][j] = dp[i][j-1] + 1.
      • Delete: We can delete word1[i-1] from word1, so dp[i][j] = dp[i-1][j] + 1.
    • We take the minimum of these three possibilities to find the minimum edit distance.
  3. Result:

    • dp[m][n] (where m and n are the lengths of word1 and word2 respectively) will contain the minimum edit distance between word1 and word2.

Code

def minDistance(word1, word2):
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]) + 1

    return dp[m][n]

Complexity

  • Time Complexity: O(m*n), where m and n are the lengths of word1 and word2 respectively. This is due to the nested loops iterating through the DP table.
  • Space Complexity: O(m*n), due to the DP table used to store intermediate results. This could be optimized to O(min(m,n)) using space optimization techniques but this adds complexity to the code.