Edit Distance medium

Problem Statement

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

  • Insert a character
  • Delete a character
  • Replace a character

Example 1:

Input: word1 = "horse", word2 = "ros" Output: 3 Explanation: horse -> rorse (replace 'h' with 'r') rorse -> rose (remove 'r') rose -> ros (remove 'e')

Example 2:

Input: word1 = "intention", word2 = "execution" Output: 5 Explanation: intention -> inention (remove 't') inention -> enention (replace 'i' with 'e') enention -> exention (replace 'n' with 'x') exention -> exection (replace 'n' with 'c') exection -> execution (insert 'u')

Steps

The problem can be solved efficiently using dynamic programming. We'll create a DP table dp where dp[i][j] represents the minimum edit distance between the first i characters of word1 and the first j characters of word2.

  1. Initialization:

    • dp[i][0] = i for all i (because we need i deletions to transform word1[0...i] to an empty string).
    • dp[0][j] = j for all j (because we need j insertions to transform an empty string to word2[0...j]).
  2. Iteration:

    • Iterate through the DP table, calculating dp[i][j] based on the following recursive relation:
      • If word1[i-1] == word2[j-1]: dp[i][j] = dp[i-1][j-1] (no operation needed)
      • Otherwise: dp[i][j] = min(dp[i-1][j] + 1, dp[i][j-1] + 1, dp[i-1][j-1] + 1) (deletion, insertion, or replacement)
  3. Result:

    • dp[m][n] (where m and n are the lengths of word1 and word2 respectively) will contain the minimum edit distance.

Explanation

The dynamic programming approach cleverly avoids redundant calculations by storing the results of subproblems. Each cell in the dp table represents a subproblem: finding the minimum edit distance between prefixes of the two input strings. By building up the table from smaller subproblems to larger ones, we efficiently find the solution for the overall problem. The recursive relation ensures we consider all possible operations (insert, delete, replace) at each step, choosing the minimum cost.

Code

#include <iostream>
#include <string>
#include <algorithm>
#include <vector>

using namespace std;

int minDistance(string word1, string word2) {
    int m = word1.length();
    int n = word2.length();

    // Create DP table and initialize
    vector<vector<int>> dp(m + 1, vector<int>(n + 1));
    for (int i = 0; i <= m; i++) {
        dp[i][0] = i;
    }
    for (int j = 0; j <= n; j++) {
        dp[0][j] = j;
    }

    // Iterate and fill the DP table
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (word1[i - 1] == word2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = min({dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + 1});
            }
        }
    }

    return dp[m][n];
}

int main() {
    string word1 = "horse";
    string word2 = "ros";
    cout << minDistance(word1, word2) << endl; // Output: 3

    word1 = "intention";
    word2 = "execution";
    cout << minDistance(word1, word2) << endl; // Output: 5
    return 0;
}

Complexity

  • Time Complexity: O(m*n), where m and n are the lengths of word1 and word2. This is due to the nested loops iterating through the DP table.
  • Space Complexity: O(m*n) to store the DP table. The space complexity can be optimized to O(min(m, n)) by using only two rows of the DP table.