Encode and Decode Strings medium

Problem Statement

Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.

Please note: The encoded string on the network is limited to a length of 10000.

Example 1:

Input: ["hello","world"] Output: ["hello","world"] Explanation: "hello/world" is not valid because the delimiter '/' could be part of the strings.

Example 2:

Input: ["we","say","yes"] Output: ["we","say","yes"]

Steps:

  1. Encoding:

    • We'll use a length-prefixed encoding scheme. Each string will be prepended with its length (as a 4-byte integer). This avoids the need for delimiters that might clash with characters within the strings themselves.
  2. Decoding:

    • We'll read the encoded string 4 bytes at a time to extract the length of each string.
    • Then, we'll read that many bytes to get the string itself.
    • We'll repeat this process until the end of the encoded string.

Explanation:

The key to solving this problem efficiently and robustly is to avoid relying on delimiters that might appear within the strings. Using a length-prefix guarantees that we can correctly parse the encoded string regardless of its contents. The 4-byte integer provides enough space to encode string lengths up to a substantial size, sufficient for most practical scenarios.

Code (C++)

#include <iostream>
#include <vector>
#include <string>

using namespace std;

class Codec {
public:
    // Encodes a list of strings to a single string.
    string encode(vector<string>& strs) {
        string encodedStr = "";
        for (const string& str : strs) {
            int len = str.length();
            // Convert length to 4-byte integer representation
            encodedStr += string((char*)&len, sizeof(len)); 
            encodedStr += str;
        }
        return encodedStr;
    }

    // Decodes a single string to a list of strings.
    vector<string> decode(string s) {
        vector<string> decodedStrs;
        int i = 0;
        while (i < s.length()) {
            // Extract length (4 bytes)
            int len;
            memcpy(&len, s.c_str() + i, 4); 
            i += 4;

            // Extract string
            decodedStrs.push_back(s.substr(i, len));
            i += len;
        }
        return decodedStrs;
    }
};


int main() {
    Codec codec;
    vector<string> strs = {"hello", "world"};
    string encoded = codec.encode(strs);
    vector<string> decoded = codec.decode(encoded);

    for (const string& str : decoded) {
        cout << str << " ";
    }
    cout << endl; // Output: hello world

    vector<string> strs2 = {"we", "say", "yes"};
    encoded = codec.encode(strs2);
    decoded = codec.decode(encoded);
     for (const string& str : decoded) {
        cout << str << " ";
    }
    cout << endl; // Output: we say yes
    return 0;
}

Complexity Analysis:

  • Time Complexity:

    • Encoding: O(N), where N is the total number of characters in all strings.
    • Decoding: O(N)
  • Space Complexity:

    • Encoding: O(N) (to store the encoded string)
    • Decoding: O(N) (to store the decoded strings)

The space complexity is dominated by the size of the input and output strings. The algorithm itself uses a constant amount of extra space beyond that.