Accounts Merge medium

Problem Statement

Given a list of accounts where each account contains a list of emails, merge all accounts with the same email address. The accounts' emails must be grouped so that no two accounts share the same email address. Each account's first email is its name, and it is guaranteed that each email only appears in at most two accounts.

For example, input: accounts = [["John", "johnsmith@mail.com", "john00@mail.com"], ["John", "johnnybravo@mail.com"], ["John", "johnsmith@mail.com", "john_newyork@mail.com"], ["Mary", "mary@mail.com"]]

Output: [["John", "john00@mail.com", "johnnybravo@mail.com", "johnsmith@mail.com", "john_newyork@mail.com"], ["Mary", "mary@mail.com"]]

Example 1

Input: accounts = [["John", "johnsmith@mail.com", "john00@mail.com"], ["John", "johnnybravo@mail.com"], ["John", "johnsmith@mail.com", "john_newyork@mail.com"], ["Mary", "mary@mail.com"]] Output: [["John", "john00@mail.com", "johnnybravo@mail.com", "johnsmith@mail.com", "john_newyork@mail.com"], ["Mary", "mary@mail.com"]]

Example 2

Input: accounts = [["Gabe","Gabe0@m.co","Gabe3@m.co","Gabe1@m.co"],["Jona","Jona0@m.co","Jona1@m.co"],["Gabe","Gabe0@m.co","Gabe2@m.co"]] Output: [["Gabe","Gabe0@m.co","Gabe1@m.co","Gabe2@m.co","Gabe3@m.co"],["Jona","Jona0@m.co","Jona1@m.co"]]

Steps

  1. Create a graph: Use a graph (represented using a dictionary in C#) to connect emails that belong to the same account. The key will be an email address, and the value will be a list of email addresses connected to it.

  2. Perform Depth-First Search (DFS): Traverse the graph using DFS to find connected components. Each connected component represents a merged account.

  3. Merge accounts: For each connected component found in the DFS, collect all emails and construct the merged account list.

  4. Return merged accounts: Return the list of merged accounts.

Explanation

The core idea is to treat the problem as a graph problem. Each email address is a node, and an edge exists between two emails if they appear in the same account. By performing DFS, we can group all connected nodes (emails) together, representing the merged accounts.

The first email in each input account is the name, which is preserved throughout the merging process.

Code

using System;
using System.Collections.Generic;
using System.Linq;

public class Solution {
    public IList<IList<string>> AccountsMerge(IList<IList<string>> accounts) {
        Dictionary<string, int> emailToId = new Dictionary<string, int>();
        Dictionary<int, string> idToName = new Dictionary<int, string>();
        Dictionary<int, List<int>> graph = new Dictionary<int, List<int>>();
        int idCounter = 0;

        // Create the graph
        foreach (var account in accounts) {
            string name = account[0];
            int firstEmailId = -1;
            for (int i = 1; i < account.Count; i++) {
                string email = account[i];
                if (!emailToId.ContainsKey(email)) {
                    emailToId[email] = idCounter;
                    idToName[idCounter] = name;
                    graph[idCounter] = new List<int>();
                    idCounter++;
                }
                int currentEmailId = emailToId[email];
                if (firstEmailId == -1) {
                    firstEmailId = currentEmailId;
                } else {
                    graph[firstEmailId].Add(currentEmailId);
                    graph[currentEmailId].Add(firstEmailId);
                }
            }
        }


        HashSet<int> visited = new HashSet<int>();
        List<IList<string>> result = new List<IList<string>>();

        // Perform DFS to find connected components
        foreach (int i in graph.Keys) {
            if (!visited.Contains(i)) {
                List<string> mergedAccount = new List<string>();
                DFS(graph, i, visited, mergedAccount);
                mergedAccount.Sort();
                mergedAccount.Insert(0, idToName[i]);
                result.Add(mergedAccount);
            }
        }
        return result;
    }

    // Depth-First Search function to find connected components
    private void DFS(Dictionary<int, List<int>> graph, int node, HashSet<int> visited, List<string> mergedAccount) {
        visited.Add(node);
        string email = FindEmail(node);
        mergedAccount.Add(email);
        foreach (int neighbor in graph[node]) {
            if (!visited.Contains(neighbor)) {
                DFS(graph, neighbor, visited, mergedAccount);
            }
        }
    }

    private string FindEmail(int id) {
        foreach(var kvp in emailToId){
            if(kvp.Value == id) return kvp.Key;
        }
        return "";
    }
}

Complexity

  • Time Complexity: O(E + V), where E is the number of edges (connections between emails) and V is the number of vertices (emails). This is dominated by the DFS traversal.
  • Space Complexity: O(E + V), primarily due to the graph data structure. In the worst case, all emails are connected.