Accounts Merge medium

Problem Statement

Given a list of accounts where each account contains a list of emails, merge all accounts with the same email address. The accounts are given in the format [[email1, email2, ...], [email3, email4, ...], ...]. The result should be a list of accounts where each account contains only unique emails and is sorted in lexicographical order, with the emails sorted lexicographically within the account. The first email in each account is the name of the account.

Example 1

Input: accounts = [["John","johnsmith@mail.com","john00@mail.com"],["John","johnbravo@mail.com"],["John","johnbravo@mail.com","john00@mail.com"]]

Output: [["John","john00@mail.com","johnbravo@mail.com","johnsmith@mail.com"]]

Explanation: There are three accounts for John, and all of them contain some common emails. We should merge them into one account. The first email is the account's name. The remaining emails are sorted lexicographically.

Example 2

Input: accounts = [["Gabe","Gabe0@m.co","Gabe3@m.co","Gabe1@m.co"],["Jona","Jona0@m.co","Jona1@m.co"],["Gabe","Gabe0@m.co","Gabe2@m.co"]]

Output: [["Gabe","Gabe0@m.co","Gabe1@m.co","Gabe2@m.co","Gabe3@m.co"],["Jona","Jona0@m.co","Jona1@m.co"]]

Steps

  1. Create a graph: We'll use a graph to represent the relationships between email addresses. Each email address will be a node in the graph. If two emails are in the same account, we'll add an edge between them.

  2. Perform Depth-First Search (DFS): We'll traverse the graph using DFS to find all connected components. Each connected component represents a group of emails belonging to the same account.

  3. Merge Accounts: For each connected component, we'll merge the emails into a single account, sorting the emails lexicographically.

  4. Format Output: We'll format the output according to the problem's specifications (first email is the account name).

Explanation

The core idea is to use a graph to represent the relationships between emails. If two emails share an account, they are connected. By finding connected components in this graph, we can identify all emails belonging to the same user. DFS efficiently explores all connected nodes.

Code

import java.util.*;

public class AccountsMerge {
    public List<List<String>> accountsMerge(List<List<String>> accounts) {
        Map<String, String> emailToName = new HashMap<>();
        Map<String, Set<String>> graph = new HashMap<>();

        // Build the graph
        for (List<String> account : accounts) {
            String name = account.get(0);
            for (int i = 1; i < account.size(); i++) {
                String email = account.get(i);
                emailToName.put(email, name); // Map email to account name
                graph.computeIfAbsent(email, k -> new HashSet<>()).add(account.get(1)); // Add edges, only to the first email in account (other edges are implicitly formed by other emails in the account being connected)
                if(i > 1) graph.computeIfAbsent(account.get(1), k -> new HashSet<>()).add(email); // Added this line as well to fully represent relationships of all other emails.
            }
        }

        Set<String> visited = new HashSet<>();
        List<List<String>> result = new ArrayList<>();

        // DFS to find connected components
        for (String email : emailToName.keySet()) {
            if (!visited.contains(email)) {
                List<String> component = new ArrayList<>();
                dfs(graph, email, visited, component);
                Collections.sort(component);
                component.add(0, emailToName.get(email));  // Add the name at the beginning
                result.add(component);
            }
        }

        return result;
    }

    private void dfs(Map<String, Set<String>> graph, String email, Set<String> visited, List<String> component) {
        visited.add(email);
        component.add(email);
        for (String neighbor : graph.getOrDefault(email, new HashSet<>())) {
            if (!visited.contains(neighbor)) {
                dfs(graph, neighbor, visited, component);
            }
        }
    }
}

Complexity

  • Time Complexity: O(N log N + M), where N is the total number of emails and M is the total number of accounts. The log N factor comes from sorting the emails within each account. The graph construction and DFS traversal takes O(N + M) time.

  • Space Complexity: O(N + M), due to the graph and the visited set.