Alien Dictionary hard
Problem Statement
There is a dictionary containing words from an alien language. Each word is a string of lowercase letters. The order of the letters in this language is unknown, but we have access to a list of words from this dictionary. We can deduce the order of letters by comparing adjacent words in the list. If word1 comes before word2 in the list, then for the first position where they differ, the letter in word1 must come before the letter in word2 in the alien language's order.
Given a list of strings words
representing the alien dictionary, return a string representing the order of the letters in the alien language. If there is no solution, return "". If there are multiple valid orderings, any one of them is acceptable.
Example 1
Input: words = ["wrt","wrf","er","ett","rftt"] Output: "wertf"
Example 2
Input: words = ["z","x"] Output: "zx"
Steps
-
Build the graph: Create a directed graph where each node represents a letter from the alien language. An edge from letter 'a' to letter 'b' indicates that 'a' comes before 'b' in the alien order. We build this graph by comparing adjacent words in the input list.
-
Detect cycles: Check if the graph contains cycles. A cycle would indicate a contradiction in the alien dictionary (e.g., a < b and b < a). If a cycle is detected, no valid ordering is possible, and we return "".
-
Topological sort: Perform a topological sort on the graph. Topological sort arranges the nodes in such a way that for every directed edge from node A to node B, node A appears before node B in the ordering. This sorted order represents the alien language's letter order.
Explanation
The core idea is to represent the ordering constraints as a directed graph and then use topological sorting to find a valid ordering. Topological sorting is only possible on Directed Acyclic Graphs (DAGs). If a cycle is detected, it means there's an inconsistency in the provided word order (e.g., "abc" and "acb" would create a cycle).
The graph is built by comparing consecutive words. For instance, comparing "wrt" and "wrf" tells us 't' comes before 'f'. We add an edge from 't' to 'f' in the graph.
Code
from collections import defaultdict
def alienOrder(words):
graph = defaultdict(set)
in_degree = {}
for word in words:
for char in word:
in_degree[char] = 0
for i in range(len(words) - 1):
word1, word2 = words[i], words[i+1]
min_len = min(len(word1), len(word2))
for j in range(min_len):
if word1[j] != word2[j]:
if word2[j] not in graph[word1[j]]:
graph[word1[j]].add(word2[j])
in_degree[word2[j]] += 1
break
else:
if len(word1) > len(word2):
return "" #lexicographical order violation
queue = [char for char, degree in in_degree.items() if degree == 0]
result = ""
while queue:
char = queue.pop(0)
result += char
for neighbor in graph[char]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
if len(result) != len(in_degree):
return "" # Cycle detected
return result
Complexity
-
Time Complexity: O(V + E), where V is the number of unique letters and E is the number of edges in the graph. Building the graph takes O(V + E) and topological sort takes O(V + E).
-
Space Complexity: O(V + E), primarily to store the graph and in-degree information.