Top K Frequent Elements medium
Problem Statement
Given an integer array nums
and an integer k
, return the k most frequent elements. You may return the answer in any order.
Example 1:
Input: nums = [1,1,1,2,2,3], k = 2 Output: [1,2]
Example 2:
Input: nums = [1], k = 1 Output: [1]
Steps
-
Count Frequencies: Create a dictionary (or
collections.Counter
) to store the frequency of each element innums
. -
Sort by Frequency: Sort the dictionary items (key-value pairs) by their frequency (values) in descending order. We can use Python's
sorted()
function with a customkey
to achieve this. -
Extract Top K: Take the first
k
elements from the sorted list, extracting only the keys (the elements themselves).
Explanation
The problem requires finding the k
most frequent elements efficiently. A simple approach would be to sort the entire array, but this is inefficient (O(n log n) time complexity). Instead, we use a hash map (dictionary) to count frequencies in O(n) time. Then, sorting the frequencies takes O(n log n) time in the worst case, but since k
is typically much smaller than n
, the overall time complexity is dominated by the frequency counting.
Code
import collections
def topKFrequent(nums, k):
"""
Finds the k most frequent elements in a list.
Args:
nums: A list of integers.
k: The number of most frequent elements to return.
Returns:
A list containing the k most frequent elements.
"""
count = collections.Counter(nums) #Efficiently counts frequencies
return [item[0] for item in count.most_common(k)] #most_common is optimized for this
Complexity
- Time Complexity: O(n + n log n) which simplifies to O(n log n) because the sorting dominates for large n. If
k
is very small compared ton
, it approaches O(n). - Space Complexity: O(n) in the worst case, to store the frequency count of all elements. In practice, this will be less if the input has many duplicate numbers.
Alternative using heapq (for larger datasets and better performance when k is small relative to n):
import heapq
def topKFrequent_heap(nums, k):
"""
Finds the k most frequent elements using a min-heap. More efficient for large n, small k.
"""
count = collections.Counter(nums)
return heapq.nlargest(k, count.keys(), key=count.get)
This topKFrequent_heap
version uses heapq.nlargest
, which is optimized for finding the largest k elements from a collection. It's particularly beneficial when k
is significantly smaller than the total number of unique elements. The time complexity remains O(n log k) which is better than O(n log n) when k << n. Space complexity remains O(n) in the worst case.