Top K Frequent Elements medium

Problem Statement

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Example 1:

Input: nums = [1,1,1,2,2,3], k = 2 Output: [1,2]

Example 2:

Input: nums = [1], k = 1 Output: [1]

Steps

  1. Count Frequencies: Create a dictionary (or collections.Counter) to store the frequency of each element in nums.

  2. Sort by Frequency: Sort the dictionary items (key-value pairs) by their frequency (values) in descending order. We can use Python's sorted() function with a custom key to achieve this.

  3. Extract Top K: Take the first k elements from the sorted list, extracting only the keys (the elements themselves).

Explanation

The problem requires finding the k most frequent elements efficiently. A simple approach would be to sort the entire array, but this is inefficient (O(n log n) time complexity). Instead, we use a hash map (dictionary) to count frequencies in O(n) time. Then, sorting the frequencies takes O(n log n) time in the worst case, but since k is typically much smaller than n, the overall time complexity is dominated by the frequency counting.

Code

import collections

def topKFrequent(nums, k):
    """
    Finds the k most frequent elements in a list.

    Args:
      nums: A list of integers.
      k: The number of most frequent elements to return.

    Returns:
      A list containing the k most frequent elements.
    """
    count = collections.Counter(nums)  #Efficiently counts frequencies
    return [item[0] for item in count.most_common(k)] #most_common is optimized for this


Complexity

  • Time Complexity: O(n + n log n) which simplifies to O(n log n) because the sorting dominates for large n. If k is very small compared to n, it approaches O(n).
  • Space Complexity: O(n) in the worst case, to store the frequency count of all elements. In practice, this will be less if the input has many duplicate numbers.

Alternative using heapq (for larger datasets and better performance when k is small relative to n):

import heapq

def topKFrequent_heap(nums, k):
    """
    Finds the k most frequent elements using a min-heap.  More efficient for large n, small k.
    """
    count = collections.Counter(nums)
    return heapq.nlargest(k, count.keys(), key=count.get)


This topKFrequent_heap version uses heapq.nlargest, which is optimized for finding the largest k elements from a collection. It's particularly beneficial when k is significantly smaller than the total number of unique elements. The time complexity remains O(n log k) which is better than O(n log n) when k << n. Space complexity remains O(n) in the worst case.