Top K Frequent Elements in an Array

Introduction

In the world of data analysis and processing, the ability to identify the most frequently occurring elements in a given dataset is a fundamental and widely-applicable problem. This problem, known as "Top K Frequent Elements in an Array," has numerous real-world applications, such as:

  • Recommendation Systems: Identifying the most popular items or products that users frequently interact with, to provide personalized recommendations.
  • Anomaly Detection: Detecting unusual or infrequent occurrences in a dataset, which could indicate potential problems or anomalies.
  • Text Analysis: Determining the most common words or phrases in a text corpus, which can be useful for tasks like sentiment analysis, topic modeling, and natural language processing.
  • Network Analysis: Identifying the most frequently accessed nodes or hubs in a network, which can provide insights into the structure and dynamics of the network.

In this comprehensive blog post, we will explore various approaches to solving the "Top K Frequent Elements in an Array" problem, analyze their time and space complexities, and provide detailed implementations in popular programming languages.

Approaches to Solving the Problem

Approach 1: Using Hash Map and Sorting

The first approach to solving this problem involves using a hash map to store the frequency of each element in the array, and then sorting the elements based on their frequencies.

Steps:

  1. Create a hash map to store the frequency of each element in the array.
  2. Sort the elements in the hash map based on their frequencies, in descending order.
  3. Retrieve the top K most frequent elements from the sorted hash map.

Time Complexity: O(n + d * log d), where n is the size of the input array and d is the number of distinct elements in the array.
Auxiliary Space: O(d), where d is the number of distinct elements in the array.

Here‘s the implementation in Python, Java, C++, and JavaScript:

from collections import Counter

def topKFrequent(arr, k):
    # Count frequency of each element
    freq = Counter(arr)

    # Sort the elements based on their frequencies
    sorted_freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)

    # Retrieve the top K most frequent elements
    return [x[0] for x in sorted_freq[:k]]
import java.util.*;

class GfG {
    public static List<Integer> topKFrequent(int[] arr, int k) {
        // Count frequency of each element
        Map<Integer, Integer> freq = new HashMap<>();
        for (int num : arr) {
            freq.put(num, freq.getOrDefault(num, 0) + 1);
        }

        // Sort the elements based on their frequencies
        List<int[]> sortedFreq = new ArrayList<>(freq.entrySet());
        sortedFreq.sort((a, b) -> b[1] - a[1]);

        // Retrieve the top K most frequent elements
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            result.add(sortedFreq.get(i)[0]);
        }
        return result;
    }
}
#include <iostream>
#include <unordered_map>
#include <vector>
#include <algorithm>

using namespace std;

static bool compare(pair<int, int>& p1, pair<int, int>& p2) {
    // Prioritize element‘s value in case their frequency was the same
    if (p1.second == p2.second) {
        return p1.first > p2.first;
    }
    // Sort in descending order of frequencies
    return p1.second > p2.second;
}

vector<int> topKFrequent(vector<int>& arr, int k) {
    // Count frequency of each element
    unordered_map<int, int> freq;
    for (int num : arr) {
        freq[num]++;
    }

    // Store the elements of ‘freq‘ in the vector ‘sortedFreq‘
    vector<pair<int, int>> sortedFreq(freq.begin(), freq.end());
    sort(sortedFreq.begin(), sortedFreq.end(), compare);

    // Retrieve the top K most frequent elements
    vector<int> result;
    for (int i = 0; i < k; i++) {
        result.push_back(sortedFreq[i].first);
    }
    return result;
}
function topKFrequent(arr, k) {
    // Count frequency of each element
    let freq = new Map();
    for (let num of arr) {
        freq.set(num, (freq.get(num) || 0) + 1);
    }

    // Sort the elements based on their frequencies
    let sortedFreq = Array.from(freq.entries()).sort((a, b) => b[1] - a[1]);

    // Retrieve the top K most frequent elements
    let result = [];
    for (let i = 0; i < k; i++) {
        result.push(sortedFreq[i][0]);
    }
    return result;
}

This approach is straightforward and easy to implement, but it has a higher time complexity due to the sorting step. It is suitable for scenarios where the number of distinct elements in the array is relatively small, and the top K most frequent elements need to be retrieved.

Approach 2: Using Hash Map and Max Heap

The second approach involves using a hash map to store the frequency of each element, and then using a max heap (priority queue) to efficiently retrieve the top K most frequent elements.

Steps:

  1. Create a hash map to store the frequency of each element in the array.
  2. Create a max heap (priority queue) to store the frequency-element pairs, with the frequency as the priority.
  3. Iterate through the hash map and add each frequency-element pair to the max heap. If the size of the heap exceeds K, remove the element with the smallest frequency.
  4. Retrieve the top K most frequent elements from the max heap.

Time Complexity: O(n + k * log k), where n is the size of the input array and k is the value of K.
Auxiliary Space: O(d), where d is the number of distinct elements in the array.

Here‘s the implementation in Python, Java, C++, and JavaScript:

import heapq
from collections import Counter

def topKFrequent(arr, k):
    # Count frequency of each element
    freq = Counter(arr)

    # Create a max heap to store the frequency-element pairs
    heap = [(-count, num) for num, count in freq.items()]
    heapq.heapify(heap)

    # Retrieve the top K most frequent elements
    return [heapq.heappop(heap)[1] for _ in range(k)]
import java.util.*;

class GfG {
    static class FrequencyComparator implements Comparator<int[]> {
        public int compare(int[] p1, int[] p2) {
            // Prioritize element‘s value in case their frequency was the same
            if (p1[0] == p2[0]) {
                return Integer.compare(p1[1], p2[1]);
            }
            // Sort in increasing order of frequencies (for min heap behavior)
            return Integer.compare(p1[0], p2[0]);
        }
    }

    public static List<Integer> topKFrequent(int[] arr, int k) {
        // Count frequency of each element
        Map<Integer, Integer> freq = new HashMap<>();
        for (int num : arr) {
            freq.put(num, freq.getOrDefault(num, 0) + 1);
        }

        // Create a min heap to store the frequency-element pairs
        PriorityQueue<int[]> heap = new PriorityQueue<>(new FrequencyComparator());
        for (Map.Entry<Integer, Integer> entry : freq.entrySet()) {
            heap.offer(new int[] { entry.getValue(), entry.getKey() });
            if (heap.size() > k) {
                heap.poll();
            }
        }

        // Retrieve the top K most frequent elements
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll()[1]);
        }
        Collections.reverse(result);
        return result;
    }
}
#include <iostream>
#include <queue>
#include <unordered_map>
#include <vector>

using namespace std;

vector<int> topKFrequent(vector<int>& arr, int k) {
    // Count frequency of each element
    unordered_map<int, int> freq;
    for (int num : arr) {
        freq[num]++;
    }

    // Create a max heap to store the frequency-element pairs
    priority_queue<pair<int, int>, vector<pair<int, int>>, greater<pair<int, int>>> heap;
    for (auto entry : freq) {
        heap.push({entry.second, entry.first});
        if (heap.size() > k) {
            heap.pop();
        }
    }

    // Retrieve the top K most frequent elements
    vector<int> result;
    for (int i = 0; i < k; i++) {
        result.push_back(heap.top().second);
        heap.pop();
    }
    return result;
}
class MinHeap {
    constructor() {
        this.heap = [];
    }

    // Swap elements
    swap(i, j) {
        [this.heap[i], this.heap[j]] = [this.heap[j], this.heap[i]];
    }

    // Get parent index
    parent(i) {
        return Math.floor((i - 1) / 2);
    }

    // Get left child index
    leftChild(i) {
        return 2 * i + 1;
    }

    // Get right child index
    rightChild(i) {
        return 2 * i + 2;
    }

    // Insert an element into the heap
    push(val) {
        this.heap.push(val);
        this.heapifyUp(this.heap.length - 1);
    }

    // Remove and return the smallest element
    pop() {
        if (this.heap.length === 0) return null;
        if (this.heap.length === 1) return this.heap.pop();
        let top = this.heap[0];
        this.heap[0] = this.heap.pop();
        this.heapifyDown(0);
        return top;
    }

    // Heapify up (used when inserting)
    heapifyUp(index) {
        while (index > 0 && this.compare(this.heap[index], this.heap[this.parent(index)]) < 0) {
            this.swap(index, this.parent(index));
            index = this.parent(index);
        }
    }

    // Heapify down (used when deleting)
    heapifyDown(index) {
        let smallest = index;
        let left = this.leftChild(index);
        let right = this.rightChild(index);
        // Use custom comparison function
        if (left < this.heap.length && this.compare(this.heap[left], this.heap[smallest]) < 0) {
            smallest = left;
        }
        if (right < this.heap.length && this.compare(this.heap[right], this.heap[smallest]) < 0) {
            smallest = right;
        }
        if (smallest !== index) {
            this.swap(index, smallest);
            this.heapifyDown(smallest);
        }
    }

    // Custom comparison function (implements the Java comparator logic)
    compare(p1, p2) {
        if (p1[0] === p2[0]) {
            return p1[1] - p2[1]; // Prioritize larger numbers when frequencies are the same
        }
        return p1[0] - p2[0]; // Sort by increasing frequency
    }

    // Return the top element without removing it
    top() {
        return this.heap.length > 0 ? this.heap[0] : null;
    }

    // Return the size of the heap
    size() {
        return this.heap.length;
    }
}

function topKFrequent(arr, k) {
    // Count frequency of each element
    let freq = new Map();
    for (let num of arr) {
        freq.set(num, (freq.get(num) || 0) + 1);
    }

    // Create a min heap to store the frequency-element pairs
    let heap = new MinHeap();
    for (let [num, count] of freq.entries()) {
        heap.push([count, num]); // Store as [frequency, number]
        if (heap.size() > k) {
            heap.pop();
        }
    }

    // Retrieve the top K most frequent elements
    let result = [];
    while (heap.size() > 0) {
        result.push(heap.pop()[1]);
    }
    result.reverse();
    return result;
}

This approach leverages the efficiency of a max heap (priority queue) to quickly retrieve the top K most frequent elements. It has a better time complexity compared to the first approach, as it only requires a linear scan of the hash map and a logarithmic operation for each element in the heap.

Approach 3: Using Counting Sort

The third approach involves using a counting sort-based technique to solve the problem. This approach takes advantage of the fact that the frequencies of the elements are bounded by the size of the input array.

Steps:

  1. Create a hash map to store the frequency of each element in the array.
  2. Find the maximum frequency among all the elements.
  3. Create a 2D array (buckets) where the index represents the frequency, and each bucket contains a list of elements with that frequency.
  4. Traverse the buckets from the highest frequency to the lowest, and collect the top K most frequent elements.

Time Complexity: O(n + d), where n is the size of the input array and d is the number of distinct elements in the array.
Auxiliary Space: O(n), where n is the size of the input array.

Here‘s the implementation in Python, Java, C++, and JavaScript:


def topKFrequent(arr, k):
    # Count frequency of each element
    freq = {}
    for num in arr:
        freq[num] = freq.get(num, 0) + 1

    # Find the maximum frequency
    max_freq = max(freq.values())

    # Create buckets based on frequencies
    buckets = [[] for _ in range(max_freq + 1)]
    for num, count in freq.items():
        buckets[count].append(num)

    # Collect the top K most frequent elements
    result = []
    for i in range(max_freq, 0, -1):
        buckets[i].sort(reverse=True)
        for num in buckets[i]:
            result.appen

Did you like this post?

Click on a star to rate it!

Average rating 0 / 5. Vote count: 0

No votes so far! Be the first to rate this post.