Introduction
In the world of data analysis and processing, the ability to identify the most frequently occurring elements in a given dataset is a fundamental and widely-applicable problem. This problem, known as "Top K Frequent Elements in an Array," has numerous real-world applications, such as:
- Recommendation Systems: Identifying the most popular items or products that users frequently interact with, to provide personalized recommendations.
- Anomaly Detection: Detecting unusual or infrequent occurrences in a dataset, which could indicate potential problems or anomalies.
- Text Analysis: Determining the most common words or phrases in a text corpus, which can be useful for tasks like sentiment analysis, topic modeling, and natural language processing.
- Network Analysis: Identifying the most frequently accessed nodes or hubs in a network, which can provide insights into the structure and dynamics of the network.
In this comprehensive blog post, we will explore various approaches to solving the "Top K Frequent Elements in an Array" problem, analyze their time and space complexities, and provide detailed implementations in popular programming languages.
Approaches to Solving the Problem
Approach 1: Using Hash Map and Sorting
The first approach to solving this problem involves using a hash map to store the frequency of each element in the array, and then sorting the elements based on their frequencies.
Steps:
- Create a hash map to store the frequency of each element in the array.
- Sort the elements in the hash map based on their frequencies, in descending order.
- Retrieve the top K most frequent elements from the sorted hash map.
Time Complexity: O(n + d * log d), where n is the size of the input array and d is the number of distinct elements in the array.
Auxiliary Space: O(d), where d is the number of distinct elements in the array.
Here‘s the implementation in Python, Java, C++, and JavaScript:
from collections import Counter
def topKFrequent(arr, k):
# Count frequency of each element
freq = Counter(arr)
# Sort the elements based on their frequencies
sorted_freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
# Retrieve the top K most frequent elements
return [x[0] for x in sorted_freq[:k]]import java.util.*;
class GfG {
public static List<Integer> topKFrequent(int[] arr, int k) {
// Count frequency of each element
Map<Integer, Integer> freq = new HashMap<>();
for (int num : arr) {
freq.put(num, freq.getOrDefault(num, 0) + 1);
}
// Sort the elements based on their frequencies
List<int[]> sortedFreq = new ArrayList<>(freq.entrySet());
sortedFreq.sort((a, b) -> b[1] - a[1]);
// Retrieve the top K most frequent elements
List<Integer> result = new ArrayList<>();
for (int i = 0; i < k; i++) {
result.add(sortedFreq.get(i)[0]);
}
return result;
}
}#include <iostream>
#include <unordered_map>
#include <vector>
#include <algorithm>
using namespace std;
static bool compare(pair<int, int>& p1, pair<int, int>& p2) {
// Prioritize element‘s value in case their frequency was the same
if (p1.second == p2.second) {
return p1.first > p2.first;
}
// Sort in descending order of frequencies
return p1.second > p2.second;
}
vector<int> topKFrequent(vector<int>& arr, int k) {
// Count frequency of each element
unordered_map<int, int> freq;
for (int num : arr) {
freq[num]++;
}
// Store the elements of ‘freq‘ in the vector ‘sortedFreq‘
vector<pair<int, int>> sortedFreq(freq.begin(), freq.end());
sort(sortedFreq.begin(), sortedFreq.end(), compare);
// Retrieve the top K most frequent elements
vector<int> result;
for (int i = 0; i < k; i++) {
result.push_back(sortedFreq[i].first);
}
return result;
}function topKFrequent(arr, k) {
// Count frequency of each element
let freq = new Map();
for (let num of arr) {
freq.set(num, (freq.get(num) || 0) + 1);
}
// Sort the elements based on their frequencies
let sortedFreq = Array.from(freq.entries()).sort((a, b) => b[1] - a[1]);
// Retrieve the top K most frequent elements
let result = [];
for (let i = 0; i < k; i++) {
result.push(sortedFreq[i][0]);
}
return result;
}This approach is straightforward and easy to implement, but it has a higher time complexity due to the sorting step. It is suitable for scenarios where the number of distinct elements in the array is relatively small, and the top K most frequent elements need to be retrieved.
Approach 2: Using Hash Map and Max Heap
The second approach involves using a hash map to store the frequency of each element, and then using a max heap (priority queue) to efficiently retrieve the top K most frequent elements.
Steps:
- Create a hash map to store the frequency of each element in the array.
- Create a max heap (priority queue) to store the frequency-element pairs, with the frequency as the priority.
- Iterate through the hash map and add each frequency-element pair to the max heap. If the size of the heap exceeds K, remove the element with the smallest frequency.
- Retrieve the top K most frequent elements from the max heap.
Time Complexity: O(n + k * log k), where n is the size of the input array and k is the value of K.
Auxiliary Space: O(d), where d is the number of distinct elements in the array.
Here‘s the implementation in Python, Java, C++, and JavaScript:
import heapq
from collections import Counter
def topKFrequent(arr, k):
# Count frequency of each element
freq = Counter(arr)
# Create a max heap to store the frequency-element pairs
heap = [(-count, num) for num, count in freq.items()]
heapq.heapify(heap)
# Retrieve the top K most frequent elements
return [heapq.heappop(heap)[1] for _ in range(k)]import java.util.*;
class GfG {
static class FrequencyComparator implements Comparator<int[]> {
public int compare(int[] p1, int[] p2) {
// Prioritize element‘s value in case their frequency was the same
if (p1[0] == p2[0]) {
return Integer.compare(p1[1], p2[1]);
}
// Sort in increasing order of frequencies (for min heap behavior)
return Integer.compare(p1[0], p2[0]);
}
}
public static List<Integer> topKFrequent(int[] arr, int k) {
// Count frequency of each element
Map<Integer, Integer> freq = new HashMap<>();
for (int num : arr) {
freq.put(num, freq.getOrDefault(num, 0) + 1);
}
// Create a min heap to store the frequency-element pairs
PriorityQueue<int[]> heap = new PriorityQueue<>(new FrequencyComparator());
for (Map.Entry<Integer, Integer> entry : freq.entrySet()) {
heap.offer(new int[] { entry.getValue(), entry.getKey() });
if (heap.size() > k) {
heap.poll();
}
}
// Retrieve the top K most frequent elements
List<Integer> result = new ArrayList<>();
while (!heap.isEmpty()) {
result.add(heap.poll()[1]);
}
Collections.reverse(result);
return result;
}
}#include <iostream>
#include <queue>
#include <unordered_map>
#include <vector>
using namespace std;
vector<int> topKFrequent(vector<int>& arr, int k) {
// Count frequency of each element
unordered_map<int, int> freq;
for (int num : arr) {
freq[num]++;
}
// Create a max heap to store the frequency-element pairs
priority_queue<pair<int, int>, vector<pair<int, int>>, greater<pair<int, int>>> heap;
for (auto entry : freq) {
heap.push({entry.second, entry.first});
if (heap.size() > k) {
heap.pop();
}
}
// Retrieve the top K most frequent elements
vector<int> result;
for (int i = 0; i < k; i++) {
result.push_back(heap.top().second);
heap.pop();
}
return result;
}class MinHeap {
constructor() {
this.heap = [];
}
// Swap elements
swap(i, j) {
[this.heap[i], this.heap[j]] = [this.heap[j], this.heap[i]];
}
// Get parent index
parent(i) {
return Math.floor((i - 1) / 2);
}
// Get left child index
leftChild(i) {
return 2 * i + 1;
}
// Get right child index
rightChild(i) {
return 2 * i + 2;
}
// Insert an element into the heap
push(val) {
this.heap.push(val);
this.heapifyUp(this.heap.length - 1);
}
// Remove and return the smallest element
pop() {
if (this.heap.length === 0) return null;
if (this.heap.length === 1) return this.heap.pop();
let top = this.heap[0];
this.heap[0] = this.heap.pop();
this.heapifyDown(0);
return top;
}
// Heapify up (used when inserting)
heapifyUp(index) {
while (index > 0 && this.compare(this.heap[index], this.heap[this.parent(index)]) < 0) {
this.swap(index, this.parent(index));
index = this.parent(index);
}
}
// Heapify down (used when deleting)
heapifyDown(index) {
let smallest = index;
let left = this.leftChild(index);
let right = this.rightChild(index);
// Use custom comparison function
if (left < this.heap.length && this.compare(this.heap[left], this.heap[smallest]) < 0) {
smallest = left;
}
if (right < this.heap.length && this.compare(this.heap[right], this.heap[smallest]) < 0) {
smallest = right;
}
if (smallest !== index) {
this.swap(index, smallest);
this.heapifyDown(smallest);
}
}
// Custom comparison function (implements the Java comparator logic)
compare(p1, p2) {
if (p1[0] === p2[0]) {
return p1[1] - p2[1]; // Prioritize larger numbers when frequencies are the same
}
return p1[0] - p2[0]; // Sort by increasing frequency
}
// Return the top element without removing it
top() {
return this.heap.length > 0 ? this.heap[0] : null;
}
// Return the size of the heap
size() {
return this.heap.length;
}
}
function topKFrequent(arr, k) {
// Count frequency of each element
let freq = new Map();
for (let num of arr) {
freq.set(num, (freq.get(num) || 0) + 1);
}
// Create a min heap to store the frequency-element pairs
let heap = new MinHeap();
for (let [num, count] of freq.entries()) {
heap.push([count, num]); // Store as [frequency, number]
if (heap.size() > k) {
heap.pop();
}
}
// Retrieve the top K most frequent elements
let result = [];
while (heap.size() > 0) {
result.push(heap.pop()[1]);
}
result.reverse();
return result;
}This approach leverages the efficiency of a max heap (priority queue) to quickly retrieve the top K most frequent elements. It has a better time complexity compared to the first approach, as it only requires a linear scan of the hash map and a logarithmic operation for each element in the heap.
Approach 3: Using Counting Sort
The third approach involves using a counting sort-based technique to solve the problem. This approach takes advantage of the fact that the frequencies of the elements are bounded by the size of the input array.
Steps:
- Create a hash map to store the frequency of each element in the array.
- Find the maximum frequency among all the elements.
- Create a 2D array (buckets) where the index represents the frequency, and each bucket contains a list of elements with that frequency.
- Traverse the buckets from the highest frequency to the lowest, and collect the top K most frequent elements.
Time Complexity: O(n + d), where n is the size of the input array and d is the number of distinct elements in the array.
Auxiliary Space: O(n), where n is the size of the input array.
Here‘s the implementation in Python, Java, C++, and JavaScript:
def topKFrequent(arr, k):
# Count frequency of each element
freq = {}
for num in arr:
freq[num] = freq.get(num, 0) + 1
# Find the maximum frequency
max_freq = max(freq.values())
# Create buckets based on frequencies
buckets = [[] for _ in range(max_freq + 1)]
for num, count in freq.items():
buckets[count].append(num)
# Collect the top K most frequent elements
result = []
for i in range(max_freq, 0, -1):
buckets[i].sort(reverse=True)
for num in buckets[i]:
result.appen