692. Top K Frequent Words - Detailed Explanation

Problem Statement

Given an array of strings words and an integer k, return the k most frequent words. The answer should be sorted by frequency from highest to lowest. If two words have the same frequency, they should be sorted in lexicographical (alphabetical) order.

For example, if multiple words appear the same number of times, the word with the lower alphabetical order should come first in the result.

Example Inputs and Outputs

Example 1

Input: words = ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation:
The word "i" appears twice and the word "love" also appears twice.
"leetcode" and "coding" appear once.
Among the words that appear twice, "i" comes before "love" lexicographically, so the top 2 frequent words are ["i", "love"].

Example 2

Input: words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation:
"the" appears 4 times, "is" appears 3 times, "sunny" appears 2 times, and "day" appears once.
Sorted by frequency and then lexicographically for words with the same frequency, the top 4 words are ["the", "is", "sunny", "day"].

Hints

Frequency Counting:
Use a hash map (or dictionary) to count how many times each word appears in the array.
Sorting or Priority Queue:
- Sorting Approach: After counting frequencies, sort the words based on two keys: frequency in descending order and lexicographical order for words with equal frequency.
- Heap Approach: Alternatively, use a heap (priority queue) to efficiently extract the top k frequent words, ensuring that you maintain the correct ordering criteria.
Custom Comparator:
Make sure your comparator (or sorting key) is designed so that higher frequency words come first, and for equal frequencies, the word that comes earlier alphabetically appears first.

Approach 1: Hash Map and Sorting

Explanation

Count Frequencies:
Traverse the list of words and use a hash map to count the occurrences of each word.
Sort the Unique Words:
Convert the hash map into a list of entries and sort it using a custom comparator:
- Primary key: Frequency in descending order.
- Secondary key: Lexicographical order in ascending order (for words with the same frequency).
Select the Top k Words:
After sorting, take the first k elements from the sorted list.

Step-by-Step Walkthrough

For input ["i", "love", "leetcode", "i", "love", "coding"]:
1. Count frequencies:
  - "i" → 2, "love" → 2, "leetcode" → 1, "coding" → 1.
2. Sort words based on frequency (desc) and lexicographical order (asc) for ties:
  - Sorted order: ["i", "love", "coding", "leetcode"] (here "coding" and "leetcode" are ordered alphabetically).
3. The top k (k = 2) words are: ["i", "love"].

Approach 2: Hash Map with Heap (Priority Queue)

Explanation

Count Frequencies:
Use a hash map to count the frequency of each word.
Use a Min-Heap:
Create a priority queue (min-heap) that stores the words based on:
- Frequency in ascending order (so that the smallest frequency is at the top).
- For words with equal frequency, use reverse lexicographical order so that the word which should come later lexicographically is at the top.
The heap size is maintained at k. For each word in the frequency map:
- Add the word to the heap.
- If the heap size exceeds k, remove the top element.
Extract and Reverse:
After processing, the heap contains the top k frequent words in reverse order. Extract them and reverse the result to achieve the correct order (highest frequency first, then lexicographical order).

Complexity Analysis

Time Complexity:
- Frequency counting: O(n), where n is the number of words.
- Sorting approach: O(m log m) where m is the number of unique words.
- Heap approach: O(m log k) for m unique words.
Space Complexity:
- O(m) for storing frequencies.
- O(m) in the worst-case for sorting or O(k) for the heap.

Python Code (Sorting Approach)

Python3

. . . .

Java Code (Heap Approach)

Java

. . . .

Common Mistakes and Edge Cases

Handling Ties:
Ensure that when two words have the same frequency, the word that comes first lexicographically is placed before the other.
Empty Input:
Consider the case where the input array is empty. The solution should handle it gracefully.
Large Input Size:
For very large inputs, using a heap may be more efficient than sorting all unique words if k is significantly smaller than the number of unique words.

Alternative Variations

Sorting Approach vs. Heap Approach:
Both methods achieve the desired result. The sorting approach is simple and effective when the number of unique words is small. The heap approach is more efficient when k is much smaller than the total number of unique words.
Using Bucket Sort:
An alternative approach is to use bucket sort since frequencies are bounded by the total number of words. Create buckets where each bucket at index i contains a list of words that appear i times, then iterate over the buckets in reverse order to construct the result.