Grokking 75: Top Coding Interview Questions
Ask Author
Back to course home

0% completed

Vote For New Content
Solution: Sort Characters By Frequency
Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Problem Statement

Given a string, arrange its characters in descending order based on the frequency of each character. If two characters have the same frequency, their relative order in the output string can be arbitrary.

Example

  1. Input: s = "trersesess"
    • Expected Output: "sssseeerrt"
    • Justification: The character s appears four times, e three times, r two times and t only once. All characters are sorted based on their frequnecy.
  2. Input: s = "banana"
    • Expected Output: "aaannb".
    • Justification: The character 'a' appears three times, 'n' twice and 'b' once.
  3. Input: s = "abb"
    • Expected Output: "bba"
    • Justification: The character b appears twice and a` only once.

Constraints:

  • 1 <= s.length <= 5 * 10<sup>5</sup>
  • s consists of uppercase and lowercase English letters and digits.

Solution

The solution involves three key steps. First, count the frequency of each character in the string. This can be done using a hashmap where keys are characters and values are their respective counts. Second, build a max heap (priority queue) where each element is a character, and the heap is sorted based on the frequency of characters. The character with the highest frequency will be at the top. Lastly, construct the result string by repeatedly removing the top element of the heap (the most frequent character) and appending it to the result string until the heap is empty. This process ensures that characters are added to the result string in descending order of their frequencies.

Step-by-Step Algorithm

  1. Building a Frequency Map: Begin by iterating over the characters in the string. As you traverse, construct a frequency map (or dictionary) that associates each character with its occurrence count.

  2. Constructing a Max Heap: Next, utilize the frequency map to build a max heap. Each item in this heap will be a character-frequency pair, but the heap will be organized based on the frequencies. This ensures that characters with higher frequencies will reside at the top of the heap.

  3. Building the Result String: Start an iterative process where you repeatedly pop the heap to get the character with the highest current frequency. Each time you pop, append the character to the result string as many times as its frequency indicates. This ensures that characters with higher frequencies are placed before those with lower frequencies in the result.

  4. Completion: Once the heap is exhausted, your result string is complete, containing characters sorted by their frequencies in descending order. This string is then returned as the output.

The utilization of a max heap is crucial here, as it allows efficient retrieval of the character with the highest frequency at any given point in the process, ensuring the desired order in the result string.

Algorithm Walkthrough

Let's walk through the code step by step for the input string s = "trersesess".

1. Build the Frequency Map

  • Initialize an empty frequencyMap to store the frequency of each character.
  • Iterate over each character in the string s and update the frequencyMap:
    • 't' → frequencyMap['t'] = 1
    • 'r' → frequencyMap['r'] = 1
    • 'e' → frequencyMap['e'] = 1
    • 'r' → frequencyMap['r'] = 2
    • 's' → frequencyMap['s'] = 1
    • 'e' → frequencyMap['e'] = 2
    • 's' → frequencyMap['s'] = 2
    • 'e' → frequencyMap['e'] = 3
    • 's' → frequencyMap['s'] = 3
    • 's' → frequencyMap['s'] = 4

Final frequencyMap:

  • {'t': 1, 'r': 2, 'e': 3, 's': 4}

2. Add Entries to the Max Heap

  • Create a max heap (maxHeap) to store the characters based on their frequency in descending order.
  • Add all entries from the frequencyMap to the maxHeap.
  • The heap will organize elements based on their frequency:
    • Insert ('t', 1) → Heap: [('t', 1)]
    • Insert ('r', 2) → Heap: [('r', 2), ('t', 1)]
    • Insert ('e', 3) → Heap: [('e', 3), ('t', 1), ('r', 2)]
    • Insert ('s', 4) → Heap: [('s', 4), ('e', 3), ('r', 2), ('t', 1)]

Heap organization ensures that the character with the highest frequency is always at the top.

3. Build the Result String

  • Initialize an empty StringBuilder named result to construct the final output string.
  • Repeatedly remove the top element from the maxHeap and append it to result as many times as its frequency:
    • Poll ('s', 4) → Append "ssss" to resultresult = "ssss"
    • Poll ('e', 3) → Append "eee" to resultresult = "sssseee"
    • Poll ('r', 2) → Append "rr" to resultresult = "sssseeerr"
    • Poll ('t', 1) → Append "t" to resultresult = "sssseeerrt"

4. Return the Result

  • Convert the result StringBuilder to a string and return it.

Final output for the input string "trersesess" is "sssseeerrt". The characters appear in descending order of their frequency.

Code

Here is the code for this algorithm:

Python3
Python3

. . . .

Time Complexity:

  1. Counting the Frequency of Characters: We traverse the string of length n once to count the frequency of each character. This process has a time complexity of O(n).

  2. Inserting into a Heap/Min-Priority Queue: For every unique character in the input string, we insert it into a priority queue based on its frequency. The insertion of a single element into a priority queue takes O(log m) time where m is the number of unique characters in the string. In the worst case, m can be the size of the character set, but it's much smaller than n in most practical scenarios. The overall time for this step is O(m log m).

  3. Building the Result String: While the heap is not empty, we pop the most frequent character and append it to our result string. Popping from a heap takes O(log m) time, and we do it m times. Appending characters to the result string can be done in (1) time. Therefore, this step overall is O(m log m).

Combining all steps, the dominant factor for time complexity is the process of pushing and popping from the priority queue. Therefore, the overall time complexity is O(n + m log m). Since m is often much smaller than n (especially for large strings), the time complexity can be approximated as O(n) in many cases, but it's more accurate to represent it as O(n + m log m).

Space Complexity:

  1. Storing Frequencies: We use a hashmap or an equivalent data structure to store the frequency of each character. In the worst case, this will take up space proportional to the number of unique characters, m. This gives us a space complexity of O(m).

  2. Priority Queue: We also use a priority queue to store unique characters based on their frequency. In the worst case, this queue can have all unique characters, adding another O(m) space.

  3. Result String: We create a new string to store our result. In the worst case, this string will be of length n, the length of the original string, giving us a space complexity of O(n).

Adding up all the components, the total space complexity is O(n + 2m). However, for the sake of big O notation, this simplifies to O(n + m).

To summarize:

  • Time Complexity: O(n + m log m)
  • Space Complexity: O(n + m)

An Alternate Approach (Using the Bucket Sort)

In the first approach, we solved the problem using a heap, resulting in a time complexity of O(n + m \log m), where n is the length of the string and m is the number of unique characters. While this approach is efficient, it involves the overhead of maintaining a heap, which can be optimized further.

We can use the bucket sort algorithm to solve the problem in linear time complexity. To use bucket sort, we first count the frequency of each character in the string. Then, we create an array of buckets where each bucket index corresponds to a specific frequency. Characters are placed into the appropriate bucket based on their frequency. Finally, we build the output string by iterating through the buckets in reverse order, ensuring that characters with higher frequencies appear first in the result.

This bucket sort approach is more efficient than the heap-based method, reducing the time complexity to O(n) compared to the heap's O(n + m \log m).

Step-by-Step Algorithm

  1. Count the Frequency of Each Character:

    • Create a frequencyMap (a dictionary) to store the frequency of each character in the string.
    • Iterate over each character in the string s. For each character:
      • If the character is already in the frequencyMap, increment its count.
      • If the character is not in the frequencyMap, add it with an initial count of 1.
    • This step ensures that we know how many times each character appears in the string.
  2. Determine the Maximum Frequency:

    • Initialize a variable maxFrequency to zero.
    • Iterate over the values in frequencyMap (i.e., the frequencies of the characters):
      • Update maxFrequency to be the maximum value found during the iteration.
    • This step helps determine the size of the buckets needed in the next step.
  3. Create Buckets Based on Frequency:

    • Create a list buckets, where each index represents a possible frequency (from 0 to maxFrequency).
    • Each element in buckets is an empty list (or array) that will store characters having the corresponding frequency.
    • This step sets up the structure to sort characters based on their frequency.
  4. Place Characters in the Corresponding Buckets:

    • Iterate over the frequencyMap. For each character and its frequency:
      • Append the character to the bucket at the index corresponding to its frequency in buckets.
    • This step groups characters by their frequency, facilitating the sorting process.
  5. Build the Output String:

    • Initialize an empty string builder result to construct the final output string.
    • Iterate over the buckets list in reverse order (from the highest frequency to the lowest):
      • For each bucket (which contains characters with the same frequency):
        • For each character in the bucket:
          • Append the character to result the number of times equal to its frequency (index of the bucket).
    • This step ensures that characters with higher frequencies are placed first in the final string.
  6. Return the Result:

    • Convert the string builder result to a string and return it as the final output.

Algorithm Walkthrough

Let's go through each step of the algorithm with the string s = "trersesess".

  1. Check for Empty Input:

    • The input string s is "trersesess", which is not empty, so we proceed to the next step.
  2. Count the Frequency of Each Character:

    • Initialize frequencyMap as an empty map.
    • Iterate through the string and update frequencyMap:
      • 't' → frequencyMap['t'] = 1
      • 'r' → frequencyMap['r'] = 1
      • 'e' → frequencyMap['e'] = 1
      • 'r' → frequencyMap['r'] = 2
      • 's' → frequencyMap['s'] = 1
      • 'e' → frequencyMap['e'] = 2
      • 's' → frequencyMap['s'] = 2
      • 'e' → frequencyMap['e'] = 3
      • 's' → frequencyMap['s'] = 3
      • 's' → frequencyMap['s'] = 4
    • The final frequencyMap is {'t': 1, 'r': 2, 'e': 3, 's': 4}.
  3. Determine the Maximum Frequency:

    • Initialize maxFrequency = 0.
    • Iterate through the frequencies in frequencyMapto get the maximum frequency.
    • The maximum frequency maxFrequency is 4.
  4. Create Buckets Based on Frequency:

    • Create a list buckets with 5 elements (0 to maxFrequency), each initialized as an empty list.
    • buckets = [[], [], [], [], []]
  5. Place Characters in the Corresponding Buckets:

    • Iterate over frequencyMap:
      • 't' with frequency 1 → Place 't' in buckets[1].
      • 'r' with frequency 2 → Place 'r' in buckets[2].
      • 'e' with frequency 3 → Place 'e' in buckets[3].
      • 's' with frequency 4 → Place 's' in buckets[4].
    • The final buckets list is [[], ['t'], ['r'], ['e'], ['s']].
  6. Build the Output String:

    • Initialize an empty string builder result.
    • Iterate over buckets in reverse order (from 4 to 1):
      • For buckets[4] → Contains 's', append 'ssss' to result.
      • For buckets[3] → Contains 'e', append 'eee' to result.
      • For buckets[2] → Contains 'r', append 'rr' to result.
      • For buckets[1] → Contains 't', append 't' to result.
    • The final result is 'sssseeerrt'.
  7. Return the Result:

    • Convert result to a string and return it: "sssseeerrt".

The final output for the input string "trersesess" is "sssseeerrt".

Code

Here is the code for this algorithm:

Python3
Python3

. . . .

Complexity Analysis

Time Complexity

  • Counting Character Frequencies: This step involves iterating through the input string, which takes O(n) time, where n is the length of the string.

  • Creating and Filling Buckets: The frequency map is processed to create and fill the buckets. This involves iterating through the map and adding elements to the buckets, taking O(n) time.

  • Building the Output String: The final string is built by iterating through the buckets and appending characters based on their frequency, which also takes O(n) time.

Overall Time Complexity: The overall time complexity is O(n).

Space Complexity

  • Frequency Map: The space required for the frequency map is O(m), where m is the number of unique characters in the string.

  • Buckets: The space required for the buckets is O(n) in the worst case when the string contains only a single character.

Overall Space Complexity: The overall space complexity is O(n).

.....

.....

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible