3167. Better Compression of String - Detailed Explanation

Problem Statement

Given a string, the goal is to compress it into the shortest possible encoded string by replacing repeated substrings with an encoding of the form k[encoded_string] where k is the number of repetitions and encoded_string is the (possibly further compressed) representation of the repeated substring. The encoding should only be used if it makes the overall string shorter. For example, if a substring repeats consecutively, you can replace it with the count followed by the substring in square brackets.

Example Inputs, Outputs, and Explanations

Example 1

Input:

s = "ababab"

Output:

"3[ab]"

Explanation:

  • The substring "ab" is repeated 3 times.
  • The encoded form "3[ab]" has length 5, which is shorter than the original length of 6.

Example 2

Input:

s = "abcde"

Output:

"abcde"

Explanation:

  • No substring repeats in a way that would shorten the string when encoded.
  • The optimal solution is to leave the string as-is.

Example 3

Input:

s = "aabaabaabaab"

Output:

"3[aabaab]"

Explanation:

  • The substring "aabaab" is repeated 3 times consecutively.
  • The encoded form "3[aabaab]" is shorter than the original string.

Hints for the Approach

  1. Dynamic Programming on Substrings:
    Think about breaking the problem into smaller subproblems by finding the optimal compression for every possible substring of the input string. Then, use these results to build up the solution for the full string.

  2. Consider Splitting and Repeating Patterns:
    For any substring, you can either keep it as it is or try splitting it into two parts, combining the compressed results of each part. Also, check if the entire substring can be expressed as one or more repetitions of a smaller substring.

  3. Encoding Only When Beneficial:
    When you detect that a substring is made up of repeated patterns, encode it as k[pattern] only if the resulting string is shorter than the original substring.

Detailed Explanation of the Dynamic Programming Approach

Overview

The key idea is to use dynamic programming (DP) to compute the shortest compressed string for every substring of the input. Let dp[i][j] represent the optimal (shortest) encoded string for the substring s[i:j+1]. We build up these solutions for increasing lengths of substrings and ultimately obtain the result for the entire string (i.e. dp[0][n-1]).

Step-by-Step Walkthrough

  1. Initialization:
    For each index i, set dp[i][i] = s[i] since a single character cannot be compressed further.

  2. DP Over Increasing Lengths:
    For substring lengths from 2 up to the full length of the string:

    • For each starting index i, let j = i + length - 1 be the ending index.
    • Initialize dp[i][j] as the substring s[i:j+1] (i.e. the uncompressed version).
  3. Try All Splits:
    For the current substring s[i:j+1], try splitting it into two parts at every possible position k where i ≤ k < j. Update dp[i][j] with the concatenation of the best compression of the left part and the best compression of the right part:

    dp[i][j] = min(dp[i][j], dp[i][k] + dp[k+1][j])
    

    where “min” is with respect to string length (and lexicographical order if lengths are equal).

  4. Check for Repeated Patterns:
    For the current substring s[i:j+1], check if it can be formed by repeating a smaller substring:

    • For every possible pattern length from 1 to (j-i+1) / 2, check if the length of the current substring is a multiple of the pattern length.
    • Let pattern = s[i:i+pattern_length]. If the entire substring s[i:j+1] equals the pattern repeated some number of times, then form an encoded candidate as:
      candidate = number + "[" + dp[i][i+pattern_length-1] + "]"
      
      where number = (j-i+1) / pattern_length.
    • Update dp[i][j] if the candidate is shorter than the current best.
  5. Final Answer:
    After processing all substrings, the answer is contained in dp[0][n-1].

Complexity Analysis

  • Time Complexity:
    The solution involves iterating over all substrings (O(n²)) and for each substring, trying every possible split (O(n)) and checking for repeating patterns (which in the worst case is another O(n)). Overall, the worst-case time complexity is O(n³).

  • Space Complexity:
    We maintain a DP table of size O(n²) where n is the length of the string.

Python Code

Python3
Python3

. . . .

Java Code

Java
Java

. . . .

Common Mistakes and Edge Cases

  • Not Checking All Splits:
    Failing to try every possible partition of a substring can lead to missing a better compressed result.

  • Incorrect Repetition Check:
    It is crucial to verify that a substring is an exact repetition of a smaller pattern. Overlooking edge cases (like when the pattern itself could be compressed) might result in suboptimal encoding.

  • Encoding When Not Beneficial:
    Only encode a repeated pattern if the encoded form is strictly shorter than the original substring. Otherwise, it might be better to leave the substring unencoded.

TAGS
leetcode
CONTRIBUTOR
Design Gurus Team
-

GET YOUR FREE

Coding Questions Catalog

Design Gurus Newsletter - Latest from our Blog
Boost your coding skills with our essential coding questions catalog.
Take a step towards a better tech career now!
Explore Answers
Related Courses
Grokking the Coding Interview: Patterns for Coding Questions
Grokking the Coding Interview Patterns in Java, Python, JS, C++, C#, and Go. The most comprehensive course with 476 Lessons.
Grokking Modern AI Fundamentals
Master the fundamentals of AI today to lead the tech revolution of tomorrow.
Grokking Data Structures & Algorithms for Coding Interviews
Unlock Coding Interview Success: Dive Deep into Data Structures and Algorithms.
Image
One-Stop Portal For Tech Interviews.
Copyright © 2025 Design Gurus, LLC. All rights reserved.
;