Thursday, October 20, 2016

LeetCode OJ - Longest Repeating Character Replacement

Problem:

Please find the problem here.

Analysis:

In this post I would like the share an interesting thought process for this problem.

First of all, finding the best substring sounds very much like the maximum contiguous sum problem, so I decided to follow a similar strategy. If I have the best strings ending at position i, and I can easily extend them to become best strings ending at position i + 1, then we are good.

Note that $ k $ can be as large as $ 10^4 $, therefore we cannot afford $ O(k) $ time, because then that will be $ O(nk) $ time, bad.

Solution:

First thing I did is to focus on a single letter, that make the problem much easier. All I am doing is just trying to longest sequence of 'A'. At any time in the algorithm, we keep track of the longest sequence of 'A' ending at the position being considered.

The key is the update process, if we encounter an 'A', then it is easy, all of the sequences just extend its length by 1. If we encounter anything else, we have to make a replacement in order to maintain a sequence of 'A', so all current sequences (except the one that has already made k replacement) still increase length by 1, but all of them has one more replacement.

We will clarify it with an example, let say, we allow at most 2 replacement, and the string is ABABB.

At the beginning, the strings are

At most 2 replacement: ""
At most 1 replacement: ""
At most 0 replacement: ""

They are all simply empty string because we do not allow any character yet. Move one step ahead, the strings becomes:

At most 2 replacement: "A"
At most 1 replacement: "A"
At most 0 replacement: "A"

This is just obvious, let's move one more step ahead, and now interesting thing happens.

At most 2 replacement: "AA"
At most 1 replacement: "AA"
At most 0 replacement: ""

Also worth explaining is the coloring theme. The "at most 1 replacement" string is done by extending the "at most 0 replacement string" by doing 1 replacement at the end, and that why that matches in color. [Of course, the last blue string doesn't come from the earlier blue string, it is just that there is no way we can have any string ends here without any replacement, so we begin again with empty.

At this point, we see an 'A' again, so it is simple

At most 2 replacement: "AAA"
At most 1 replacement: "AAA"
At most 0 replacement: "A"

Not interesting, move one more step, we get

At most 2 replacement: "AAAA"
At most 1 replacement: "AA"
At most 0 replacement: "A"

Last but not least, we move the last step, and finally get

At most 2 replacement: "AAA"
At most 1 replacement: "AA"
At most 0 replacement: "A"

So the longest string we ever seen in this example is 4, so that's the answer for 'A', if we try all alphabets, then we will get the final answer (which is simply replacing the 2 'A' by 'B' and get 5)

Coding wise, we need to be smart to be fast. Three tricks are used for that.

  1. Keep length instead of the strings
  2. Do not increment all numbers (when we see an 'A'), just update the global offset.
  3. Do not move the lengths, consider the array as a circular buffer and move base pointers.
These three tricks combined make the algorithm do just $ O(1) $ work on every letter, that gives us an $ O(n) $ algorithm!


Code:

#include "stdafx.h"

// https://leetcode.com/problems/longest-repeating-character-replacement/

#include "LEET_LONGEST_REPEATING_CHARACTER_REPLACEMENT.h"
#include <map>
#include <iostream>
#include <sstream>
#include <vector>
#include <string>

using namespace std;

namespace _LEET_LONGEST_REPEATING_CHARACTER_REPLACEMENT
{
    class Solution
    {
    public:
        int characterReplacement(string s, int k)
        {
            int best = 0;
            int* bestSoFar = new int[k + 1];
            for (char h = 'A'; h <= 'Z'; h++)
            {
                for (int i = 0; i < k + 1; i++)
                {
                    bestSoFar[i] = 0;
                }
                int base = 0;
                int top = k;
                int offset = 0;
                for (size_t i = 0; i < s.length(); i++)
                {
                    if (s[i] == h)
                    {
                        offset++;
                    }
                    else
                    {
                        offset++;
                        base = base - 1;
                        if (base < 0)
                        {
                            base = k;
                        }
                        bestSoFar[base] = -offset;
                        top--;
                        if (top < 0)
                        {
                            top = k;
                        }
                    }
                    if (bestSoFar[top] + offset > best)
                    {
                        best = bestSoFar[top] + offset;
                    }
                }
            }
            delete[] bestSoFar;
            return best;
        }
    };
};

using namespace _LEET_LONGEST_REPEATING_CHARACTER_REPLACEMENT;

int LEET_LONGEST_REPEATING_CHARACTER_REPLACEMENT()
{
    Solution solution;
    cout << solution.characterReplacement("ABAB", 2) << endl;
    cout << solution.characterReplacement("AABABBA", 1) << endl;
    return 0;
}

No comments :

Post a Comment