## Wednesday, June 24, 2015

### Road to Suffix Tree (05)

In the last part we talked about the $O(m^2)$ algorithm for building a suffix tree. This time, we will talk about a $O(m^3)$ time solution.

Why would one wanted to do that? This is because we will improve the $O(m^3)$ algorithm all the way to linear time. We are following a similar exposition as in Algorithms on String, Tree and Sequences.

The algorithm is really simple, here it is, this is the only change from SuffixTree1.

void SuffixTree2Builder::BuildSuffixTree(string input, SuffixTree2* suffixTree)
{
string s = "xabxac";
for (unsigned int end = 1; end <= s.length(); end++)
{
for (unsigned int start = 0; start < end; start++)
{
}
}
}

If we ignore the outer for loop, this is simply the $O(m^2)$ algorithm we talked in the last post. The algorithm install the 'suffixes' in the compressed trie in the following order

x

xa
a

xab
ab
b

...

And so on - many of the 'suffix' are actually prefix of a suffix of the whole string, but that does not matter for a suffix tree because a suffix tree include all the prefixes anyway.

The blank lines are intentionally added between outer loop iterations. Following the book's convention, we call an outer loop iteration a phase, and an inner loop iteration an extension. So for a string with length $m$, we have $m$ phases, from phase 1 to phase $m$. For phase $k$, we have $k$ extensions, from extension 1 to extension $k$.

Unlike the book's convention, however, we will ALWAYS use 0 based index for string, this is so convenient, and we will ALWAYS use inclusive, exclusive index for string indexes, for example, [3, 7) for "Hello World" is "lo W", as indicated below:

0123456789A
Hello World

This algorithm is really stupid? Yes - it is, we will move forward in the next post.