In the last part we talked about the $ O(m^2) $ algorithm for building a suffix tree. This time, we will talk about a $ O(m^3) $ time solution.

Why would one wanted to do that? This is because we will improve the $ O(m^3) $ algorithm all the way to linear time. We are following a similar exposition as in Algorithms on String, Tree and Sequences.

The algorithm is really simple, here it is, this is the only change from SuffixTree1.

If we ignore the outer for loop, this is simply the $ O(m^2) $ algorithm we talked in the last post. The algorithm install the 'suffixes' in the compressed trie in the following order

x

xa

a

xab

ab

b

...

And so on - many of the 'suffix' are actually prefix of a suffix of the whole string, but that does not matter for a suffix tree because a suffix tree include all the prefixes anyway.

The blank lines are intentionally added between outer loop iterations. Following the book's convention, we call an outer loop iteration a phase, and an inner loop iteration an extension. So for a string with length $ m $, we have $ m $ phases, from phase 1 to phase $ m $. For phase $ k $, we have $ k $ extensions, from extension 1 to extension $ k $.

Unlike the book's convention, however, we will ALWAYS use 0 based index for string, this is so convenient, and we will ALWAYS use inclusive, exclusive index for string indexes, for example, [3, 7) for "Hello World" is "lo W", as indicated below:

0123456789A

Hello World

This algorithm is really stupid? Yes - it is, we will move forward in the next post.

Why would one wanted to do that? This is because we will improve the $ O(m^3) $ algorithm all the way to linear time. We are following a similar exposition as in Algorithms on String, Tree and Sequences.

The algorithm is really simple, here it is, this is the only change from SuffixTree1.

void SuffixTree2Builder::BuildSuffixTree(string input, SuffixTree2* suffixTree) { string s = "xabxac"; for (unsigned int end = 1; end <= s.length(); end++) { for (unsigned int start = 0; start < end; start++) { suffixTree->Add(s.substr(start, end - start)); } } }

If we ignore the outer for loop, this is simply the $ O(m^2) $ algorithm we talked in the last post. The algorithm install the 'suffixes' in the compressed trie in the following order

x

xa

a

xab

ab

b

...

And so on - many of the 'suffix' are actually prefix of a suffix of the whole string, but that does not matter for a suffix tree because a suffix tree include all the prefixes anyway.

The blank lines are intentionally added between outer loop iterations. Following the book's convention, we call an outer loop iteration a phase, and an inner loop iteration an extension. So for a string with length $ m $, we have $ m $ phases, from phase 1 to phase $ m $. For phase $ k $, we have $ k $ extensions, from extension 1 to extension $ k $.

Unlike the book's convention, however, we will ALWAYS use 0 based index for string, this is so convenient, and we will ALWAYS use inclusive, exclusive index for string indexes, for example, [3, 7) for "Hello World" is "lo W", as indicated below:

0123456789A

Hello World

This algorithm is really stupid? Yes - it is, we will move forward in the next post.

## No comments :

## Post a Comment