Monday, June 22, 2015

Road to suffix tree (03)

In the last part, we explained compressed trie. In this part we will start talking about suffix tree. A suffix tree is simple a compressed trie of all the suffix of a string.

Consider the string 'xabxac', the suffixes are


So the final suffix tree look like this


So that is the suffix tree, the amazing thing about suffix tree is that it can be build in $ O(m) $ time, where m is the length of the string, while we have $ O(m^2) $ characters in total in all the suffixes.

The linear time algorithm is a complicated beast, so we will NOT attempt to explain it at once, but instead explain it by doing step-wise refinement of a rather trivial algorithm.

Before we goes to the details of how to construct a suffix tree, let start with seeing how to use a suffix tree! What so interesting about having a compressed trie with all the suffixes?

Consider the problem of text searching - suppose our task is to determine whether or not 'ab' is a sub-string of 'xabxac'. With a suffix tree, we can simply walk the tree to find that out, because a sub-string must be a prefix of a suffix!

Being able to to sub-string search in a string in time complexity proportional to the length of the search pattern is simply amazing. Without the suffix tree, one must at least scan the length of the text. For huge text (e.g. DNA), that would dominate the running time! Having a suffix tree on hand is extremely handy, and it takes only linear time to construct - which most algorithm have to spend (asymptotically) anyway.

No comments :

Post a Comment