Monday, June 22, 2015

Road to suffix tree (04)

It is relatively easy to start with a $ O(m^2) $ algorithm. It takes $ O(m^2) $ time to generate all the suffixes, and insert them one by one into an empty trie. Since insertion take time proportional to the length of the string, the total time for all the insertion is $ O(m^2) $. That's it.

The advantage of this algorithm is that we can have a relatively quick way to test if our advanced algorithm works, we can compare the trie and therefore check the correctness of the advanced algorithm.

Using our compressed trie code, we can generate a visualization of the suffix tree for 'xabxac' as follow:

Notice we have 6 leaves, that is because we have 6 suffixes. In general, we want to have a leaf for each suffix. Without lost of generality, we can simply insert a end of string character to the original string that never appear in the string itself at the end of the string. Implementation wise, a null character would be very convenient for C/C++. With the special end character, we can guarantee a suffix is never a prefix of some other suffixes, and therefore we will always have each suffix ends up as a leaf.

For a suffix tree, we do not care about the values associated with the suffix, at least for now, so we get rid of the payload in the Add method. Without the payload, the Get method is meaningless, so we will also remove it. That results in the SuffixTree1 implementation.

No comments :

Post a Comment