It’s a long time that I spend to comprehend building suffix tree by ukkonen’s algorithm. Here I summarize some major point of this algorithm.
1. Suffix link. A internal node(t1, t2, t3,…,tn) has a suffix link. It points to another internal node or root which it’s a node for t2, t3, …, tn
2. Phase. When building a suffix tree, variable i loop from 0 to str.length()-1. For each different i, it is called a phase.
3. Length=0. At phase i, active length is 0, check if there is a pat charAt(i) at root.
4. Internal node not at root suffix link. At phase i, when a internal node is created and active point is not root. At the beginning when a internal A created, internal node points to root. When there is another internal node B created after A in the same phase, A points to the B by suffix link. And it goes on and on.
5. Internal node at root. At phase i, when a internal node is created, acitve point is root and active length is greater than 0. activeLength–, activeEdge++.
6. Update active point at root. Every time when we updated active point and when active point is root, we should update active point. There is chance that new active point change to other internal node after update.
In below example, after internal node M is created, do activeEdge++, activeLength–. We got (activePoint=root, activeEdge=2, activeLength=3). Since it is root, we should update active point. We move 3 chars from position 2(which is s). We move through ssi from root, then we arrived (activePoint=L, activeEdge=3, activeLength=2)
Below is my summary how we split to a internal node in 3 different scenario.
A full structure for mississipi without suffix link:
Here is my code for the suffix tree for below functions: link
1. substring position
2. longest common substring by generalized suffix tree
3. longest palindrome substring
4. longest repeated substring
This is a great video for suffix tree from Tushar: link
Another website to print suffix tree, link We can use this to verify the correctness of our code.