|
|||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectorg.biojava.bio.symbol.UkkonenSuffixTree
A suffix tree is an efficient method for encoding the frequencies
of motifs in a sequence. They are sometimes used to quickly screen
for similar sequences. For instance, all motifs of length up to
2 in the sequence AAGT
could be encoded as:
root(4) | A(2)--------G(1)-----T(1) | | A(1)--G(1) T(1)
It supports addition of elements both as String and SymbolList. They should not be mixed together. The strings are also terminated internally, so it is possible to go and add more than one string to the suffix tree.
Some more work need to be done on how data should be generated from this class. If you need something that's not in there, please e-mail the list at biojava-dev@biojava.org and I'll add it in there. <\p>
Field Summary | |
static char |
DEFAULT_TERM_CHAR
|
static int |
TO_A_LEAF
|
Constructor Summary | |
UkkonenSuffixTree()
Initializes a new UkkonenSuffixTree instance. |
|
UkkonenSuffixTree(FiniteAlphabet alpha)
|
|
UkkonenSuffixTree(java.lang.String seqs)
|
Method Summary | |
void |
addSequence(java.lang.String seq,
java.lang.String name,
boolean doNotTerminate)
Add a sequence into the tree. |
void |
addSymbolList(SymbolList list,
java.lang.String name,
boolean doNotTerminate)
|
protected java.util.ArrayList |
getAllNodes(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode root,
java.util.ArrayList list,
boolean leavesOnly)
|
protected java.lang.String |
getEdgeLabel(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode child)
|
protected int |
getEdgeLength(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode child)
Tree navigation methods |
protected java.lang.String |
getLabel(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode node)
|
protected int |
getPathEnd(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode node)
|
protected int |
getPathLength(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode node)
|
org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode |
getRoot()
|
org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode |
jumpTo(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode starting,
java.lang.String source,
int from,
int to)
Just like walkTo, but faster when used during tree construction, as it assumes that a mismatch can only occurs with the last character of the target string. |
void |
printTree()
|
SymbolList |
stringToSymbolList(java.lang.String string)
Converts a string that came from symbolListToString back to a SymbolList. |
boolean |
subStringExists(java.lang.String str)
|
java.lang.String |
symbolListToString(SymbolList list)
Makes a string out of a SymbolList, this string should only be used for internal or testing purposes, as it will necessarily consist of visible characters. |
org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode |
walkTo(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode starting,
java.lang.String source,
int from,
int to)
This method is used to walk down the tree, from a given node. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
public static final char DEFAULT_TERM_CHAR
public static final int TO_A_LEAF
Constructor Detail |
public UkkonenSuffixTree()
UkkonenSuffixTree
instance.
public UkkonenSuffixTree(java.lang.String seqs)
public UkkonenSuffixTree(FiniteAlphabet alpha)
Method Detail |
public java.lang.String symbolListToString(SymbolList list) throws IllegalSymbolException
list
- a SymbolList
to be converted to a string.
String
representation of the SymbolList
IllegalSymbolException
- if an error occurspublic SymbolList stringToSymbolList(java.lang.String string)
string
- a String
to be converted to SymbolList
SymbolList
representation of the string above.public void addSymbolList(SymbolList list, java.lang.String name, boolean doNotTerminate) throws IllegalSymbolException
IllegalSymbolException
public void addSequence(java.lang.String seq, java.lang.String name, boolean doNotTerminate)
seq
- the sequence/sequences to be added into the tree.doNotTerminate
- whether we should terminate the sequence if it's non-terminated.public org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode walkTo(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode starting, java.lang.String source, int from, int to)
rule
variable can be used to check where the walk
stopped. Note that rule 3 means that the string used to walk down the
tree does not match (which is a bit different from the construction
where rule 3 implies that only the last character doesn't match.
The String is encoded as a substring of a given source. This is done to
avoid replicating the string. To send walk down the string
x
from the root, one would call walkTo(root,x,0,x.length()).
starting
- the root of the subtree we're walking down form.source
- a superstring that contains the string we're using to
walking down. source.subtring(from,to) should give the string we're
walking down from.from
- the start position (inclusive) of the target string in the
source.to
- the end position (exclusive) of the target string in the node.
SuffixNode
that the walk stopped at. If the walk
stopped inside an edge. (check the rule variable to see where it stopped).public org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode jumpTo(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode starting, java.lang.String source, int from, int to)
starting
- the root of the subtree we're walking down form.source
- a superstring that contains the string we're using to
walking down. source.subtring(from,to) should give the string we're
walking down from.from
- the start position (inclusive) of the target string in the
source.to
- the end position (exclusive) of the target string in the node.
SuffixNode
that the walk stopped at. If the walk
stopped inside an edge. (check the rule variable to see where it
stopped).protected int getEdgeLength(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode child)
protected java.lang.String getEdgeLabel(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode child)
protected int getPathLength(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode node)
protected int getPathEnd(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode node)
protected java.lang.String getLabel(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode node)
protected java.util.ArrayList getAllNodes(org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode root, java.util.ArrayList list, boolean leavesOnly)
public void printTree()
public org.biojava.bio.symbol.UkkonenSuffixTree.SuffixNode getRoot()
public boolean subStringExists(java.lang.String str)
|
|||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |