java.lang.Object
  java.util.Observable
    org.exist.storage.TextSearchEngine
      org.exist.storage.NativeTextEngine
This class is responsible for fulltext-indexing. Text-nodes are handed over
to this class to be fulltext-indexed. Method storeText() is called by
RelationalBroker whenever it finds a TextNode. Method getNodeIDsContaining()
is used by the XPath-engine to process queries where a fulltext-operator is
involved. The class keeps two database tables: dbTokens stores the words
found together with their unique ids, and invertedIndex contains the word
occurrences for every word-id per document.
TODO: store node type (attribute or text) with each entry
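The two-table layout described above can be illustrated with a minimal in-memory sketch. It is not the eXist implementation: the class name TinyInvertedIndex, its methods, and the use of plain maps in place of the dbTokens and invertedIndex tables are all assumptions made for illustration, and the tokenizer is a simple split on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the two tables; not the eXist code. */
final class TinyInvertedIndex {
    // Plays the role of dbTokens: word -> unique word id.
    private final Map<String, Integer> tokenIds = new HashMap<>();
    // Plays the role of invertedIndex: word id -> (document id -> node ids).
    private final Map<Integer, Map<Integer, List<Long>>> occurrences = new HashMap<>();

    /** Splits the text into lower-cased words and records each one for the node. */
    void storeText(int docId, long nodeId, String text) {
        for (String word : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty first element if text starts with a separator
            }
            Integer wordId = tokenIds.get(word);
            if (wordId == null) {
                wordId = tokenIds.size() + 1; // assign the next free id
                tokenIds.put(word, wordId);
            }
            occurrences.computeIfAbsent(wordId, k -> new HashMap<>())
                       .computeIfAbsent(docId, k -> new ArrayList<>())
                       .add(nodeId);
        }
    }

    /** Returns the node ids in the given document that contain the word. */
    List<Long> nodesContaining(int docId, String word) {
        Integer wordId = tokenIds.get(word.toLowerCase());
        if (wordId == null) {
            return Collections.emptyList();
        }
        Map<Integer, List<Long>> perDocument = occurrences.get(wordId);
        if (perDocument == null) {
            return Collections.emptyList();
        }
        List<Long> nodes = perDocument.get(docId);
        if (nodes == null) {
            return Collections.emptyList();
        }
        return nodes;
    }
}
```

Keeping the word-to-id mapping separate from the occurrence lists means each distinct word is stored once, and a query only has to resolve the word id before reading the per-document entries.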
Field Summary
Modifier and Type | Field | Description
static byte | ATTRIBUTE_SECTION |
static int | MAX_TOKEN_LENGTH | Length limit for the tokens
static byte | TEXT_SECTION |
Fields inherited from class org.exist.storage.TextSearchEngine
PROPERTY_INDEX_NUMBERS, PROPERTY_STEM, PROPERTY_STORE_TERM_FREQUENCY, PROPERTY_TOKENIZER
Constructor Summary
NativeTextEngine(DBBroker broker, Configuration config, BFile db)
Method Summary
Modifier and Type | Method | Description
boolean | close() |
static boolean | containsWildcards(java.lang.String str) | Checks if the given string could be a regular expression.
void | dropIndex(Collection collection) | Drop all index entries for the given collection.
void | dropIndex(DocumentImpl document) | Drop all index entries for the given document.
void | endElement(int xpathType, ElementImpl node, java.lang.String content) | Store and index given element (called storeElement before).
void | flush() | Writes the pending items, for the current document's collection.
java.lang.String[] | getIndexTerms(DocumentSet docs, TermMatcher matcher) |
NodeSet | getNodes(XQueryContext context, DocumentSet docs, NodeSet contextSet, TermMatcher matcher, java.lang.CharSequence startTerm) |
NodeSet | getNodesContaining(XQueryContext context, DocumentSet docs, NodeSet contextSet, java.lang.String expr, int type, boolean matchAll) | For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes.
NodeSet | getNodesExact(XQueryContext context, DocumentSet docs, NodeSet contextSet, java.lang.String expr) | Get all nodes whose content exactly matches the given expression.
int | getTrackMatches() |
void | printStatistics() |
void | reindex(DocumentImpl document, StoredNode node) | Reindexes all pending items for the specified document.
void | remove() | Removes all pending modifications for the current document.
void | removeElement(ElementImpl node, NodePath currentPath, java.lang.String content) | Mark given Element for removal; added entries are written to the list of pending entries.
Occurrences[] | scanIndexTerms(DocumentSet docs, NodeSet contextSet, java.lang.String start, java.lang.String end) | Queries the fulltext index to retrieve information on indexed words contained in the index for the current collection.
void | setDocument(DocumentImpl document) | Sets the current document; generally called before calling an operation.
void | setTrackMatches(int flags) |
void | startElement(ElementImpl impl, NodePath currentPath, boolean index) | Corresponds to the SAX function of the same name.
static boolean | startsWithWildcard(java.lang.String str) |
void | storeAttribute(AttrImpl node, NodePath currentPath, boolean fullTextIndexSwitch) | Store and index given attribute.
void | storeAttribute(FulltextIndexSpec indexSpec, AttrImpl attr) | Indexes the tokens contained in an attribute.
void | storeAttribute(RangeIndexSpec spec, AttrImpl node) |
void | storeText(FulltextIndexSpec indexSpec, StoredNode parent, java.lang.String text) |
void | storeText(FulltextIndexSpec indexSpec, TextImpl text, boolean noTokenizing) | Indexes the tokens contained in a text node.
void | storeText(TextImpl node, NodePath currentPath, boolean fullTextIndexSwitch) | Store and index given text node.
void | sync() | Triggers a cache sync.
java.lang.String | toString() |
Methods inherited from class org.exist.storage.TextSearchEngine
getNodesContaining, getTokenizer
Methods inherited from class java.util.Observable
addObserver, countObservers, deleteObserver, deleteObservers, hasChanged, notifyObservers, notifyObservers
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
public static final byte TEXT_SECTION
public static final byte ATTRIBUTE_SECTION
public static final int MAX_TOKEN_LENGTH
Length limit for the tokens
Constructor Detail
public NativeTextEngine(DBBroker broker, Configuration config, BFile db)
Method Detail
public static final boolean containsWildcards(java.lang.String str)
Checks if the given string could be a regular expression.
Parameters:
str - The string
public static final boolean startsWithWildcard(java.lang.String str)
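As a rough illustration of what such wildcard checks might look like, here is a small sketch. The set of characters treated as wildcards (*, ? and [) and the backslash-escape handling are assumptions; this page does not state which characters the real methods test, and the class name WildcardCheck is hypothetical.

```java
/** Hypothetical wildcard checks; the real character set is not documented here. */
final class WildcardCheck {
    // Assumed wildcard characters, chosen for illustration only.
    private static final String WILDCARDS = "*?[";

    /** Returns true if the string contains an unescaped wildcard character. */
    static boolean containsWildcards(String str) {
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (ch == '\\') {
                i++; // skip the character that follows a backslash
                continue;
            }
            if (WILDCARDS.indexOf(ch) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Returns true if the first character is a wildcard character. */
    static boolean startsWithWildcard(String str) {
        return !str.isEmpty() && WILDCARDS.indexOf(str.charAt(0)) >= 0;
    }
}
```

A check of this kind lets a caller decide cheaply whether a search term needs pattern matching or can be looked up as a literal word.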
public int getTrackMatches()
getTrackMatches in class TextSearchEngine
public void setTrackMatches(int flags)
setTrackMatches in class TextSearchEngine
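public void setDocument(DocumentImpl document)
Description copied from interface: ContentLoadingObserver
Sets the current document; generally called before calling an operation.
setDocument in interface ContentLoadingObserver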
public void storeAttribute(FulltextIndexSpec indexSpec, AttrImpl attr)
Indexes the tokens contained in an attribute.
storeAttribute in class TextSearchEngine
Parameters:
attr - The attribute to be indexed
indexSpec
public void storeText(FulltextIndexSpec indexSpec, TextImpl text, boolean noTokenizing)
Indexes the tokens contained in a text node.
storeText in class TextSearchEngine
Parameters:
indexSpec - The index configuration
text - The text node to be indexed
noTokenizing - if true, given text is indexed as a single token; if false, it is tokenized before being indexed
public void storeText(FulltextIndexSpec indexSpec, StoredNode parent, java.lang.String text)
storeText in class TextSearchEngine
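The effect of the noTokenizing parameter can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the class TokenizingSketch and the method indexTerms are hypothetical, and splitting on whitespace stands in for whatever tokenizer the engine is configured with.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the noTokenizing switch. */
final class TokenizingSketch {
    /**
     * Returns the terms that would be indexed for a text value.
     * Whitespace splitting is an assumed stand-in for the real tokenizer.
     */
    static List<String> indexTerms(String text, boolean noTokenizing) {
        if (noTokenizing) {
            // The whole value is indexed as one token.
            return Collections.singletonList(text);
        }
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With noTokenizing set to true, a value such as "New York" yields the single term "New York"; with false, it yields the two terms "New" and "York".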
public void storeAttribute(RangeIndexSpec spec, AttrImpl node)
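public void storeAttribute(AttrImpl node, NodePath currentPath, boolean fullTextIndexSwitch)
Description copied from interface: ContentLoadingObserver
Store and index given attribute.
storeAttribute in interface ContentLoadingObserver
public void storeText(TextImpl node, NodePath currentPath, boolean fullTextIndexSwitch)
Description copied from interface: ContentLoadingObserver
Store and index given text node.
storeText in interface ContentLoadingObserver
public void startElement(ElementImpl impl, NodePath currentPath, boolean index)
Description copied from interface: ContentLoadingObserver
Corresponds to the SAX function of the same name.
startElement in interface ContentLoadingObserver
public void endElement(int xpathType, ElementImpl node, java.lang.String content)
Description copied from interface: ContentLoadingObserver
Store and index given element (called storeElement before).
endElement in interface ContentLoadingObserver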
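public void removeElement(ElementImpl node, NodePath currentPath, java.lang.String content)
Description copied from interface: ContentLoadingObserver
Mark given Element for removal; added entries are written to the list of pending entries. ContentLoadingObserver.flush() is called later to flush all pending entries.
removeElement in interface ContentLoadingObserver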
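The mark-then-flush pattern described for removeElement can be sketched in a few lines. Everything here is hypothetical: the class PendingEntriesSketch, its method names, and the use of a sorted set of strings in place of the persistent index are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of deferred removals applied on flush. */
final class PendingEntriesSketch {
    // Stands in for the persistent index.
    private final Set<String> stored = new TreeSet<>();
    // Removals recorded but not yet applied.
    private final List<String> pendingRemovals = new ArrayList<>();

    void store(String term) {
        stored.add(term);
    }

    /** Only records the removal; the index is untouched until flush(). */
    void markForRemoval(String term) {
        pendingRemovals.add(term);
    }

    /** Applies all pending removals in one pass, then clears the list. */
    void flush() {
        stored.removeAll(pendingRemovals);
        pendingRemovals.clear();
    }

    /** Discards pending removals without applying them. */
    void discardPending() {
        pendingRemovals.clear();
    }

    boolean contains(String term) {
        return stored.contains(term);
    }

    int pendingCount() {
        return pendingRemovals.size();
    }
}
```

Deferring the work this way lets many small changes be written together, and lets pending changes be dropped before they ever reach the index.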
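public void sync()
Description copied from interface: ContentLoadingObserver
Triggers a cache sync.
sync in interface ContentLoadingObserver
public void flush()
Description copied from interface: ContentLoadingObserver
Writes the pending items, for the current document's collection.
flush in interface ContentLoadingObserver
flush in class TextSearchEngine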
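public void reindex(DocumentImpl document, StoredNode node)
Description copied from interface: ContentLoadingObserver
Reindexes all pending items for the specified document.
reindex in interface ContentLoadingObserver
reindex in class TextSearchEngine
Parameters:
document
node
public void remove()
Description copied from interface: ContentLoadingObserver
Removes all pending modifications for the current document.
remove in interface ContentLoadingObserver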
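public void dropIndex(Collection collection)
Description copied from interface: ContentLoadingObserver
Drop all index entries for the given collection.
dropIndex in interface ContentLoadingObserver
dropIndex in class TextSearchEngine
Parameters:
collection
public void dropIndex(DocumentImpl document)
Description copied from interface: ContentLoadingObserver
Drop all index entries for the given document.
dropIndex in interface ContentLoadingObserver
dropIndex in class TextSearchEngine
Parameters:
document
public NodeSet getNodesContaining(XQueryContext context, DocumentSet docs, NodeSet contextSet, java.lang.String expr, int type, boolean matchAll) throws TerminatedException
Description copied from class: TextSearchEngine
For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes.
getNodesContaining in class TextSearchEngine
Throws:
TerminatedException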
public NodeSet getNodesExact(XQueryContext context, DocumentSet docs, NodeSet contextSet, java.lang.String expr) throws TerminatedException
Get all nodes whose content exactly matches the given expression.
Throws:
TerminatedException
public NodeSet getNodes(XQueryContext context, DocumentSet docs, NodeSet contextSet, TermMatcher matcher, java.lang.CharSequence startTerm) throws TerminatedException
getNodes in class TextSearchEngine
Throws:
TerminatedException
public java.lang.String[] getIndexTerms(DocumentSet docs, TermMatcher matcher)
getIndexTerms in class TextSearchEngine
public Occurrences[] scanIndexTerms(DocumentSet docs, NodeSet contextSet, java.lang.String start, java.lang.String end) throws PermissionDeniedException
Description copied from class: TextSearchEngine
Queries the fulltext index to retrieve information on indexed words contained in the index for the current collection. Returns Occurrences for all words contained in the index. If param end is null, all words starting with the string sequence param start are returned. Otherwise, the method returns all words that come after start and before end in lexical order.
scanIndexTerms in class TextSearchEngine
Throws:
PermissionDeniedException
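The start/end semantics described for scanIndexTerms can be illustrated with a sorted set of terms. This is a sketch, not the engine's code: the class TermScanSketch and its scan method are hypothetical, the terms are plain strings rather than Occurrences, and treating start as inclusive and end as exclusive in the range case is an assumption the description above does not settle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical sketch of prefix and range scans over sorted index terms. */
final class TermScanSketch {
    static List<String> scan(NavigableSet<String> terms, String start, String end) {
        List<String> result = new ArrayList<>();
        if (end == null) {
            // Prefix scan: walk from start until a term no longer begins with it.
            for (String term : terms.tailSet(start, true)) {
                if (!term.startsWith(start)) {
                    break;
                }
                result.add(term);
            }
        } else {
            // Range scan in lexical order; inclusive start and exclusive end are assumed.
            // subSet throws IllegalArgumentException if start sorts after end.
            result.addAll(terms.subSet(start, true, end, false));
        }
        return result;
    }
}
```

Because the terms are kept in sorted order, both cases reduce to seeking to start and reading forward, which is why a prefix query and a range query can share one method.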
public boolean close() throws DBException
close in class TextSearchEngine
Throws:
DBException
public void printStatistics()
public java.lang.String toString()