analyzers: contains two packages:
common: provides a variety of commonly used analyzers, such as CJK segmentation, ChineseAnalyzer, and analyzers for Thai, Brazilian Portuguese, Dutch and many other languages
smartcn: SmartChineseAnalyzer is an intelligent Chinese word segmentation module that finds the most probable segmentation of a Chinese sentence. It embeds an English tokenizer, so it can effectively handle mixed Chinese and English text. It is based on the hidden Markov model (HMM) from natural language processing: word frequencies and transition probabilities are estimated from a large training corpus, and these statistics are then used to compute the maximum-likelihood segmentation of the whole sentence. Because this kind of segmentation needs a dictionary to store the statistical values, SmartChineseAnalyzer must be given a dictionary location when it runs. For how to specify the location, see org.apache.lucene.analysis.cn.smart.AnalyzerProfile.
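The idea of picking the most probable segmentation can be sketched in a few lines. The class below is a toy and is not SmartChineseAnalyzer's code: it uses a simplified unigram model (word frequencies only, no transition probabilities, so it is not a full HMM) and dynamic programming to find the split with the highest total log-probability. The class name ToySegmenter, the word counts and the unknown-character penalty are all assumptions made for illustration, and Latin letters stand in for Chinese characters so the example stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy maximum-probability segmenter (unigram model); hypothetical, not Lucene code. */
class ToySegmenter {
    // Assumed penalty for a single character that is missing from the dictionary.
    private static final double UNKNOWN_LOG_PROB = Math.log(1e-8);

    private final Map<String, Double> logProb = new HashMap<>();
    private final int maxWordLen;

    ToySegmenter(Map<String, Integer> counts) {
        long total = 0;
        int longest = 1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            total += e.getValue();
            longest = Math.max(longest, e.getKey().length());
        }
        // Turn raw counts into log relative frequencies.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            logProb.put(e.getKey(), Math.log((double) e.getValue() / total));
        }
        maxWordLen = longest;
    }

    List<String> segment(String text) {
        int n = text.length();
        double[] best = new double[n + 1]; // best[i]: best score for text[0, i)
        int[] prev = new int[n + 1];       // prev[i]: start of the last word in that split
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;

        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxWordLen); start < end; start++) {
                Double lp = logProb.get(text.substring(start, end));
                if (lp == null) {
                    if (end - start > 1) {
                        continue; // unknown multi-character strings are not words
                    }
                    lp = UNKNOWN_LOG_PROB; // an unknown single character is always allowed
                }
                double score = best[start] + lp;
                if (score > best[end]) {
                    best[end] = score;
                    prev[end] = start;
                }
            }
        }

        // Walk the back-pointers from the end of the text to recover the words.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(text.substring(prev[end], end));
        }
        Collections.reverse(words);
        return words;
    }
}
```

With counts such as now=50, here=40, no=20, where=5, the input "nowhere" can be split as "now here" or "no where", and the segmenter prefers the split whose words are more frequent overall. A real dictionary-backed analyzer works on far larger statistics, which is why the dictionary location matters.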
Creating an index with Ant
An Ant task that creates Lucene indexes.
Can I store a Lucene index in BerkeleyDB?
Yes, a Lucene index can be saved in BerkeleyDB by using the DbDirectory object.
Berkeley DB Java Edition (JE) is a database written entirely in Java. It is well suited to managing large amounts of simple data.
Using LUCLI (Lucene Command-Line Interface). LUCLI is a third-party tool for working with index information from the command line.
memory: in-memory index
regex: regular-expression search
remote: remote search
snowball: the classic stemmers, mainly supporting European languages
spatial: location-aware search
spellchecker: spelling correction; for example, when a user mistypes a search term, it can offer the user a suggestion
wordnet: synonyms
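To make the spellchecker entry above concrete, here is a minimal sketch of "did you mean" suggestions based on Levenshtein edit distance. It is not the contrib module's implementation or API: the class name ToySpellSuggester, its methods and its parameters are hypothetical, and it scans a plain in-memory word list instead of an index.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy "did you mean" helper; hypothetical, not the Lucene spellchecker API. */
class ToySpellSuggester {

    /** Classic Levenshtein distance, keeping only two rows of the DP table. */
    static int editDistance(String a, String b) {
        int[] prevRow = new int[b.length() + 1];
        int[] currRow = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prevRow[j] = j; // turning "" into b[0, j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            currRow[0] = i; // turning a[0, i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                currRow[j] = Math.min(
                        Math.min(currRow[j - 1] + 1, prevRow[j] + 1),
                        prevRow[j - 1] + cost);
            }
            int[] tmp = prevRow;
            prevRow = currRow;
            currRow = tmp;
        }
        return prevRow[b.length()];
    }

    /** Returns up to limit dictionary words within maxDistance edits, closest first. */
    static List<String> suggest(String word, List<String> dictionary,
                                int maxDistance, int limit) {
        List<String> candidates = new ArrayList<>();
        for (String entry : dictionary) {
            if (!entry.equals(word) && editDistance(word, entry) <= maxDistance) {
                candidates.add(entry);
            }
        }
        // Closest words first; ties are broken alphabetically.
        candidates.sort((x, y) -> {
            int byDistance = Integer.compare(editDistance(word, x), editDistance(word, y));
            return byDistance != 0 ? byDistance : x.compareTo(y);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(limit, candidates.size())));
    }
}
```

For a mistyped query such as "lucnee", a call like suggest("lucnee", words, 2, 3) would offer "lucene" if it is in the word list. A linear scan like this only suits small vocabularies.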
Packages I am unsure about:
benchmark: benchmarking and stress testing
collation: I cannot tell from the name what it is for, and I do not know why one would use it: newly added support for Unicode and character collation (Collation).
fast-vector-highlighter: a new fast vector-based highlighting tool for large texts
instantiated: InstantiatedIndex, alternative RAM store for small corpora
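On the collation entry above: collation generally means locale-aware string ordering, as opposed to ordering by raw character codes. The sketch below uses only java.text.Collator from the JDK to show the difference. It does not use the contrib package, and the class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Contrasts code-point ordering with locale-aware ordering; JDK only, not Lucene code. */
class CollationDemo {

    /** Sorts by String.compareTo, i.e. by UTF-16 code unit values. */
    static List<String> sortByCodePoint(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts with the collation rules of the given locale. */
    static List<String> sortByCollator(List<String> terms, Locale locale) {
        List<String> copy = new ArrayList<>(terms);
        Collections.sort(copy, Collator.getInstance(locale));
        return copy;
    }
}
```

Sorting "apple", "Banana", "cherry" by code point puts "Banana" first, because uppercase letters have smaller codes than lowercase ones. The English collator orders them the way a dictionary would. The same difference affects sorted search results and range queries over terms.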