Overview of the lucene-contrib modules

analyzers: contains two packages:

common: provides a variety of commonly used analyzers, such as CJK segmentation, ChineseAnalyzer, and analyzers for Thai, Brazilian Portuguese, Dutch, and other languages.


smartcn: SmartChineseAnalyzer is an intelligent Chinese word segmentation module. It finds the most probable segmentation of a Chinese sentence and embeds an English tokenizer, so it can handle mixed Chinese and English text effectively. It is based on the hidden Markov model (HMM) from natural language processing: word frequencies and transition probabilities are estimated from a large Chinese training corpus, and these statistics are then used to compute the maximum-likelihood segmentation of the whole sentence. Intelligent segmentation needs a dictionary that stores these statistical values, so SmartChineseAnalyzer must be given a dictionary at run time; for how to specify its location, please refer to the dictionary documentation.
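The idea of choosing the most probable segmentation can be sketched with a much simpler model than the one described above. The following is a minimal sketch, not the SmartChineseAnalyzer implementation: it assumes a unigram model (each word has an independent probability, with no transition probabilities) and finds the best split by dynamic programming. The class name MaxProbSegmenter, the toy dictionary, and all probabilities are made up for illustration. The example uses an unspaced English string so that the source stays ASCII-only; the same procedure applies to Chinese characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: unigram maximum-probability segmentation.
// This is a simplification, not the HMM used by SmartChineseAnalyzer.
class MaxProbSegmenter {
    // Assumed penalty for a single character that is not in the dictionary.
    private static final double UNKNOWN_LOG_PROB = Math.log(1e-8);

    private final Map<String, Double> logProb = new HashMap<>();
    private final int maxWordLen;

    MaxProbSegmenter(Map<String, Double> wordProb) {
        int longest = 1;
        for (Map.Entry<String, Double> e : wordProb.entrySet()) {
            logProb.put(e.getKey(), Math.log(e.getValue()));
            longest = Math.max(longest, e.getKey().length());
        }
        maxWordLen = longest;
    }

    List<String> segment(String sentence) {
        int n = sentence.length();
        double[] best = new double[n + 1]; // best[i]: best log-probability of sentence[0, i)
        int[] prev = new int[n + 1];       // prev[i]: start index of the last word ending at i
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;

        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxWordLen); start < end; start++) {
                Double lp = logProb.get(sentence.substring(start, end));
                if (lp == null) {
                    if (end - start > 1) {
                        continue; // unknown strings are only allowed as single characters
                    }
                    lp = UNKNOWN_LOG_PROB;
                }
                double score = best[start] + lp;
                if (score > best[end]) {
                    best[end] = score;
                    prev[end] = start;
                }
            }
        }

        // Walk the back-pointers from the end of the sentence to recover the words.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(sentence.substring(prev[end], end));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        Map<String, Double> probs = new HashMap<>();
        probs.put("the", 0.05);
        probs.put("they", 0.01);
        probs.put("you", 0.02);
        probs.put("youth", 0.001);
        probs.put("out", 0.01);
        probs.put("he", 0.02);
        probs.put("vent", 0.0001);
        probs.put("event", 0.002);
        // Prints [the, youth, event]
        System.out.println(new MaxProbSegmenter(probs).segment("theyouthevent"));
    }
}
```

Working in log-probabilities turns the product of word probabilities into a sum and avoids underflow on long sentences. A real HMM-based segmenter would also score the transition between neighbouring words, which this sketch leaves out.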



ant: index creation with Ant

Provides an Ant task for creating Lucene indexes.


Can a Lucene index be stored in Berkeley DB?

Yes, a Lucene index can be stored in Berkeley DB by using the DbDirectory object.

Berkeley DB Java Edition (JE) is written entirely in Java and is well suited to managing large amounts of simple data.

highlighter: highlights matched terms in search results


lucli: the Lucene Command-Line Interface (LUCLI), a third-party tool for working with index information from the command line

memory: in-memory index

regex: regular expression search

remote: remote search

snowball: the classic stemmers, mainly supporting European languages

spatial: location-aware search

spellchecker: spelling correction; for example, when a user misspells a search term, it can suggest a correction

wordnet: synonyms and the like
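As an illustration of the spellchecker idea in the list above, here is a minimal sketch of one common way to produce suggestions: rank dictionary words by Levenshtein edit distance to the misspelled query. This is not the Lucene spellchecker API; the class SpellSuggestSketch, its method names, and the word list are hypothetical, and a real implementation would first narrow the candidates with an index instead of scanning the whole dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: suggest corrections by Levenshtein edit distance.
class SpellSuggestSketch {

    // Classic two-row dynamic programming for the Levenshtein distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0, j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0, i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns up to `limit` dictionary words within `maxDistance` edits of the query,
    // closest first; ties are broken alphabetically.
    static List<String> suggest(String query, List<String> dictionary, int maxDistance, int limit) {
        List<String> candidates = new ArrayList<>();
        for (String word : dictionary) {
            if (!word.equals(query) && editDistance(query, word) <= maxDistance) {
                candidates.add(word);
            }
        }
        candidates.sort(Comparator.comparingInt((String w) -> editDistance(query, w))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(candidates.subList(0, Math.min(limit, candidates.size())));
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("search", "perch", "church", "lucene");
        // Prints [perch, search]
        System.out.println(suggest("serch", dictionary, 1, 5));
    }
}
```

Both "perch" and "search" are one edit away from "serch", so a production suggester would usually add a second signal, such as how often each word occurs in the index, to decide which to show first.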


Packages I am less sure about:

benchmark: performance and stress testing

collation: I have not looked at it closely and am not sure when to use it; it adds new support for Unicode and collation.

fast-vector-highlighter: a new, fast vector-based highlighter for large texts

instantiated: InstantiatedIndex, an alternative RAM store for small corpora
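For the fast-vector-highlighter entry above, the reason a vector-based highlighter can be fast on large texts is that the character offsets of matched terms are already stored, so the text does not have to be analyzed again at highlight time. The following is a minimal sketch of that final step only, assuming the offsets are already known. It is not the Lucene API; the class OffsetHighlightSketch and its Span type are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: wrap known character spans in highlight markers.
class OffsetHighlightSketch {

    // One matched term's character range [start, end) in the original text.
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumes spans are sorted by start offset and do not overlap.
    static String highlight(String text, List<Span> spans, String pre, String post) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : spans) {
            out.append(text, cursor, s.start);                    // untouched text before the match
            out.append(pre).append(text, s.start, s.end).append(post);
            cursor = s.end;
        }
        out.append(text, cursor, text.length());                  // trailing text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library";
        List<Span> spans = Arrays.asList(new Span(0, 6), new Span(12, 18));
        // Prints <b>Lucene</b> is a <b>search</b> library
        System.out.println(highlight(text, spans, "<b>", "</b>"));
    }
}
```

The work is a single pass over the text regardless of how many terms match, which is why precomputed offsets pay off on long documents.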








