Lucene contrib package overview

2010-06-09  Source: original to this site  Category: Java  Views: 181

The analyzers directory contains two packages:

common: provides a range of commonly used analyzers and tokenizers, such as CJK segmentation, ChineseAnalyzer, and analyzers for Thai, Brazilian Portuguese, Dutch, and other languages.


smartcn: SmartChineseAnalyzer is an intelligent Chinese word segmentation module. It finds the most probable segmentation of a Chinese sentence and embeds an English tokenizer, so it handles mixed Chinese and English text effectively. It is based on the hidden Markov model (HMM) from natural language processing: a large training corpus is used to estimate word frequencies and transition probabilities, and these statistics are then used to compute the maximum-likelihood segmentation of a whole Chinese sentence. Intelligent segmentation needs a dictionary to store these statistics, so SmartChineseAnalyzer must be given a dictionary at run time; for how to specify its location, please refer to dictionary
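The idea of picking the most probable segmentation can be illustrated with a small dynamic-programming sketch. This is not SmartChineseAnalyzer's code: it is a simplified unigram version that scores each candidate word by its frequency only and ignores the transition probabilities a full HMM would use. The class name ToySegmenter, the word counts, and the unknown-character penalty are all made up for illustration, and Latin letters stand in for Chinese characters so the example stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; nothing here comes from the Lucene API.
class ToySegmenter {
    // Assumed penalty for a single character that is not in the dictionary.
    private static final double UNKNOWN_LOG_PROB = Math.log(1e-8);

    private final Map<String, Double> logProb = new HashMap<>();
    private int maxWordLength = 1;

    // Turns raw word counts (the "statistics stored in the dictionary")
    // into log probabilities.
    ToySegmenter(Map<String, Integer> counts) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            logProb.put(e.getKey(), Math.log((double) e.getValue() / total));
            maxWordLength = Math.max(maxWordLength, e.getKey().length());
        }
    }

    // Returns the segmentation with the highest total log probability.
    List<String> segment(String sentence) {
        int n = sentence.length();
        double[] best = new double[n + 1]; // best score of a segmentation ending at i
        int[] prev = new int[n + 1];       // start index of the last word in that segmentation
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;

        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxWordLength); start < end; start++) {
                Double lp = logProb.get(sentence.substring(start, end));
                if (lp == null) {
                    if (end - start > 1) {
                        continue; // unknown strings are only allowed as single characters
                    }
                    lp = UNKNOWN_LOG_PROB;
                }
                double score = best[start] + lp;
                if (score > best[end]) {
                    best[end] = score;
                    prev[end] = start;
                }
            }
        }

        // Walk the back-pointers from the end of the sentence to recover the words.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(sentence.substring(prev[end], end));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("now", 20);
        counts.put("here", 15);
        counts.put("nowhere", 1);
        counts.put("it", 25);
        counts.put("is", 30);
        // "now" + "here" is more probable than the rare word "nowhere".
        System.out.println(new ToySegmenter(counts).segment("nowhereitis")); // prints [now, here, it, is]
    }
}
```

With these counts the rare long word loses to two frequent short ones, which is the kind of ambiguity a statistical segmenter is meant to resolve.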



ant: creating an index with Ant

Provides an Ant task for creating Lucene indexes.


db: can a Lucene index be stored in BerkeleyDB?

Yes. A Lucene index can be stored in BerkeleyDB by using the DbDirectory object.

Berkeley DB Java Edition (JE) is a database written entirely in Java; it is well suited to managing very large amounts of simple data.

highlighter: highlights matched terms in search results

lucli: the Lucene Command-Line Interface (LUCLI), a third-party tool for working with index information from the command line

memory: in-memory index

regex: regular expression search

remote: remote search

snowball: the classic stemmers, mainly supporting European languages

spatial: location-aware search

spellchecker: spelling correction. This one is practical: when a user misspells a search term, it can prompt the user with a suggestion

wordnet: synonyms and the like
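The prompt-on-misspelling behavior described for spellchecker can be sketched as a nearest-term lookup by edit distance. The following is only a minimal Java sketch under assumptions: it scans a small in-memory vocabulary, whereas the contrib package works against an index, and the names ToySpellSuggester, editDistance, and suggest are hypothetical, not Lucene API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; nothing here comes from the Lucene API.
class ToySpellSuggester {

    // Levenshtein distance: minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the vocabulary term closest to the query, or null when no term
    // is within maxDistance edits.
    static String suggest(String query, List<String> vocabulary, int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String term : vocabulary) {
            int d = editDistance(query, term);
            if (d < bestDistance) {
                best = term;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("lucene", "solr", "nutch");
        System.out.println(suggest("lucnee", vocabulary, 2)); // prints lucene
    }
}
```

A linear scan like this only suits small vocabularies; for a real term dictionary the candidates would first be narrowed down before computing distances.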


Packages I am unsure about:

benchmark: performance and stress testing

collation: I could not tell from a quick look what it is for; it adds new Unicode support and collation.

fast-vector-highlighter: a new, fast vector-based highlighter for large texts

instantiated: InstantiatedIndex, an alternative RAM store for small corpora






