# Lucene Study Notes, Part 7: Lucene Search Process Analysis (6) (repost)

2010-06-08 | Source: original to this site | Category: Java | Views: 127

### 2.4 Searching with the query object

#### 2.4.3 Merging the inverted lists

Once the Scorer object tree and the SumScorer object tree have been built, what follows is merging the inverted lists and computing the scores.

This section analyzes the merging of the inverted lists; score computation over the Scorer object tree is analyzed in the next section.

The code of BooleanScorer2.score(Collector) is as follows:

    public void score(Collector collector) throws IOException {
        collector.setScorer(this);
        // Keep pulling the next matching document until the merged list is exhausted.
        while ((doc = countingSumScorer.nextDoc()) != NO_MORE_DOCS) {
            collector.collect(doc);
        }
    }

As the code shows, the process repeatedly fetches the next document number and adds that document to the result set.

Fetching the next document is exactly the process of merging the inverted lists: the next document number is determined by considering all the sub-queries together.
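The shape of this loop can be sketched in isolation. The following is a minimal illustration, not Lucene's code: `ScoreLoopSketch`, `Cursor`, and `collectAll` are hypothetical names, a plain list stands in for the Collector, and a single sorted array stands in for the merged result that `countingSumScorer` would produce.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the score(Collector) loop; not Lucene's API. */
final class ScoreLoopSketch {
    // Same sentinel idea as Lucene's NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal iterator over one sorted list of document numbers (assumed helper). */
    static final class Cursor {
        private final int[] docs;
        private int pos = -1;

        Cursor(int... docs) { this.docs = docs; }

        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Pull document numbers one at a time and add each one to the result set. */
    static List<Integer> collectAll(Cursor cursor) {
        List<Integer> collected = new ArrayList<>();
        int doc;
        while ((doc = cursor.nextDoc()) != NO_MORE_DOCS) {
            collected.add(doc); // plays the role of collector.collect(doc)
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(collectAll(new Cursor(11, 13))); // prints [11, 13]
    }
}
```

The sentinel value lets the loop test for exhaustion and fetch the next document in a single expression, which is why no separate hasNext-style call is needed.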

Because the SumScorer objects form a tree, the inverted lists are merged following the structure of that tree: subtrees are merged first, their results are then merged with one another, and so on up to the root.
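As a rough illustration of bottom-up merging, the sketch below evaluates a small query tree by merging each subtree before combining the results at the parent. It is a simplification under stated assumptions: `MergeTreeSketch`, `Node`, and `Op` are made-up names, the document numbers are invented, and whole sets are computed eagerly, whereas the Scorers described in this article merge lazily, one document at a time.

```java
import java.util.TreeSet;

/** Eager, set-based sketch of merging along a query tree; not Lucene's classes. */
final class MergeTreeSketch {
    enum Op { LEAF, AND, OR }

    static final class Node {
        final Op op;
        final int[] postings;  // used only by leaves
        final Node[] children; // used only by AND / OR nodes

        private Node(Op op, int[] postings, Node[] children) {
            this.op = op;
            this.postings = postings;
            this.children = children;
        }

        static Node leaf(int... docs) { return new Node(Op.LEAF, docs, new Node[0]); }
        static Node and(Node... kids) { return new Node(Op.AND, new int[0], kids); }
        static Node or(Node... kids) { return new Node(Op.OR, new int[0], kids); }
    }

    /** Merge the subtrees first, then combine their results at this node. */
    static TreeSet<Integer> merge(Node node) {
        TreeSet<Integer> result = new TreeSet<>();
        if (node.op == Op.LEAF) {
            for (int doc : node.postings) {
                result.add(doc);
            }
            return result;
        }
        boolean firstChild = true;
        for (Node child : node.children) {
            TreeSet<Integer> sub = merge(child);
            if (node.op == Op.OR || firstChild) {
                result.addAll(sub);    // union, or the seed of an intersection
            } else {
                result.retainAll(sub); // intersection with the next subtree
            }
            firstChild = false;
        }
        return result;
    }

    public static void main(String[] args) {
        // +(A B) +C with invented document numbers
        Node query = Node.and(
                Node.or(Node.leaf(1, 4, 7), Node.leaf(2, 4, 9)),
                Node.leaf(2, 4, 7, 8));
        System.out.println(merge(query)); // prints [2, 4, 7]
    }
}
```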

As analyzed in the previous section, the inverted lists are merged mainly by the following SumScorers:

• Intersection: ConjunctionScorer
• Union: DisjunctionSumScorer
• Difference: ReqExclScorer
• ReqOptSumScorer

We analyze them one by one below:

##### 2.4.3.1 Intersection: ConjunctionScorer (+A +B)

ConjunctionScorer has a member variable Scorer[] scorers, an array of Scorers in which each element represents one inverted list. ConjunctionScorer takes the intersection of these inverted lists and returns the document numbers in the intersection, in order, from its nextDoc() function.

To describe the process clearly, the following concrete example walks through the merging of the inverted lists:

(1) The original inverted lists are as follows:

(2) The ConjunctionScorer constructor first calls nextDoc() on every Scorer, so that each Scorer is positioned on its first document number.

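    for (int i = 0; i < scorers.length; i++) {
        // Loop body reconstructed from the description; the source listing is truncated after "i".
        if (scorers[i].nextDoc() == NO_MORE_DOCS) {
            lastDoc = NO_MORE_DOCS;
            return;
        }
    }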

(3) The ConjunctionScorer constructor then sorts the Scorers in ascending order of their first document number.

    Arrays.sort(scorers, new Comparator<Scorer>() {
        public int compare(Scorer o1, Scorer o2) {
            // Ascending by the document each Scorer currently points at.
            return o1.docID() - o2.docID();
        }
    });
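The ordering step can be tried out on its own. In the sketch below, `SortByFirstDocSketch` and `Cursor` are illustrative names rather than Lucene types, and each cursor is assumed to be already positioned on its first document. `Integer.compare` is used in place of the subtraction; for non-negative document numbers both give the same order.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sorting iterators by the document they currently point at (illustrative types). */
final class SortByFirstDocSketch {
    /** Hypothetical stand-in for a Scorer that is already on its first document. */
    static final class Cursor {
        final String name;
        final int currentDoc;

        Cursor(String name, int currentDoc) {
            this.name = name;
            this.currentDoc = currentDoc;
        }

        int docID() { return currentDoc; }
    }

    /** Returns the cursor names in ascending order of docID(). */
    static String[] sortedNames(Cursor... cursors) {
        Cursor[] copy = cursors.clone();
        Arrays.sort(copy, new Comparator<Cursor>() {
            @Override
            public int compare(Cursor o1, Cursor o2) {
                return Integer.compare(o1.docID(), o2.docID());
            }
        });
        String[] names = new String[copy.length];
        for (int i = 0; i < copy.length; i++) {
            names[i] = copy[i].name;
        }
        return names;
    }

    public static void main(String[] args) {
        String[] order = sortedNames(new Cursor("A", 8), new Cursor("B", 1), new Cursor("C", 5));
        System.out.println(Arrays.toString(order)); // prints [B, C, A]
    }
}
```

After the sort, the last array slot holds the cursor with the largest current document, which is the property the later steps rely on.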

After sorting, the inverted lists are as follows:

(4) The ConjunctionScorer constructor then calls doNext() for the first time.

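    if (doNext() == NO_MORE_DOCS) {
        lastDoc = NO_MORE_DOCS;
        return;
    }

    private int doNext() throws IOException {
        int first = 0;
        int doc = scorers[scorers.length - 1].docID();
        Scorer firstScorer;
        // The source listing is truncated after "docID ()"; the rest of the loop is
        // reconstructed from the statements quoted in the surrounding text.
        while ((firstScorer = scorers[first]).docID() < doc) {
            doc = firstScorer.advance(doc);
            first = first == scorers.length - 1 ? 0 : first + 1;
        }
        return doc;
    }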

Let us call the inverted list that currently holds the smallest document number first. In fact, the statement first = first == scorers.length - 1 ? 0 : first + 1; in doNext() shows that the Scorer array is treated as a circular array (ring) during this process.

At this point scorers[scorers.length - 1] holds the largest document number. Inside the doNext() loop, every document smaller than the current largest document number is skipped with firstScorer.advance(doc), which jumps to the first document greater than or equal to doc. Such documents are smaller than the largest current document number, and since ConjunctionScorer computes an intersection, they cannot belong to the intersection.
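The skipping behaviour of advance(doc) can be illustrated on a plain sorted array. This is a sketch under assumptions: `AdvanceSketch` is a made-up class, and binary search is swapped in as the skipping technique for an in-memory array; it is not how Lucene reads its index.

```java
import java.util.Arrays;

/** Array-backed list of document numbers with an advance(target) based on binary search. */
final class AdvanceSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // sorted, distinct document numbers
    private int pos = 0;

    AdvanceSketch(int... docs) { this.docs = docs; }

    int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

    /** Moves to the first document >= target and returns it, or NO_MORE_DOCS. */
    int advance(int target) {
        int idx = Arrays.binarySearch(docs, pos, docs.length, target);
        pos = idx >= 0 ? idx : -(idx + 1); // a negative result encodes the insertion point
        return docID();
    }

    public static void main(String[] args) {
        AdvanceSketch list = new AdvanceSketch(1, 10, 11, 13);
        System.out.println(list.advance(8));  // prints 10
        System.out.println(list.advance(11)); // prints 11
        System.out.println(list.advance(14) == NO_MORE_DOCS); // prints true
    }
}
```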

The doNext() loop proceeds as follows:

• doc = 8, first points to entry 0: advance to the first document not smaller than 8, namely document 10; set doc = 10 and move first to entry 1.

• doc = 10, first points to entry 1: advance to document 11; set doc = 11 and move first to entry 2.

• doc = 11, first points to entry 2: advance to document 11; set doc = 11 and move first to entry 3.

• doc = 11, first points to entry 3: advance to document 11; set doc = 11 and move first to entry 4.

• doc = 11, first points to entry 4: advance to document 11; set doc = 11 and move first to entry 5.

• doc = 11, first points to entry 5: advance to document 11; set doc = 11 and move first to entry 6.

• doc = 11, first points to entry 6: advance to document 11; set doc = 11 and move first to entry 7.

• doc = 11, first points to entry 7: advance to document 11; set doc = 11 and wrap first around to entry 0.

• doc = 11, first points to entry 0: advance to document 11; set doc = 11 and move first to entry 1.

• doc = 11, first points to entry 1: because 11 < 11 is false, the loop ends and doc = 11 is returned. When the loop exits, the current document of every inverted list is 11.
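The steps above can be replayed with a small simulation. The eight lists below are invented so that the recorded steps match the walkthrough (the article's own figure is not reproduced here), and `DoNextTraceSketch` and `Cursor` are hypothetical names. The loop has the same shape as doNext(); the only addition is a trace of every advance.

```java
import java.util.ArrayList;
import java.util.List;

/** Replays the ring traversal of doNext() on made-up lists of document numbers. */
final class DoNextTraceSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical array-backed stand-in for a Scorer. */
    static final class Cursor {
        private final int[] docs;
        private int pos = 0; // already positioned on the first document

        Cursor(int... docs) { this.docs = docs; }

        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

        int advance(int target) { // linear skip to the first document >= target
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return docID();
        }
    }

    /** Same loop shape as doNext(); also records "entry->doc" for every advance. */
    static int doNext(Cursor[] scorers, List<String> trace) {
        int first = 0;
        int doc = scorers[scorers.length - 1].docID();
        Cursor firstScorer;
        while ((firstScorer = scorers[first]).docID() < doc) {
            doc = firstScorer.advance(doc);
            trace.add(first + "->" + doc);
            first = first == scorers.length - 1 ? 0 : first + 1; // wrap around the ring
        }
        return doc;
    }

    /** Eight invented lists, already sorted by first document. */
    static Cursor[] exampleScorers() {
        return new Cursor[] {
            new Cursor(1, 10, 11, 13), new Cursor(2, 11, 13),
            new Cursor(3, 5, 11, 13),  new Cursor(4, 11, 13, 15),
            new Cursor(5, 9, 11, 13),  new Cursor(6, 11, 12, 13),
            new Cursor(7, 11, 13),     new Cursor(8, 11, 13)
        };
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        int doc = doNext(exampleScorers(), trace);
        System.out.println(trace); // [0->10, 1->11, 2->11, 3->11, 4->11, 5->11, 6->11, 7->11, 0->11]
        System.out.println(doc);   // 11
    }
}
```

The loop stops as soon as it reaches an entry that is already on the candidate document, which means one full pass around the ring has agreed on it.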

(5) When BooleanScorer2.score(Collector) calls ConjunctionScorer.nextDoc() for the first time, lastDoc is -1. According to the implementation of nextDoc(), it returns lastDoc = scorers[scorers.length - 1].docID(), that is, it returns 11, and lastDoc is set to 11 as well.

    public int nextDoc() throws IOException {
        if (lastDoc == NO_MORE_DOCS) {
            return lastDoc;
        } else if (lastDoc == -1) {
            // First call: the constructor has already aligned all Scorers.
            return lastDoc = scorers[scorers.length - 1].docID();
        }
        scorers[(scorers.length - 1)].nextDoc();
        return lastDoc = doNext();
    }

(6) In BooleanScorer2.score(Collector), after nextDoc() returns, collector.collect(doc) collects the document number (the collection process is analyzed in the next section). While the document is being collected, ConjunctionScorer.docID() is called and returns lastDoc, that is, the current document number 11.

(7) When BooleanScorer2.score(Collector) calls ConjunctionScorer.nextDoc() for the second time:

• According to the implementation of nextDoc(), it first calls scorers[(scorers.length - 1)].nextDoc(), which moves the last entry to its next document, 13.

• It then calls lastDoc = doNext(), which sets doc = 13 and first = 0 and enters the loop.

• doc = 13, first points to entry 0: advance to document 13; set doc = 13 and move first to entry 1.

• doc = 13, first points to entry 1: advance to document 13; set doc = 13 and move first to entry 2.

• doc = 13, first points to entry 2: advance to document 13; set doc = 13 and move first to entry 3.

• doc = 13, first points to entry 3: advance to document 13; set doc = 13 and move first to entry 4.

• doc = 13, first points to entry 4: advance to document 13; set doc = 13 and move first to entry 5.

• doc = 13, first points to entry 5: advance to document 13; set doc = 13 and move first to entry 6.

• doc = 13, first points to entry 6: advance to document 13; set doc = 13 and move first to entry 7.

• doc = 13, first points to entry 7: entry 7 was already moved to document 13 by nextDoc(), so 13 < 13 is false; the loop ends and doc = 13 is returned. When the loop exits, the current document of every inverted list is 13.

(8) lastDoc is set to 13. While the document is being collected, ConjunctionScorer.docID() is called and returns lastDoc, that is, the current document number 13.

(9) When nextDoc() is called once more, it returns NO_MORE_DOCS, and the merging of the inverted lists ends.
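Steps (2) through (9) can be put together in one small, self-contained sketch. It is not Lucene's ConjunctionScorer: `ConjunctionSketch`, `Cursor`, and `collect` are hypothetical names, scoring and the other bookkeeping of the real class are left out, and the three lists are invented so that the intersection is {11, 13}, as in the walkthrough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Simplified conjunction over array-backed lists; a sketch, not Lucene's class. */
final class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical array-backed stand-in for a Scorer. */
    static final class Cursor {
        private final int[] docs;
        private int pos = -1; // -1 means "not positioned yet"

        Cursor(int... docs) { this.docs = docs; }

        int docID() { return pos < 0 ? -1 : pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

        int nextDoc() { pos++; return docID(); }

        int advance(int target) { // linear skip to the first document >= target
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return docID();
        }
    }

    private final Cursor[] scorers;
    private int lastDoc = -1;

    ConjunctionSketch(Cursor... scorers) {
        this.scorers = scorers;
        for (Cursor s : scorers) { // step (2): position every list on its first document
            if (s.nextDoc() == NO_MORE_DOCS) { lastDoc = NO_MORE_DOCS; return; }
        }
        Arrays.sort(scorers, new Comparator<Cursor>() { // step (3): sort by first document
            @Override public int compare(Cursor a, Cursor b) {
                return Integer.compare(a.docID(), b.docID());
            }
        });
        if (doNext() == NO_MORE_DOCS) lastDoc = NO_MORE_DOCS; // step (4)
    }

    private int doNext() {
        int first = 0;
        int doc = scorers[scorers.length - 1].docID();
        Cursor firstScorer;
        while ((firstScorer = scorers[first]).docID() < doc) {
            doc = firstScorer.advance(doc);
            first = first == scorers.length - 1 ? 0 : first + 1;
        }
        return doc;
    }

    int nextDoc() {
        if (lastDoc == NO_MORE_DOCS) return lastDoc;
        if (lastDoc == -1) return lastDoc = scorers[scorers.length - 1].docID();
        scorers[scorers.length - 1].nextDoc();
        return lastDoc = doNext();
    }

    /** The score(Collector) loop, collecting into a list. */
    static List<Integer> collect(ConjunctionSketch conjunction) {
        List<Integer> hits = new ArrayList<>();
        int doc;
        while ((doc = conjunction.nextDoc()) != NO_MORE_DOCS) hits.add(doc);
        return hits;
    }

    public static void main(String[] args) {
        ConjunctionSketch c = new ConjunctionSketch(
                new Cursor(8, 11, 13), new Cursor(1, 10, 11, 13), new Cursor(2, 11, 13));
        System.out.println(collect(c)); // prints [11, 13]
    }
}
```

On the third call the last list runs out, so doNext() pushes every other list to NO_MORE_DOCS as well and the collection loop stops.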

Reposted from: http://forfuture1978.javaeye.com/blog/632859
