[Lucene 3.0: A First Glimpse] Index Creation (6): Closing the IndexWriter

2010-04-23 · Source: original content · Category: Internet · Views: 232

1.5 Details of Closing the IndexWriter

The previous articles in this series described in detail how IndexWriter builds the index in memory. Once the complete in-memory index has been created, the only work left is to close the IndexWriter. Besides cleaning up objects in memory, closing the IndexWriter performs one very important job: writing the information held in memory (the Field data that must be preserved, the inverted index tables, and so on) out to the Lucene index files on disk. Each of the on-disk Lucene index file formats will get its own detailed article later in the series; here we only trace where in the closing process each file is written.
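Before diving into the source, here is a minimal sketch of the lifecycle described above, using the Lucene 3.0 API (this assumes lucene-core 3.0 on the classpath; the constructor and optimize() shown here are deprecated in later versions):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CloseDemo {
    // Build a tiny index and close the writer; closing is what flushes the
    // buffered index information out to the directory.
    static RAMDirectory buildIndex() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add(new Field("content", "hello lucene",
                Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);  // buffered in memory by DocumentsWriter
        writer.optimize();        // triggers flush(true, false, true) internally
        writer.close();           // releases resources, segment is on "disk"
        return dir;
    }

    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(buildIndex());
        System.out.println(reader.numDocs());
        reader.close();
    }
}
```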

◆ IndexWriter.optimize()

In the index-creation code of section 1.1 in "Index Creation (1): The IndexWriter Indexer", once all Document objects have been indexed we call IndexWriter.optimize() before closing the indexer. The call chain in the source is:

IndexWriter.optimize()

---> IndexWriter.optimize(boolean doWait)

---> IndexWriter.optimize(int maxNumSegments, boolean doWait)

public void optimize(int maxNumSegments, boolean doWait) {
     flush(true, false, true);  // write the index information from memory to the disk files

IndexWriter.flush(boolean triggerMerge, boolean flushDocStores, boolean flushDeletes)

---> IndexWriter.doFlush(boolean flushDocStores, boolean flushDeletes)

---> IndexWriter.doFlushInternal(boolean flushDocStores, boolean flushDeletes)

private synchronized final boolean doFlushInternal(boolean flushDocStores, boolean flushDeletes) {

      // Number of documents buffered in memory to be indexed
      final int numDocs = docWriter.getNumDocsInRAM();
      // Name of the doc store segment the stored fields and term vectors are written to, e.g. "_0"
      String docStoreSegment = docWriter.getDocStoreSegment();
      // Offset within the doc store segment for the stored fields and term vectors
      int docStoreOffset = docWriter.getDocStoreOffset();
      // Whether the doc store uses a compound index file
      boolean docStoreIsCompoundFile = false;
      // Name of the segment to write, e.g. "_0"
      String segment = docWriter.getSegment();

      // Start writing the buffered index information to the segment
      flushedDocCount = docWriter.flush(flushDocStores);

◆ DocumentsWriter.flush()

Continuing from the source above, IndexWriter calls DocumentsWriter.flush() to complete the rest of the closing work. Just as DocumentsWriter drives the index chain (the "processing plant") to do the building work during indexing, when closing it shuts the chain down link by link, and each link writes the index information it produced during indexing to the appropriate disk file.

Let's look at the main tasks of DocumentsWriter.flush(boolean closeDocStore):

1. closeDocStore(): following the structure of the basic index chain, close the doc store holding the stored fields and the term vector information:

consumer(DocFieldProcessor).closeDocStore(flushState);

  ---> consumer(DocInverter).closeDocStore(state);

    ---> consumer(TermsHash).closeDocStore(state);

      ---> consumer(FreqProxTermsWriter).closeDocStore(state);

      ---> if (nextTermsHash != null) nextTermsHash.closeDocStore(state);

        ---> consumer(TermVectorsTermsWriter).closeDocStore(state);

    ---> endConsumer(NormsWriter).closeDocStore(state);

  ---> fieldsWriter(StoredFieldsWriter).closeDocStore(state);
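The delegation pattern in the chain above can be sketched independently of Lucene. The sketch below is illustrative only (simplified to the main consumer path, with logging in place of real I/O); the names mirror, but are not, Lucene's classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal model of the closeDocStore delegation chain: each consumer records
// its own close, then forwards the call to its downstream consumer(s).
class ChainDemo {
    interface DocConsumer {
        void closeDocStore(List<String> log);
    }

    // Build a link that logs its name and then delegates downstream (if any).
    static DocConsumer link(String name, DocConsumer next) {
        return log -> {
            log.add(name);
            if (next != null) next.closeDocStore(log);
        };
    }

    static List<String> closeAll() {
        List<String> log = new ArrayList<>();
        DocConsumer freqProx = link("FreqProxTermsWriter", null);
        DocConsumer termVectors = link("TermVectorsTermsWriter", null);
        // TermsHash delegates both to its consumer and its nextTermsHash branch.
        DocConsumer termsHash = l -> {
            l.add("TermsHash");
            freqProx.closeDocStore(l);
            termVectors.closeDocStore(l);
        };
        DocConsumer docInverter = link("DocInverter", termsHash);
        DocConsumer processor = link("DocFieldProcessor", docInverter);
        processor.closeDocStore(log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(closeAll());
    }
}
```

Running it shows the same top-down close order as the call list above.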

Only two of these closeDocStore implementations do real work:

(1) Closing the term vectors: TermVectorsTermsWriter.closeDocStore(SegmentWriteState)

void closeDocStore(final SegmentWriteState state) throws IOException {
        if (tvx != null) {
            // Write empty entries into the tvd file for documents that have no
            // term vectors: even such documents keep a slot in the tvx and tvd files
            fill(state.numDocsInStore - docWriter.getDocStoreOffset());
            // Close the tvx, tvf, tvd output streams
            tvx = null;
            // Record the written file names: when a compound index file is built
            // later, these files are merged into a single .cfs file
            state.flushedFiles.add(state.docStoreSegmentName + "." + IndexFileNames.VECTORS_INDEX_EXTENSION);
            state.flushedFiles.add(state.docStoreSegmentName + "." + IndexFileNames.VECTORS_FIELDS_EXTENSION);
            state.flushedFiles.add(state.docStoreSegmentName + "." + IndexFileNames.VECTORS_DOCUMENTS_EXTENSION);
            // Remove them from DocumentsWriter's openFiles so that
            // IndexFileDeleter may delete them later
            docWriter.removeOpenFile(state.docStoreSegmentName + "." + IndexFileNames.VECTORS_INDEX_EXTENSION);
            docWriter.removeOpenFile(state.docStoreSegmentName + "." + IndexFileNames.VECTORS_FIELDS_EXTENSION);
            docWriter.removeOpenFile(state.docStoreSegmentName + "." + IndexFileNames.VECTORS_DOCUMENTS_EXTENSION);
            lastDocID = 0;

(2) Closing the stored fields: StoredFieldsWriter.closeDocStore(SegmentWriteState)

public void closeDocStore(SegmentWriteState state) throws IOException {

    // Close the fdx and fdt output streams:
    // ---> fieldsStream.close();
    // ---> indexStream.close();
    fieldsWriter = null;
    lastDocID = 0;

    // Record the written file names
    state.flushedFiles.add(state.docStoreSegmentName + "." + IndexFileNames.FIELDS_EXTENSION);
    state.flushedFiles.add(state.docStoreSegmentName + "." + IndexFileNames.FIELDS_INDEX_EXTENSION);
    state.docWriter.removeOpenFile(state.docStoreSegmentName + "." + IndexFileNames.FIELDS_EXTENSION);
    state.docWriter.removeOpenFile(state.docStoreSegmentName + "." + IndexFileNames.FIELDS_INDEX_EXTENSION);
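Both closeDocStore methods end with the same flushedFiles/openFiles bookkeeping. Its intent can be sketched with a hypothetical stand-in class (not Lucene code):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the bookkeeping: files being written live in openFiles; once a
// doc store is closed, its files are recorded in flushedFiles (so a later
// compound .cfs build knows what to bundle) and removed from openFiles (so
// the file deleter is free to remove them once they become unreferenced).
class FileTracker {
    final Set<String> openFiles = new LinkedHashSet<>();
    final Set<String> flushedFiles = new LinkedHashSet<>();

    void open(String name) { openFiles.add(name); }

    void closeDocStore(String segment, String... extensions) {
        for (String ext : extensions) {
            String name = segment + "." + ext;
            flushedFiles.add(name);  // remember it for the cfs build
            openFiles.remove(name);  // no longer held open by the writer
        }
    }
}
```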

2. consumer.flush(threads, flushState): following the structure of the basic index chain, write the indexing results to the disk files of the named segment.

The order in which the results are written to disk is as follows:

Step one: DocFieldProcessor.flush(Collection<DocConsumerPerThread> threads, SegmentWriteState state) is called first; it writes the Field information to the .fdx and .fdt files.

public void flush(Collection<DocConsumerPerThread> threads, SegmentWriteState state) {
    // Recycle fieldHash for the next round of indexing; for efficiency, the
    // objects in the index chain are reused
    Map<DocFieldConsumerPerThread, Collection<DocFieldConsumerPerField>> childThreadsAndFields = new HashMap<DocFieldConsumerPerThread, Collection<DocFieldConsumerPerField>>();
    for (DocConsumerPerThread thread : threads) {
      DocFieldProcessorPerThread perThread = (DocFieldProcessorPerThread) thread;
      childThreadsAndFields.put(perThread.consumer, perThread.fields());
    }
    // Write the Field information to the .fdx and .fdt files, then call the
    // second link of the index chain (DocInverter) to write the inverted index
    // tables to the disk files
    consumer.flush(childThreadsAndFields, state);

    // Write the field metadata and record the written file name, for the later
    // generation of the cfs file
    final String fileName = state.segmentFileName(IndexFileNames.FIELD_INFOS_EXTENSION);
    fieldInfos.write(state.directory, fileName);

The main work in this step is done by StoredFieldsWriter, which writes the stored Field values of each Document object to the .fdx and .fdt files (see "Index File Format (3): Field Data [.fdx/.fdt/.fnm]"). The call flow is as follows:

StoredFieldsWriter.flush(SegmentWriteState state)

---> FieldsWriter.flush()

void flush() throws IOException {
      indexStream.flush();  // write to the .fdx file
      fieldsStream.flush(); // write to the .fdt file

---> BufferedIndexOutput.flush()

---> BufferedIndexOutput.flushBuffer(byte[] b, int len)

---> SimpleFSDirectory.flushBuffer(byte[] b, int offset, int size)

public void flushBuffer(byte[] b, int offset, int size) throws IOException {
      file.write(b, offset, size); // JDK RandomAccessFile.write
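The buffered write path above can be reproduced with plain JDK classes. The class below is a simplified stand-in for BufferedIndexOutput (not Lucene's implementation): bytes accumulate in a buffer and only reach the file when flushBuffer() hands them to RandomAccessFile.write, which is exactly the last hop in the chain:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Simplified model of BufferedIndexOutput over a RandomAccessFile.
class BufferedOut implements AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] buffer = new byte[16];
    private int used = 0;

    BufferedOut(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    void writeByte(byte b) throws IOException {
        if (used == buffer.length) flush(); // buffer full: spill to disk
        buffer[used++] = b;
    }

    void flush() throws IOException {
        flushBuffer(buffer, 0, used); // the step shown in the call chain
        used = 0;
    }

    private void flushBuffer(byte[] b, int offset, int size) throws IOException {
        file.write(b, offset, size);  // JDK RandomAccessFile.write
    }

    @Override public void close() throws IOException {
        flush();                      // closing always flushes first
        file.close();
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("demo", ".bin");
        try (BufferedOut out = new BufferedOut(p)) {
            for (int i = 0; i < 40; i++) out.writeByte((byte) i);
        }
        System.out.println(Files.size(p)); // 40
        Files.delete(p);
    }
}
```

Note that close() flushes before releasing the file handle; this mirrors why closing the IndexWriter is what finally gets the buffered index information onto disk.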

Step two: