Using PostgreSQL + bamboo to build full-text search that is N times more convenient than Lucene

2010-11-12  Source: original to this site  Category: Database  Views: 116

Packages used:

cmake-2.6.4.tar.gz (needed to build nlpbamboo)

CRF++-0.53.tar.gz (same as above)

nlpbamboo-1.1.1.tar.bz2 (word segmentation)

postgresql-8.3.3.tar.gz (indexing)

Installing pgsql

tar -zxvf postgresql-8.3.3.tar.gz

cd postgresql-8.3.3

./configure --prefix=/opt/pgsql

make
make install

useradd postgre

chown -R postgre.postgre /opt/pgsql
su - postgre
vi ~postgre/.bash_profile
Add the following:
export PATH
PGLIB=/opt/pgsql/lib
PGDATA=/data/PGSearch
PATH=$PATH:/opt/pgsql/bin
MANPATH=$MANPATH:/opt/pgsql/man
export PGLIB PGDATA PATH MANPATH

# mkdir -p /data/PGSearch

# chown -R postgre.postgre /data/PGSearch

# chown -R postgre.postgre /opt/pgsql

# sudo -u postgre /opt/pgsql/bin/initdb --locale=zh_CN.UTF-8 --encoding=utf8 -D /data/PGSearch

# sudo -u postgre /opt/pgsql/bin/postmaster -i -D /data/PGSearch &    (-i allows network access)

# sudo -u postgre /opt/pgsql/bin/createdb kxgroup
# vim /data/PGSearch/pg_hba.conf    (add a line granting the client machine access, as follows:)

host all all 10.2.19.178 255.255.255.0 trust

# su - postgre

$ pg_ctl stop

$ postmaster -i -D /data/PGSearch &

Installing Chinese word segmentation (CMake, CRF++, bamboo)
CMake is used to build bamboo, and CRF++ is a dependency of bamboo.

tar -zxvf cmake-2.6.4.tar.gz

cd cmake-2.6.4
./configure
gmake
make install

tar -zxvf CRF++-0.53.tar.gz
cd CRF++-0.53
./configure
make
make install

tar -jxvf nlpbamboo-1.1.1.tar.bz2
cd nlpbamboo-1.1.1
mkdir build
cd build/
cmake .. -DCMAKE_BUILD_TYPE=release
make all
make install

cp index.tar.bz2 /opt/bamboo/
cd /opt/bamboo/
tar -jxvf index.tar.bz2

# /opt/bamboo/bin/bamboo

If you see:

ERROR: libcrfpp.so.0: cannot open shared object file: No such file or directory

run:

ln -s /usr/local/lib/libcrfpp.so.* /usr/lib/
ldconfig

Adding the Chinese word segmentation extension to pgsql

# vim /root/.bash_profile    (add the same settings here:)

PGLIB=/opt/pgsql/lib
PGDATA=/data/PGSearch
PATH=$PATH:/opt/pgsql/bin
MANPATH=$MANPATH:/opt/pgsql/man
export PGLIB PGDATA PATH MANPATH

# source ~/.bash_profile

cd /opt/bamboo/exts/postgres/chinese_parser/
make
make install

su - postgre
cd /opt/pgsql/share/contrib/
touch /opt/pgsql/share/tsearch_data/chinese_utf8.stop
psql kxgroup
\i chinese_parser.sql    (import)

Then run the following SQL. If it returns segmented words, the installation works:

SELECT to_tsvector('chinesecfg', 'the results of the implementation of bamboo in the command line to know');
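
If that statement fails, it can help to check that the configuration was actually registered. A minimal sketch, assuming chinese_parser.sql created a text search configuration named chinesecfg (the name used in the statement above); pg_ts_config and ts_debug are built into PostgreSQL 8.3:

```sql
-- List the text search configurations known to this database.
-- chinesecfg should appear here if the import succeeded
-- (the name is taken from the SELECT above, not verified against bamboo).
SELECT cfgname FROM pg_ts_config;

-- Show how a sample string is split into tokens and which dictionary
-- handles each token. Replace the sample with your own Chinese text.
SELECT * FROM ts_debug('chinesecfg', 'bamboo');
```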

That is all for the installation. The next part covers indexing and querying TEXT fields, which completes the whole search engine.

Part 1: The basics

Let's start with one SQL statement:

select * from dbname where field_name @@ 'aa | bb' order by ts_rank(field_name, 'aa | bb') desc;

What this statement means: from the table dbname, select the rows whose field_name matches the word aa or bb, and sort them by match rank, best matches first. (This assumes field_name is a tsvector column. The built-in function in 8.3 is ts_rank; rank was its name in the older tsearch2 module.)

Once that makes sense, there are four concepts to learn: tsvector, tsquery, @@, and gin.

1. tsvector:

PostgreSQL 8.3 ships with built-in full-text search; earlier versions required installing and configuring tsearch2. It provides two data types, tsvector and tsquery, for searching a collection of natural-language documents and locating the best matches to a query. tsvector is one of them.

A tsvector value is a sorted list of distinct lexemes, that is, words normalized so that different forms of the same word are merged into one entry. Duplicate entries are removed automatically when the value is read in, and the entries are stored in sorted order. For example:

SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
                      tsvector
----------------------------------------------------
 'a' 'on' 'and' 'ate' 'cat' 'fat' 'mat' 'rat' 'sat'

Casting a string to tsvector simply splits it on whitespace; the words then appear in sorted order (in 8.3, shorter words first, then alphabetically).
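
One consequence is worth seeing directly: a plain cast to tsvector does no normalization, so case and plural forms are kept exactly as typed. A small sketch using only built-in PostgreSQL features:

```sql
-- The cast only splits on whitespace and removes duplicates;
-- 'The', 'Fat' and 'Rats' are stored as written, not stemmed or lowercased.
SELECT 'The Fat Rats'::tsvector;

-- Positions can be attached by hand in the form word:position.
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7'::tsvector;
```

This is why documents are normally converted with a function rather than a cast.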

For full-text search over English and Chinese text we rely on SQL like the following:

SELECT to_tsvector('english', 'The Fat Rats');
   to_tsvector
-----------------
 'fat':2 'rat':3

The to_tsvector function normalizes text into a tsvector; its first argument selects the text search configuration, and therefore the word segmenter, to use.

2. tsquery:

As the name suggests, tsquery is about queries: it stores the lexemes to search for. They can be combined with the boolean operators & (AND), | (OR), and ! (NOT), and parentheses () can be used to force grouping.

A tsquery can also carry weights: each lexeme may be labeled with one or more weight letters, in which case it only matches tsvector lexemes that have one of those weights. Like tsvector, tsquery has a corresponding function, to_tsquery.
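
The operators and weight labels can be tried out directly. A brief sketch using built-in functions only; the sample words are arbitrary, and the comparisons use the @@ match operator:

```sql
-- Boolean operators and grouping in a tsquery literal.
SELECT 'fat & (rat | cat)'::tsquery;
SELECT 'fat & !cat'::tsquery;

-- Weight labels: 'rat:A' only matches lexemes that carry weight A.
-- setweight() labels every lexeme of the tsvector with the given weight.
-- Weighted A, so the query can match:
SELECT setweight(to_tsvector('english', 'fat rats'), 'A') @@ to_tsquery('english', 'rat:A');
-- Weighted D (the default weight), so the same query should not match:
SELECT setweight(to_tsvector('english', 'fat rats'), 'D') @@ to_tsquery('english', 'rat:A');
```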

3. @@:

Full-text matching in PostgreSQL uses the @@ operator, which returns true if a tsvector (document) matches a tsquery (query).

Here is a simple example:

SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@ 'cat & rat'::tsquery;
 ?column?
----------
 t

When working with indexes, we use the corresponding functions instead:

SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat');
 ?column?
----------
 t
The @@ operator also accepts text in place of tsvector and tsquery. It can be used in the following forms:

tsvector @@ tsquery
tsquery @@ tsvector
text @@ tsquery
text @@ text

We have already used the first two. As for the last two:
text @@ tsquery is equivalent to to_tsvector(x) @@ y.
text @@ text is equivalent to to_tsvector(x) @@ plainto_tsquery(y). (plainto_tsquery is covered later.)
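
The text forms can be sketched as follows. The one-argument functions use default_text_search_config, so the sketch sets it to english first; a cluster initialized with a zh_CN locale, as above, may default to a configuration that does not stem English words, in which case 'rats' would not match 'rat'.

```sql
-- Assumption: use the built-in english configuration for this session.
SET default_text_search_config = 'pg_catalog.english';

-- text @@ tsquery: the text is converted with to_tsvector implicitly.
SELECT 'fat cats ate fat rats'::text @@ to_tsquery('fat & rat');

-- text @@ text: the right-hand side goes through plainto_tsquery.
SELECT 'fat cats ate fat rats'::text @@ 'fat rats'::text;

-- A non-match for comparison: the document has no 'cow' lexeme.
SELECT 'fat & cow'::tsquery @@ 'a fat cat sat on a mat and ate a fat rat'::tsvector;
```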

4. gin:

GIN is the name of an index type, used here for full-text indexes.

We can create a GIN index to speed up searches, for example:

CREATE INDEX pgweb_idx ON pgweb USING gin (to_tsvector('english', body));

An index can be created in several ways. It can even cover the concatenation of two columns:

CREATE INDEX pgweb_idx ON pgweb USING gin (to_tsvector('english', title || body));
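
An expression index like these is only considered when the query repeats the same expression, including the configuration name. An alternative described in the PostgreSQL documentation is to store the tsvector in its own column and index that column. A sketch, assuming pgweb has text columns title and body as in the statements above; the new column and index names are illustrative:

```sql
-- Query form that can use the expression index on to_tsvector('english', body).
SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');

-- Alternative: a separate tsvector column, filled once and indexed directly.
-- coalesce() keeps a NULL title or body from making the whole value NULL,
-- and the ' ' separator keeps the last word of title apart from the first word of body.
ALTER TABLE pgweb ADD COLUMN textsearchable_index_col tsvector;
UPDATE pgweb
SET textsearchable_index_col =
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''));
CREATE INDEX textsearch_idx ON pgweb USING gin (textsearchable_index_col);

-- Searching the stored column.
SELECT title
FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('english', 'friend');
```

The stored column has to be kept current when rows change, for example with a trigger.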

Part 2: Going further

With the basics covered, it is time to put them to work. To implement full-text search, we need a function that builds a tsvector from a document and one that builds a tsquery from the user's query, and we return the results ordered by relevance.

Here is a to_tsquery statement:

SELECT to_tsquery('english', 'Fat | Rats:AB');
    to_tsquery
------------------
 'fat' | 'rat':AB

As this shows, the query text passed to to_tsquery must consist of single words joined by the logical operators & (AND), | (OR), and ! (NOT), optionally grouped with parentheses.

The following statement therefore raises an error:

SELECT to_tsquery('english', 'Fat Rats');

The plainto_tsquery function turns unformatted text into a valid tsquery. For the example above, plainto_tsquery automatically inserts the logical operator &:

SELECT plainto_tsquery('english', 'Fat Rats');
 plainto_tsquery
-----------------
 'fat' & 'rat'

However, plainto_tsquery does not recognize logical operators or weight labels in its input:

SELECT plainto_tsquery('english', 'The Fat & Rats:C');
   plainto_tsquery
---------------------
 'fat' & 'rat' & 'c'

Part 3: The finale

To sum up all of the above in one sentence: this article is really about a single SQL statement. With the extension described in the first part installed, the following statement searches a field for words and also sorts the results, best matches first:

select * from tabname where to_tsvector('chinesecfg', textname) @@ plainto_tsquery('search something') order by ts_rank(to_tsvector('chinesecfg', textname), plainto_tsquery('search something')) desc limit 10;

The create table and create index statements are not written out here. Teaching someone to fish is what matters.
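
For readers who want a starting point anyway, the omitted statements might look like the sketch below. It reuses tabname, textname and chinesecfg from the query above; the id column and the index name are hypothetical. Passing the configuration to plainto_tsquery explicitly is also an assumption of this sketch, since the query above relies on the default configuration there.

```sql
-- Hypothetical table matching the names used in the query above.
CREATE TABLE tabname (
    id       serial PRIMARY KEY,  -- illustrative key column, not in the original
    textname text
);

-- Expression index: queries must repeat to_tsvector('chinesecfg', textname)
-- exactly for the planner to consider this index.
CREATE INDEX tabname_textname_idx
    ON tabname USING gin (to_tsvector('chinesecfg', textname));

-- Same search as above, with the tsquery built once in FROM and reused
-- in both the match and the ranking.
SELECT t.*
FROM tabname AS t,
     plainto_tsquery('chinesecfg', 'search something') AS q
WHERE to_tsvector('chinesecfg', t.textname) @@ q
ORDER BY ts_rank(to_tsvector('chinesecfg', t.textname), q) DESC
LIMIT 10;
```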
