Oracle Full-Text Search (Oracle Text): An Overview


Prepare the test data (as a DBA user, since it reads dba_objects and creates a table in the hr schema):

SQL> create table hr.objects as select * from dba_objects;
SQL> update hr.objects set object_name='cyt' where rownum=1;
SQL> commit;
SQL> connect hr/hr

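An ordinary B-tree index does not help a LIKE pattern with a leading wildcard, so this query still does a full table scan: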

SQL> create index idx_x on objects(object_name);
SQL> select * from objects where object_name like '%cyt%';
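The reason the leading % defeats the index can be shown with a toy Python model. It is only an illustration under stated assumptions: a sorted list stands in for the sorted keys of a B-tree, and the key values are made up. A pattern like 'cyt%' can binary-search to a starting point and read one contiguous range. '%cyt%' has no usable starting point, so every key has to be examined.

```python
import bisect

# Toy model (an assumption, not Oracle's B-tree code): an ordinary index keeps
# its keys sorted, so a sorted Python list stands in for the index leaf level.
keys = sorted(["DUAL", "cyt", "cytA", "OBJ$", "xcyt", "TAB$"])


def prefix_lookup(sorted_keys, prefix):
    """LIKE 'prefix%': binary-search to the first candidate, then read one contiguous range."""
    i = bisect.bisect_left(sorted_keys, prefix)
    hits = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        hits.append(sorted_keys[i])
        i += 1
    return hits


def substring_lookup(sorted_keys, fragment):
    """LIKE '%fragment%': the sort order gives no starting point, so every key is examined."""
    hits = [k for k in sorted_keys if fragment in k]
    return hits, len(sorted_keys)  # second value: number of keys examined


print(prefix_lookup(keys, "cyt"))     # ['cyt', 'cytA']
print(substring_lookup(keys, "cyt"))  # (['cyt', 'cytA', 'xcyt'], 6)
```

On a real table the "examine every key" case is exactly the full table scan seen in the plan.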

Set up the full-text search environment. The grants below must be issued by a privileged user:

SQL> connect / as sysdba
SQL> grant ctxapp to hr;
SQL> grant execute on ctx_ddl to hr;
SQL> alter user ctxsys account unlock identified by ctxsys;

SQL> connect hr/hr

SQL> begin
ctx_ddl.drop_preference('club_lexer');   -- comment out on the first run: the preference does not exist yet
ctx_ddl.drop_preference('mywordlist');   -- comment out on the first run: the preference does not exist yet
ctx_ddl.create_preference('club_lexer', 'CHINESE_LEXER');
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('mywordlist', 'PREFIX_INDEX', 'TRUE');
ctx_ddl.set_attribute('mywordlist', 'PREFIX_MIN_LENGTH', 1);
ctx_ddl.set_attribute('mywordlist', 'PREFIX_MAX_LENGTH', 5);
ctx_ddl.set_attribute('mywordlist', 'SUBSTRING_INDEX', 'YES');
end;
/
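The prefix settings above can be illustrated with a toy Python model. It assumes that a prefix index stores every prefix of a token whose length falls between PREFIX_MIN_LENGTH and PREFIX_MAX_LENGTH. A wildcard such as 'OBJ%' then becomes a direct lookup. The mapping structure and the function names are hypothetical, and SUBSTRING_INDEX is not modeled at all. This is a sketch of the idea, not Oracle's storage format.

```python
from collections import defaultdict


def prefix_tokens(token, min_len=1, max_len=5):
    """Toy model of PREFIX_INDEX: all prefixes of length PREFIX_MIN_LENGTH..PREFIX_MAX_LENGTH."""
    return [token[:n] for n in range(min_len, min(max_len, len(token)) + 1)]


def build_prefix_table(tokens, min_len=1, max_len=5):
    """Map each stored prefix to the full tokens it came from (hypothetical structure)."""
    table = defaultdict(set)
    for tok in tokens:
        for p in prefix_tokens(tok, min_len, max_len):
            table[p].add(tok)
    return table


print(prefix_tokens("OBJECTS"))  # ['O', 'OB', 'OBJ', 'OBJE', 'OBJEC']
table = build_prefix_table(["OBJECTS", "OBJ$", "cyt"])
# A wildcard query such as 'OBJ%' becomes a single lookup instead of a scan of all tokens.
print(sorted(table["OBJ"]))      # ['OBJ$', 'OBJECTS']
```

The trade-off is the usual one: the extra prefix entries make the index larger in exchange for cheaper wildcard queries.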

SQL> create index id_cont_test on objects(object_name) indextype is ctxsys.context
parameters (
'DATASTORE CTXSYS.DIRECT_DATASTORE FILTER CTXSYS.NULL_FILTER
LEXER club_lexer WORDLIST mywordlist');

SQL> exec ctx_ddl.sync_index('id_cont_test', '20M');

Now query with full-text search. The execution plan uses the DOMAIN INDEX:

SQL> select * from objects where contains(OBJECT_NAME,'cyt')>0;

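Oracle Text works by passing the text through a lexer, which extracts every token (term) in the document. Each term is stored in a set of tables whose names begin with DR$, such as the token table DR$<index_name>$I. Alongside each term Oracle records where it occurs, how many times it occurs, its hash value, and similar information. Queries are then answered by matching against these tables.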
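The mechanism just described is an inverted index, and a toy Python version shows its shape. Each token maps to the rows containing it and the positions where it appears, so the occurrence count is simply the number of positions. The tokenizer, the row data, and the function names are hypothetical, and the nested dict only stands in for the DR$ tables. It does not reflect their real layout.

```python
import re
from collections import defaultdict


def simple_tokenize(text):
    """Hypothetical stand-in for a lexer: uppercase words split on non-word characters."""
    return re.findall(r"\w+", text.upper())


def build_inverted_index(rows, tokenize):
    """rows: {rowid: text}. Returns token -> {rowid: [positions]}."""
    index = defaultdict(dict)
    for rowid, text in rows.items():
        for pos, tok in enumerate(tokenize(text)):
            index[tok].setdefault(rowid, []).append(pos)
    return index


def contains(index, token):
    """Rowids whose text has the token, analogous in spirit to CONTAINS(col, token) > 0."""
    return sorted(index.get(token.upper(), {}))


rows = {1: "cyt", 2: "cyt and more cyt", 3: "DUAL"}
index = build_inverted_index(rows, simple_tokenize)
print(index["CYT"])            # {1: [0], 2: [0, 3]}
print(len(index["CYT"][2]))    # 2  (occurrence count in row 2)
print(contains(index, "cyt"))  # [1, 2]
```

A lookup is one dictionary access per query term rather than a scan of every row, which is why CONTAINS can use the domain index where LIKE '%cyt%' could not use the B-tree.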

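Oracle provides different lexers for different languages. Three of them are commonly relevant here:
basic_lexer: designed for English. It separates words only at whitespace and punctuation, so a Chinese sentence, which has no spaces between words, is generally treated as a single token.
chinese_vgram_lexer: a dedicated Chinese lexer that supports all Chinese character sets. It does not recognize common Chinese words and splits text purely mechanically.
chinese_lexer: the newer Chinese lexer. It supports only the UTF8 character set but recognizes most common Chinese words.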
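The three behaviors can be contrasted with toy Python tokenizers. These are rough approximations written for illustration, not Oracle's algorithms:

- basic_like splits only at non-word characters.
- vgram_like cuts Chinese runs into overlapping two-character grams. This is an assumed approximation of "mechanical" splitting.
- dictionary_like does greedy longest matching against a tiny hypothetical vocabulary. It is a stand-in for a lexer that knows real words.

```python
import re

# Runs of CJK ideographs or runs of ASCII word characters.
CJK_OR_WORD = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9_]+")


def is_cjk(run):
    return "\u4e00" <= run[0] <= "\u9fff"


def basic_like(text):
    """Split only at whitespace and punctuation, as described for basic_lexer."""
    return [t for t in re.split(r"\W+", text) if t]


def vgram_like(text):
    """Rough approximation of chinese_vgram_lexer: overlapping 2-character grams."""
    tokens = []
    for run in CJK_OR_WORD.findall(text):
        if is_cjk(run) and len(run) > 1:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run)
    return tokens


def dictionary_like(text, vocabulary, max_word=4):
    """Stand-in for chinese_lexer: greedy longest match against a (hypothetical) vocabulary."""
    tokens = []
    for run in CJK_OR_WORD.findall(text):
        if not is_cjk(run):
            tokens.append(run)
            continue
        i = 0
        while i < len(run):
            # Try the longest candidate first; a single character always matches.
            for size in range(min(max_word, len(run) - i), 0, -1):
                piece = run[i:i + size]
                if size == 1 or piece in vocabulary:
                    tokens.append(piece)
                    i += size
                    break
    return tokens


text = "Oracle 全文检索技术"
print(basic_like(text))  # ['Oracle', '全文检索技术']
print(vgram_like(text))  # ['Oracle', '全文', '文检', '检索', '索技', '技术']
print(dictionary_like(text, {"全文", "检索", "技术"}))  # ['Oracle', '全文', '检索', '技术']
```

The mechanical output contains non-words such as 文检 and 索技. These still let queries match, but they inflate the index. A vocabulary-aware lexer produces fewer, meaningful tokens.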

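Note also that after the data changes, you must call ctx_ddl.sync_index to bring the full-text index up to date. Otherwise new or modified rows appear to be missing: CONTAINS queries will not find them, even though the rows themselves are stored in the table.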
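This "stored but not yet searchable" state can be modeled in a few lines of Python. In the toy below, a row is written to the table immediately but only queued for indexing, and only sync() makes it visible to queries. Oracle Text keeps a similar queue of pending rows, which the CTX_USER_PENDING view lists. The class and its methods are hypothetical, and only inserts are modeled.

```python
class ToyTextIndex:
    """Toy model of a full-text index that is only updated when sync() runs.

    Only inserts are modeled; real updates and deletes are more involved.
    """

    def __init__(self, tokenize):
        self.tokenize = tokenize
        self.table = {}      # rowid -> text: the base table, updated immediately
        self.postings = {}   # token -> set of rowids: what queries can see
        self.pending = []    # rowids waiting to be indexed

    def insert(self, rowid, text):
        self.table[rowid] = text    # the row is stored right away...
        self.pending.append(rowid)  # ...but only queued for indexing

    def sync(self):
        """Rough analogue of ctx_ddl.sync_index: index every queued row."""
        for rowid in self.pending:
            for tok in self.tokenize(self.table[rowid]):
                self.postings.setdefault(tok, set()).add(rowid)
        self.pending.clear()

    def contains(self, token):
        return sorted(self.postings.get(token, set()))


idx = ToyTextIndex(lambda s: s.upper().split())
idx.insert(1, "cyt")
print(1 in idx.table, idx.contains("CYT"))  # True []
idx.sync()
print(idx.contains("CYT"))                  # [1]
```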