首页 > 代码库 > sphinx配置增量索引和索引合并

sphinx配置增量索引和索引合并

 

配置增量索引

1,配置csft.conf文件。

  其中base为父类,scr1和tmp_src1都是他的子类,相应配置如下。

searchd{
        listen = 9312
        listen = 9306:mysql41
        read_timeout =5
        max_children = 30
        max_matches = 1000
        seamless_rotate = 0
        preopen_indexes = 0
        unlink_old = 1
        pid_file = /usr/local/coreseek/var/log/searchd.pid
        log = /usr/local/coreseek/var/log/searchd.log
        query_log = /usr/local/coreseek/var/log/query.log
        binlog_path = 
}
#全局配置
source base
{
        type         = mysql
        sql_host     = 127.0.0.1
        sql_user     = root
        sql_pass    = 
        sql_db         = test
        sql_port     = 3306
        sql_query_pre = SET NAMES utf8
        sql_query        =
}

source src1: base
{
        sql_query            = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents
        sql_attr_uint        = id
        sql_attr_uint        = group_id
        sql_attr_timestamp     = date_added
        sql_field_string    = title
        sql_field_string    = content
        sql_query_info_pre    = SET NAMES utf8

        
}

index src1{
        source = src1
        path = /usr/local/coreseek/var/data/test1
        docinfo = extern
        mlock =0
        morphology = none
        min_word_len =1
        html_strip =0
        #index_sp          = 1
        charset_type = zh_cn.utf-8
        charset_dictpath = /usr/local/mmseg3/etc/
}

source tmp_src1 : base
{
        sql_query            = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
        sql_attr_uint        = id
        sql_attr_uint        = group_id
        sql_attr_timestamp     = date_added
        sql_field_string    = title
        sql_field_string    = content  
}

index tmp_src1{
        source = tmp_src1        
        path = /usr/local/coreseek/var/data/tmp_src1        
        docinfo = extern
        mlock =0
        morphology = none
        min_word_len =1
        html_strip =0
        charset_type = zh_cn.utf-8
        charset_dictpath = /usr/local/mmseg3/etc/
}

 

 

对应的sql文件为:

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for documents
-- ----------------------------
DROP TABLE IF EXISTS `documents`;
CREATE TABLE `documents` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `group_id` int(11) NOT NULL,
  `date_added` datetime NOT NULL,
  `title` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `content` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb4;

-- ----------------------------
-- Records of documents
-- ----------------------------
INSERT INTO `documents` VALUES (‘1‘, ‘1‘, ‘2017-06-06 21:45:58‘, ‘中国‘, ‘中国真美,地大物博‘);
INSERT INTO `documents` VALUES (‘2‘, ‘1‘, ‘2017-06-06 21:45:58‘, ‘中国美食‘, ‘台北小吃,各地美食‘);
INSERT INTO `documents` VALUES (‘3‘, ‘2‘, ‘2017-06-06 21:45:58‘, ‘美女之家‘, ‘美女之国‘);
INSERT INTO `documents` VALUES (‘4‘, ‘2‘, ‘2017-06-06 21:45:58‘, ‘hello‘, ‘this is to test groups‘);
INSERT INTO `documents` VALUES (‘5‘, ‘3‘, ‘2017-07-27 17:00:09‘, ‘熊猫‘, ‘中国国宝‘);
INSERT INTO `documents` VALUES (‘6‘, ‘3‘, ‘2017-07-14 17:00:04‘, ‘竹子‘, ‘熊猫吃竹子‘);
INSERT INTO `documents` VALUES (‘7‘, ‘4‘, ‘2017-07-14 17:30:36‘, ‘猫科动物‘, ‘老虎吃人‘);
INSERT INTO `documents` VALUES (‘8‘, ‘4‘, ‘2017-07-14 17:30:36‘, ‘猫科动物2‘, ‘东北虎‘);
INSERT INTO `documents` VALUES (‘9‘, ‘5‘, ‘2017-07-14 17:34:24‘, ‘动物园‘, ‘老鼠‘);
SET FOREIGN_KEY_CHECKS=1;






SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for sph_counter
-- ----------------------------
DROP TABLE IF EXISTS `sph_counter`;
CREATE TABLE `sph_counter` (
  `counter_id` int(11) NOT NULL AUTO_INCREMENT,
  `max_doc_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`counter_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4;

-- ----------------------------
-- Records of sph_counter
-- ----------------------------
INSERT INTO `sph_counter` VALUES (‘1‘, ‘6‘);
SET FOREIGN_KEY_CHECKS=1;

 

 

 

其中,sph_counter表中,的max_doc_id为当前在coreseek已经存放的索引的最大值。而配置文件tmp_src1中有这样一句话

sql_query            = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )

意思是将新更新的部分加入到tmp_src1中。

索引合并

将在mysql中的最新数据加入到tmp_scr1中
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft.conf --all --rotate

 技术分享

进行索引合并

/usr/local/coreseek/bin/indexer --merge src1 tmp_src1 --merge-dst-range deleted 0 0

 技术分享

这样,新增加的数据就到了src1的索引内。

技术分享



sphinx配置增量索引和索引合并