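Learning Elasticsearch
Preface: a good companion for learning ES is "Elasticsearch: The Definitive Guide", an online book that covers the material in great detail.
Official site: https://www.elastic.co/
Chinese community: http://elasticsearch.cn/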
Contents
- Installation
- Installing plugins
- Configuration
- curl commands
- Java API
- Miscellaneous
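Installation
First, download the installation package from the official site (official download page).
Unpacking it gives you the ES installation directory (versions 2.5 and later include a plugins directory; if it is missing, create it by hand).
Create an es user (ES refuses to start as root):
useradd es
Give the es user ownership of the installation directory (run this from inside that directory):
chown -hR es:es .
Repeat these steps on every machine that will run an ES node.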
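Installing plugins
All ES plugins are installed under <ES install dir>/plugins/.
1. elasticsearch-head
A cluster management tool for Elasticsearch. It is a standalone web application written entirely in HTML5, and it lets you monitor ES visually.
Project page: https://github.com/mobz/elasticsearch-head
2. ik, a Chinese analyzer
Project page: https://github.com/medcl/elasticsearch-analysis-ik
Download the source and unpack it,
then build it with Maven:
mvn package
After a successful build the package is at target/releases/elasticsearch-analysis-ik-x.x.x.zip
Unzip that archive into the corresponding plugin directory (plugins/ik).
Finally, restart the ES cluster.
3. elasticsearch-analysis-pinyin, a pinyin analyzer
4. nGram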
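With the ik analyzer, the search phrase is itself tokenized before matching. For example,
a search for "我们的生活" ranks documents containing all five characters first, but it also returns documents that contain only "我们" or "生活".
Sometimes that much intelligence is unwanted and only exact matches should be returned. That is what ngram is for. (It needs no separate installation; it is configured in the index settings.)
An example first: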
POST localhost:9200/ngramtest
Content-Type: application/json

{
  "settings": {
    "analysis": {
      "analyzer": {
        "charSplit": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "4",
          "token_chars": ["letter", "digit", "punctuation"]
        }
      }
    }
  },
  "mappings": {
    "myType": {
      "dynamic": "strict",
      "properties": {
        "content": {
          "type": "string",
          "analyzer": "charSplit",
          "search_analyzer": "charSplit"
        }
      }
    }
  }
}
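The my_ngram_tokenizer object under settings.analysis.tokenizer is a custom tokenizer.
settings.analysis.analyzer.charSplit is a custom analyzer built on my_ngram_tokenizer.
The properties of my_ngram_tokenizer:
- min_gram: minimum length of a single gram, default 1
- max_gram: maximum length of a single gram, default 2
- token_chars: the character classes that may appear in a token (the text is split at any character whose class is not listed)
The character classes are:
- letter: letters, including Chinese characters, e.g. a, b, ï or 京
- digit: digits, e.g. 3 or 7
- whitespace: spaces, line breaks, tabs and so on, e.g. " " or "\n"
- punctuation: punctuation marks, e.g. ! , 。 or "
- symbol: symbols (as distinct from punctuation), e.g. $ or √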
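The following example shows the effect.
Configuration fragment: "token_chars": ["letter","digit","punctuation"]
This accepts letters, digits and punctuation, so let us put a symbol character, $, into the text:
POST 192.168.5.222:9200/yuqingtest/_analyze?pretty&analyzer=charSplit

商业核心和$标准化技术
Response (only 3- and 4-character grams appear, which suggests that the yuqingtest index was created with min_gram 3 rather than the 2 shown in the settings above):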
{
  "tokens": [
    { "token": "商业核", "start_offset": 0, "end_offset": 3, "type": "word", "position": 0 },
    { "token": "商业核心", "start_offset": 0, "end_offset": 4, "type": "word", "position": 1 },
    { "token": "业核心", "start_offset": 1, "end_offset": 4, "type": "word", "position": 2 },
    { "token": "业核心和", "start_offset": 1, "end_offset": 5, "type": "word", "position": 3 },
    { "token": "核心和", "start_offset": 2, "end_offset": 5, "type": "word", "position": 4 },
    { "token": "标准化", "start_offset": 6, "end_offset": 9, "type": "word", "position": 5 },
    { "token": "标准化技", "start_offset": 6, "end_offset": 10, "type": "word", "position": 6 },
    { "token": "准化技", "start_offset": 7, "end_offset": 10, "type": "word", "position": 7 },
    { "token": "准化技术", "start_offset": 7, "end_offset": 11, "type": "word", "position": 8 },
    { "token": "化技术", "start_offset": 8, "end_offset": 11, "type": "word", "position": 9 }
  ]
}
As you can see, the $ splits the text: no gram spans the words on its left and right.
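To make the splitting rule concrete, here is a small stand-alone Java sketch of how an n-gram tokenizer with token_chars behaves. It is only an illustration, not Elasticsearch's implementation: the class NGramSketch and its methods are hypothetical names, it hard-codes the letter, digit and punctuation classes, it works on Java chars rather than code points, and it does not model offsets or positions.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramSketch {

    // True when c falls in the "letter", "digit" or "punctuation" class.
    // Anything else (symbols such as '$', whitespace) acts as a split point.
    static boolean isTokenChar(char c) {
        if (Character.isLetter(c) || Character.isDigit(c)) {
            return true;
        }
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    // Splits text into runs of accepted characters, then emits every gram
    // of length minGram..maxGram from each run, ordered by start position.
    static List<String> tokenize(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        int runStart = 0;
        for (int i = 0; i <= text.length(); i++) {
            boolean boundary = i == text.length() || !isTokenChar(text.charAt(i));
            if (boundary) {
                emitGrams(text, runStart, i, minGram, maxGram, tokens);
                runStart = i + 1;
            }
        }
        return tokens;
    }

    // Emits the grams of one run, text[from, to).
    private static void emitGrams(String text, int from, int to,
                                  int minGram, int maxGram, List<String> out) {
        for (int start = from; start < to; start++) {
            for (int len = minGram; len <= maxGram && start + len <= to; len++) {
                out.add(text.substring(start, start + len));
            }
        }
    }
}
```

Calling NGramSketch.tokenize("商业核心和$标准化技术", 3, 4) produces the same ten grams as the response above, in the same order, because no gram is allowed to cross the $.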
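Configuration
Only three settings in config/elasticsearch.yml need to change:
...
cluster.name: my-es-cluster
...
node.name: node1
...
network.host: 192.168.245.139
...
Note that in a YAML file the colon must be followed by a space; otherwise the file is rejected as malformed when it is read.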
Starting ES
# first change into the ES installation directory
su es                    # switch to the es user created earlier
bin/elasticsearch
# bin/elasticsearch -d   (alternatively, run it in the background)
Open http://<IP>:9200/ in a browser:
{
  "name" : "myhost",
  "cluster_name" : "my-es-cluster",
  "cluster_uuid" : "UZHnaRT7R06kBjKh6Qbzvg",
  "version" : {
    "number" : "2.4.2",
    "build_hash" : "161c65a337d4b422ac0c805f284565cf2014bb84",
    "build_timestamp" : "2017-03-17T11:51:03Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}
A response with this structure means that installation and configuration succeeded.
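The same check can be scripted instead of done in a browser. The following is a minimal sketch using only the JDK; the class NodeCheckSketch and its methods are hypothetical, and the regular expression is a shortcut that assumes the response looks like the one above (a real client would use a JSON parser).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NodeCheckSketch {

    // Matches the "number" : "x.y.z" entry of the root response.
    private static final Pattern VERSION =
            Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the version string found in the body, or null if there is none.
    static String extractVersion(String body) {
        Matcher m = VERSION.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    // Performs a plain GET and returns the response body as UTF-8 text.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

For the node configured above, NodeCheckSketch.extractVersion(NodeCheckSketch.fetch("http://192.168.245.139:9200/")) would be expected to return the version number once the node is up; an IOException usually means the node is not reachable on network.host.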
curl commands
index
# create an index
curl -XPUT http://192.168.5.222:9200/index_name/
# delete an index
curl -XDELETE http://192.168.5.222:9200/index_name/
# show an index
curl -XGET http://192.168.5.222:9200/index_name/
type
# create or update a document of a type (if no id is given at the end of the URL, ES generates one)
curl -XPOST http://192.168.5.222:9200/index_name/emp/1 -d '{"first_name" : "John","age" : 25,"about" : "I love to go rock climbing","interests": ["sports","music"]}'
# delete by ID
curl -XDELETE http://192.168.5.222:9200/index_name/emp/1
# fetch a document
curl -XGET 'http://192.168.5.222:9200/index_name/emp/1?pretty'
# return only the document source, with all of its fields
curl -XGET http://192.168.5.222:9200/index_name/emp/1/_source
# return only some fields
curl -XGET 'http://192.168.5.222:9200/index_name/emp/1?_source=first_name,age'
# return all documents
curl -XGET http://192.168.5.222:9200/index_name/emp/_search
# simple query-string search
curl -XGET 'http://192.168.5.222:9200/index_name/emp/_search?q=first_name:Smith'
# delete by query (in ES 2.x this endpoint requires the delete-by-query plugin)
curl -XDELETE 'http://localhost:9200/index_name/emp,user/_query?q=user:kimchy'
# inspect how a piece of text is analyzed (quote the URL, otherwise the shell interprets the &)
curl -XPOST 'http://192.168.5.222:9200/index_name/_analyze?pretty&analyzer=charSplit' -d '商业核心和$标准化技术'
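When the same query-string search is issued from code rather than from a shell, the value of q has to be URL-encoded. A minimal sketch of building such a URL with the JDK follows; the class SearchUrlSketch and its method are hypothetical helpers, not part of any Elasticsearch client.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SearchUrlSketch {

    // Builds <base>/<index>/<type>/_search?q=<field>:<value>&pretty
    // with the q parameter percent-encoded.
    static String searchUrl(String base, String index, String type,
                            String field, String value) {
        try {
            String q = URLEncoder.encode(field + ":" + value, "UTF-8");
            return base + "/" + index + "/" + type + "/_search?q=" + q + "&pretty";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, searchUrl("http://192.168.5.222:9200", "index_name", "emp", "first_name", "Smith") returns http://192.168.5.222:9200/index_name/emp/_search?q=first_name%3ASmith&pretty, where %3A is the encoded colon.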
Complex queries on a type (DSL). These queries work with both GET and POST, but posting data with curl is hard to read, so I always use Postman.
# add a document
POST 192.168.5.222:9200/yuqingtest/article/
Content-Type: application/json

{
  "title" : "政协副主席建议提高境外黑匣子",
  "content" : "使用了商业核心和$标准化技术,相比以前的非标$准化方案,更容易维护和支持哈哈有个黑匣子在外面。"
}
# query (there are far too many query clauses to list them all here)
{
  "query": {
    "multi_match": {
      "query": "黑匣子",
      "type": "phrase",
      "slop": 1,
      "fields": [ "content" ],
      "max_expansions": 1
    }
  },
  "highlight": {
    "pre_tags": ["<tag1>", "<tag2>"],
    "post_tags": ["</tag1>", "</tag2>"],
    "fields": {
      "content": {}
    }
  },
  "sort": {
    "createTime": { "order": "asc" }
  }
}
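Java API
The official Java API (org.elasticsearch.elasticsearch) is not very convenient to use.
Spring's wrapper (org.springframework.data.spring-data-elasticsearch) is much nicer, especially combined with Spring Boot, which trims the configuration and speeds up development further.
For the pom dependencies and configuration, see "Spring Boot with Elasticsearch"; only the key points are shown below.
ArticleEntity is omitted.
Repository
public interface ArticleRepository extends ElasticsearchRepository<ArticleEntity, String> {}
CRUD examples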
package com.ray.estest;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.highlight.HighlightBuilder.Field;
import org.elasticsearch.search.highlight.HighlightField;
import org.elasticsearch.search.sort.SortBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.SearchResultMapper;
import org.springframework.data.elasticsearch.core.aggregation.AggregatedPage;
import org.springframework.data.elasticsearch.core.aggregation.impl.AggregatedPageImpl;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;

import com.product.yq_common.utils.StringUtils;
import com.product.yq_service.entity.input.ArticleDetailInput;
import com.product.yq_service.entity.input.ArticleQueryEntity;
import com.product.yq_service.entity.output.ArticleSummaryInfoEntity;
import com.product.yq_serviceimpl.entity.ArticleEntity;
import com.product.yq_serviceimpl.repo.ArticleRepository;

/**
 * @author Ray
 * 2017-03-30
 */
public class ArticleServiceImpl {

    @Autowired
    private ArticleRepository repo;

    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    public Object getArticles(ArticleQueryEntity entity) throws Exception {
        List<ArticleSummaryInfoEntity> articles = new ArrayList<ArticleSummaryInfoEntity>();
        // paging
        Pageable pager = new PageRequest(0, 10);
        // build the query
        BoolQueryBuilder qb = QueryBuilders.boolQuery()
                .must(QueryBuilders.termQuery("deleted", false))
                .must(QueryBuilders.termQuery("name", "Zhang")) // term is generally used on not_analyzed fields
                .must(QueryBuilders.matchQuery("favorite", entity.getType())); // match is used on analyzed fields
        // append the optional conditions
        if (!StringUtils.isEmpty(entity.getSearchWord())) {
            // multiMatchQuery searches several fields at once
            qb = qb.must(QueryBuilders.multiMatchQuery(entity.getSearchWord(), "title", "summary"));
        }
        if (!StringUtils.isEmpty(entity.getStartDate()) || !StringUtils.isEmpty(entity.getEndDate())) {
            // range query: gt, lt, gte, lte, from-to
            RangeQueryBuilder rqb = QueryBuilders.rangeQuery("createDate");
            if (!StringUtils.isEmpty(entity.getStartDate())) {
                rqb = rqb.gte(entity.getStartDate());
            }
            if (!StringUtils.isEmpty(entity.getEndDate())) {
                rqb = rqb.lte(entity.getEndDate());
            }
            qb = qb.must(rqb);
        }
        // sorting (it is best not to sort on string fields)
        SortBuilder sort = SortBuilders.fieldSort("createTime").order(SortOrder.DESC);
        // assemble the search
        SearchQuery query = new NativeSearchQueryBuilder().withQuery(qb).withSort(sort).withPageable(pager).build();
        // the returned object carries the paging information
        Page<ArticleEntity> entities = repo.search(query);
        long total = entities.getTotalElements();
        int pages = entities.getTotalPages();
        List<ArticleEntity> rst = entities.getContent();
        return rst;
    }

    /**
     * Update
     */
    public void update(ArticleDetailInput entity) throws Exception {
        ArticleEntity oriES = repo.findOne(entity.getArticleId());
        oriES.setTitle(entity.getTitle());
        oriES.setContent(entity.getContent());
        repo.save(oriES);
    }

    /**
     * Add
     */
    public String add(ArticleDetailInput entity) throws Exception {
        ArticleEntity oriES = new ArticleEntity();
        oriES.setTitle("这是title");
        oriES.setContent("这是content");
        oriES = repo.save(oriES);
        return oriES.getArticleId(); // articleId maps to the document ID in ES
    }

    /**
     * Get the details of one article
     */
    public Object detail(String id) throws Exception {
        return repo.findOne(id);
    }

    /**
     * Delete
     */
    public void delete(String id) throws Exception {
        repo.delete(id);
    }

    /**
     * Search by keyword and return highlighted content
     */
    public Object searchByWords(String word) throws Exception {
        List<ArticleSummaryInfoEntity> articles = new ArrayList<ArticleSummaryInfoEntity>();
        Pageable pager = new PageRequest(0, 10);
        // build the query
        BoolQueryBuilder qb = QueryBuilders.boolQuery()
                .must(QueryBuilders.termQuery("deleted", false))
                .must(QueryBuilders.multiMatchQuery(word, "title", "content"));
        String preTags = "<span class='highlight'>";
        String postTags = "</span>";
        // set the fields to highlight, the tags placed around a hit, and the fragment length
        Field fTitle = new Field("title").preTags(preTags).postTags(postTags).fragmentSize(100);
        Field fContent = new Field("content").preTags(preTags).postTags(postTags).fragmentSize(100);
        SearchQuery query = new NativeSearchQueryBuilder().withQuery(qb).withPageable(pager)
                .withHighlightFields(fTitle, fContent).build();
        elasticsearchTemplate.queryForPage(query, ArticleEntity.class, new SearchResultMapper() {
            @SuppressWarnings("unchecked")
            @Override
            public <T> AggregatedPage<T> mapResults(SearchResponse response, Class<T> clazz, Pageable pageable) {
                // total number of hits
                long total = response.getHits().getTotalHits();
                // total number of pages
                int pages = (int) Math.ceil((double) total / pager.getPageSize());
                if (response.getHits().getTotalHits() <= 0) {
                    return null;
                }
                for (SearchHit searchHit : response.getHits()) {
                    ArticleSummaryInfoEntity item = new ArticleSummaryInfoEntity();
                    articles.add(item);
                    Map<String, Object> source = searchHit.getSource();
                    item.setArticleId(source.get("articleId").toString());
                    Map<String, HighlightField> highlightFields = searchHit.getHighlightFields();
                    // use the highlighted fragment if the field was hit, otherwise the stored value
                    HighlightField hlTitleField = highlightFields.get("title");
                    if (hlTitleField != null && hlTitleField.fragments() != null) {
                        item.setTitle((hlTitleField.fragments()[0].string()));
                    } else {
                        item.setTitle((String) source.get("title"));
                    }
                    HighlightField hlContentField = highlightFields.get("content");
                    if (hlContentField != null && hlContentField.fragments() != null) {
                        item.setSummary(hlContentField.fragments()[0].string());
                    } else {
                        item.setSummary((String) source.get("summary"));
                    }
                }
                return new AggregatedPageImpl<T>((List<T>) articles);
            }
        });
        return articles;
    }
}
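The page count in mapResults is derived from the hit total with Math.ceil on a double. As a side note, the same arithmetic can be done in integers only; the sketch below is a stand-alone illustration with a hypothetical class name, not part of the service above.

```java
final class PagingSketch {

    // Number of pages needed for totalHits items at pageSize items per page,
    // i.e. ceil(totalHits / pageSize) without floating point.
    static int totalPages(long totalHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (totalHits <= 0) {
            return 0;
        }
        return (int) ((totalHits + pageSize - 1) / pageSize);
    }
}
```

With the page size of 10 used in the examples, 25 hits give 3 pages and 30 hits also give 3.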
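If you use ngram to get exact-match search on some fields, the mappings are not the only thing to set up; the Java code needs a small change too: set slop and type on the query builder.
......
// build the query (Type here is MultiMatchQueryBuilder.Type)
BoolQueryBuilder qb = QueryBuilders.boolQuery()
        .must(QueryBuilders.multiMatchQuery(search, "content").slop(1).type(Type.PHRASE));
......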
Miscellaneous
In some respects the way ES stores its data resembles Hadoop, and the two can be used together; for details see https://www.elastic.co/products/hadoop
References:
- http://www.programcreek.com/java-api-examples/index.php?api=org.springframework.data.elasticsearch.core.query.SearchQuery
- The "elasticsearch" article series