Lucene Analyzers

2024-07-08 20:00:31, 229 views

Analyzers
Purpose: an analyzer splits text into keywords (terms).
Where it is used: both when building the index and when searching.
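
Why the analyzer matters at both stages can be shown with a tiny in-memory inverted index. This is only a sketch in plain Java, not Lucene code: the class IndexAndSearchSketch and its analyze, buildIndex and search methods are hypothetical names, and analyze is a stand-in that merely lowercases and splits on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class IndexAndSearchSketch {

    // Stand-in analyzer (an assumption for this example): lowercase the text,
    // then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Indexing: every term produced by the analyzer maps to the ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Searching: the query goes through the same analyzer before the lookup.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "An IndexWriter creates and maintains an index.",
                "An IndexReader reads an index.");
        Map<String, Set<Integer>> index = buildIndex(docs);

        System.out.println(search(index, "IndexWriter"));     // analyzed query finds document 0
        System.out.println(index.containsKey("IndexWriter")); // the raw, unanalyzed term is not in the index
    }
}
```

The index only ever contains analyzed terms, so a query that skips the analyzer (here the mixed-case "IndexWriter") cannot match anything.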

Original text: An IndexWriter creates and maintains an index.
1. Split into tokens:
An
IndexWriter
creates
and
maintains
an
index
.
2. Remove stop words (and punctuation):
IndexWriter
creates
maintains
index
3. Convert to lowercase:
indexwriter
creates
maintains
index
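
The three steps above can be sketched in plain Java. This follows the order of the walkthrough and does not use Lucene, so it should not be read as how StandardAnalyzer is implemented. The class AnalysisStepsSketch, its method names and the small stop-word list are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnalysisStepsSketch {

    // Hypothetical stop-word list, chosen only to cover the sample sentence.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "and", "the"));

    // Matches either a run of word characters or a single punctuation character.
    static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Step 1: split the text into word tokens and standalone punctuation tokens.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Step 2: drop stop words (compared case-insensitively) and punctuation tokens.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            boolean isWord = Character.isLetterOrDigit(token.charAt(0));
            if (isWord && !STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Step 3: convert every remaining token to lowercase.
    static List<String> toLowerCase(List<String> tokens) {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ROOT));
        }
        return lowered;
    }

    public static void main(String[] args) {
        List<String> tokens = split("An IndexWriter creates and maintains an index.");
        List<String> kept = removeStopWords(tokens);
        List<String> terms = toLowerCase(kept);
        System.out.println(tokens);
        System.out.println(kept);
        System.out.println(terms); // [indexwriter, creates, maintains, index]
    }
}
```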

package cn.itcast.e_analyzer;

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.cn.ChineseAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class TestAnalyzer {

    @Test
    public void test() throws Exception {
        String enText = "An IndexWriter creates and maintains an index.";
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
        testAnalyzer(analyzer, enText);

        // Chinese sample text, kept as-is because it is the input for the Chinese analyzers
        String cnText = "传智播客准备Lucene的开发环境";
        testAnalyzer(analyzer, cnText); // single-character tokenization

        testAnalyzer(new ChineseAnalyzer(), cnText); // single-character tokenization
        testAnalyzer(new CJKAnalyzer(Version.LUCENE_30), cnText); // bigram (two-character) tokenization
        testAnalyzer(new IKAnalyzer(), cnText); // dictionary-based tokenization (the important one)
    }

    /**
     * Tokenizes the given text with the given analyzer and prints each resulting term.
     *
     * @param analyzer the analyzer to use
     * @param text     the text to tokenize
     * @throws Exception if reading the token stream fails
     */
    private void testAnalyzer(Analyzer analyzer, String text) throws Exception {
        System.out.println("Current analyzer: " + analyzer.getClass().getSimpleName());
        TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(text));
        // Register the term attribute once; it is updated in place on every incrementToken() call
        TermAttribute termAttribute = tokenStream.addAttribute(TermAttribute.class);
        while (tokenStream.incrementToken()) {
            // Print the text of the current term
            System.out.println(termAttribute.term());
        }
        System.out.println();
    }

}
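
The comments in the test name two simple strategies for Chinese text: single-character tokenization and bigram tokenization. The sketch below illustrates both ideas in plain Java on a run of Chinese characters. It is a simplified illustration, not the implementation of ChineseAnalyzer or CJKAnalyzer: the class CjkSplitSketch and its methods are hypothetical, and it assumes the input contains only characters from the Basic Multilingual Plane and no Latin words or punctuation.

```java
import java.util.ArrayList;
import java.util.List;

class CjkSplitSketch {

    // Single-character tokenization: every character becomes its own token.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // Bigram tokenization: every pair of adjacent characters becomes a token,
    // so consecutive tokens overlap by one character.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            tokens.add(text); // a lone character has no neighbour to pair with
            return tokens;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "开发环境"; // "development environment", taken from the sample sentence
        System.out.println(unigrams(text)); // [开, 发, 环, 境]
        System.out.println(bigrams(text));  // [开发, 发环, 环境]
    }
}
```

Note that the bigram output contains "发环", which is not a real word. A dictionary-based analyzer such as IKAnalyzer avoids that kind of token by matching against a word list, which is why the test marks it as the important one.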
