A Lucene Analyzer Class Optimized for Tweets
```java
/**
 * @author YangXin
 * @info Optimizes analysis for Twitter text using DoubleMetaphone, which
 *       generates the same key for words that sound alike.
 *       The calls used here (e.g. TermAttribute) match the Lucene 3.x API.
 */
package unitTwelve;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.commons.codec.language.DoubleMetaphone;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class TwitterAnalyzer extends Analyzer {
    private DoubleMetaphone filter = new DoubleMetaphone();

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Tokenize, drop English stop words, then stem what remains.
        final TokenStream result = new PorterStemFilter(new StopFilter(true,
                new StandardTokenizer(Version.LUCENE_CURRENT, reader),
                StandardAnalyzer.STOP_WORDS_SET));
        TermAttribute termAtt = result.addAttribute(TermAttribute.class);
        StringBuilder buf = new StringBuilder();
        try {
            while (result.incrementToken()) {
                String word = new String(termAtt.termBuffer(), 0, termAtt.termLength());
                // Replace each stemmed word with its phonetic key.
                buf.append(filter.encode(word)).append(" ");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Re-tokenize the space-separated keys so they become the indexed terms.
        return new WhitespaceTokenizer(new StringReader(buf.toString()));
    }
}
```
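To see why phonetic keys help with loosely spelled tweets, the sketch below walks through the same idea using only the Java standard library. It is an illustration under stated assumptions, not the analyzer itself: a simplified Soundex stands in for DoubleMetaphone (the two algorithms produce different keys), the stop-word list is a tiny hypothetical one, stemming is left out, and the names `PhoneticKeySketch`, `soundex`, and `encode` are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PhoneticKeySketch {
    // Hypothetical, tiny stop-word list; Lucene's built-in set is larger.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Soundex digit for each letter a..z; '0' marks letters that carry no code.
    private static final String CODES = "01230120022455012623010202";

    /** Simplified Soundex (a stand-in for DoubleMetaphone); expects letters a-z only. */
    public static String soundex(String word) {
        String w = word.toUpperCase(Locale.ROOT);
        StringBuilder key = new StringBuilder().append(w.charAt(0));
        char previous = CODES.charAt(w.charAt(0) - 'A');
        for (int i = 1; i < w.length() && key.length() < 4; i++) {
            char c = w.charAt(i);
            char code = CODES.charAt(c - 'A');
            // Emit a digit only for coded letters that differ from the previous code.
            if (code != '0' && code != previous) {
                key.append(code);
            }
            // H and W do not separate letters that share a code; vowels do.
            if (c != 'H' && c != 'W') {
                previous = code;
            }
        }
        while (key.length() < 4) {
            key.append('0'); // pad to one letter plus three digits
        }
        return key.toString();
    }

    /** Lowercase, split on non-letters, drop stop words, replace each word with its key. */
    public static String encode(String text) {
        List<String> keys = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            keys.add(soundex(token));
        }
        return String.join(" ", keys);
    }

    public static void main(String[] args) {
        // Both spellings collapse to the same key.
        System.out.println(encode("The Smith and the Smyth")); // prints "S530 S530"
    }
}
```

Because both spellings map to one key, a search for either matches documents containing the other. The trade-off is precision: unrelated words that happen to sound alike also collide.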