首页 > 代码库 > 【手把手教你全文检索】Lucene索引的【增、删、改、查】
【手把手教你全文检索】Lucene索引的【增、删、改、查】
前言
搞检索的,应该多少都会了解Lucene一些,它开源而且简单上手,官方API足够编写些小DEMO。并且根据倒排索引,实现快速检索。本文就简单的实现增量添加索引,删除索引,通过关键字查询,以及更新索引等操作。
目前博猪使用的不爽的地方就是,读取文件内容进行全文检索时,需要自己编写读取过程(这个solr免费帮我们实现)。而且创建索引的过程比较慢,还有很大的优化空间,这个就要细心下来研究了。
创建索引
Lucene在进行创建索引时,根据前面一篇博客,已经讲完了大体的流程,这里再简单说下:
1 Directory directory = FSDirectory.open("/tmp/testindex");2 IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);3 IndexWriter iwriter = new IndexWriter(directory, config);4 Document doc = new Document();5 String text = "This is the text to be indexed.";6 doc.add(new Field("fieldname", text, TextField.TYPE_STORED)); iwriter.close();
1 创建Directory,获取索引目录
2 创建词法分析器,创建IndexWriter对象
3 创建document对象,存储数据
4 关闭IndexWriter,提交
1 /** 2 * 建立索引 3 * 4 * @param args 5 */ 6 public static void index() throws Exception { 7 8 String text1 = "hello,man!"; 9 String text2 = "goodbye,man!";10 String text3 = "hello,woman!";11 String text4 = "goodbye,woman!";12 13 Date date1 = new Date();14 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);15 directory = FSDirectory.open(new File(INDEX_DIR));16 17 IndexWriterConfig config = new IndexWriterConfig(18 Version.LUCENE_CURRENT, analyzer);19 indexWriter = new IndexWriter(directory, config);20 21 Document doc1 = new Document();22 doc1.add(new TextField("filename", "text1", Store.YES));23 doc1.add(new TextField("content", text1, Store.YES));24 indexWriter.addDocument(doc1);25 26 Document doc2 = new Document();27 doc2.add(new TextField("filename", "text2", Store.YES));28 doc2.add(new TextField("content", text2, Store.YES));29 indexWriter.addDocument(doc2);30 31 Document doc3 = new Document();32 doc3.add(new TextField("filename", "text3", Store.YES));33 doc3.add(new TextField("content", text3, Store.YES));34 indexWriter.addDocument(doc3);35 36 Document doc4 = new Document();37 doc4.add(new TextField("filename", "text4", Store.YES));38 doc4.add(new TextField("content", text4, Store.YES));39 indexWriter.addDocument(doc4);40 41 indexWriter.commit();42 indexWriter.close();43 44 Date date2 = new Date();45 System.out.println("创建索引耗时:" + (date2.getTime() - date1.getTime()) + "ms\n");46 }
增量添加索引
Lucene拥有增量添加索引的功能,在不会影响之前的索引情况下,添加索引,它会在何时的时机,自动合并索引文件。
1 /** 2 * 增加索引 3 * 4 * @throws Exception 5 */ 6 public static void insert() throws Exception { 7 String text5 = "hello,goodbye,man,woman"; 8 Date date1 = new Date(); 9 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);10 directory = FSDirectory.open(new File(INDEX_DIR));11 12 IndexWriterConfig config = new IndexWriterConfig(13 Version.LUCENE_CURRENT, analyzer);14 indexWriter = new IndexWriter(directory, config);15 16 Document doc1 = new Document();17 doc1.add(new TextField("filename", "text5", Store.YES));18 doc1.add(new TextField("content", text5, Store.YES));19 indexWriter.addDocument(doc1);20 21 indexWriter.commit();22 indexWriter.close();23 24 Date date2 = new Date();25 System.out.println("增加索引耗时:" + (date2.getTime() - date1.getTime()) + "ms\n");26 }
删除索引
Lucene也是通过IndexWriter调用它的delete方法,来删除索引。我们可以通过关键字,删除与这个关键字有关的所有内容。如果仅仅是想要删除一个文档,那么最好就顶一个唯一的ID域,通过这个ID域,来进行删除操作。
1 /** 2 * 删除索引 3 * 4 * @param str 删除的关键字 5 * @throws Exception 6 */ 7 public static void delete(String str) throws Exception { 8 Date date1 = new Date(); 9 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);10 directory = FSDirectory.open(new File(INDEX_DIR));11 12 IndexWriterConfig config = new IndexWriterConfig(13 Version.LUCENE_CURRENT, analyzer);14 indexWriter = new IndexWriter(directory, config);15 16 indexWriter.deleteDocuments(new Term("filename",str)); 17 18 indexWriter.close();19 20 Date date2 = new Date();21 System.out.println("删除索引耗时:" + (date2.getTime() - date1.getTime()) + "ms\n");22 }
更新索引
Lucene没有真正的更新操作,通过某个fieldname,可以更新这个域对应的索引,但是实质上,它是先删除索引,再重新建立的。
1 /** 2 * 更新索引 3 * 4 * @throws Exception 5 */ 6 public static void update() throws Exception { 7 String text1 = "update,hello,man!"; 8 Date date1 = new Date(); 9 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);10 directory = FSDirectory.open(new File(INDEX_DIR));11 12 IndexWriterConfig config = new IndexWriterConfig(13 Version.LUCENE_CURRENT, analyzer);14 indexWriter = new IndexWriter(directory, config);15 16 Document doc1 = new Document();17 doc1.add(new TextField("filename", "text1", Store.YES));18 doc1.add(new TextField("content", text1, Store.YES));19 20 indexWriter.updateDocument(new Term("filename","text1"), doc1);21 22 indexWriter.close();23 24 Date date2 = new Date();25 System.out.println("更新索引耗时:" + (date2.getTime() - date1.getTime()) + "ms\n");26 }
通过索引查询关键字
Lucene的查询方式有很多种,这里就不做详细介绍了。它会返回一个ScoreDoc的集合,类似ResultSet的集合,我们可以通过域名获取想要获取的内容。
1 /** 2 * 关键字查询 3 * 4 * @param str 5 * @throws Exception 6 */ 7 public static void search(String str) throws Exception { 8 directory = FSDirectory.open(new File(INDEX_DIR)); 9 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);10 DirectoryReader ireader = DirectoryReader.open(directory);11 IndexSearcher isearcher = new IndexSearcher(ireader);12 13 QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content",analyzer);14 Query query = parser.parse(str);15 16 ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;17 for (int i = 0; i < hits.length; i++) {18 Document hitDoc = isearcher.doc(hits[i].doc);19 System.out.println(hitDoc.get("filename"));20 System.out.println(hitDoc.get("content"));21 }22 ireader.close();23 directory.close();24 }
全部代码
1 package test; 2 3 import java.io.File; 4 import java.util.Date; 5 import java.util.List; 6 7 import org.apache.lucene.analysis.Analyzer; 8 import org.apache.lucene.analysis.standard.StandardAnalyzer; 9 import org.apache.lucene.document.Document; 10 import org.apache.lucene.document.LongField; 11 import org.apache.lucene.document.TextField; 12 import org.apache.lucene.document.Field.Store; 13 import org.apache.lucene.index.DirectoryReader; 14 import org.apache.lucene.index.IndexWriter; 15 import org.apache.lucene.index.IndexWriterConfig; 16 import org.apache.lucene.index.Term; 17 import org.apache.lucene.queryparser.classic.QueryParser; 18 import org.apache.lucene.search.IndexSearcher; 19 import org.apache.lucene.search.Query; 20 import org.apache.lucene.search.ScoreDoc; 21 import org.apache.lucene.store.Directory; 22 import org.apache.lucene.store.FSDirectory; 23 import org.apache.lucene.util.Version; 24 25 public class TestLucene { 26 // 保存路径 27 private static String INDEX_DIR = "D:\\luceneIndex"; 28 private static Analyzer analyzer = null; 29 private static Directory directory = null; 30 private static IndexWriter indexWriter = null; 31 32 public static void main(String[] args) { 33 try { 34 // index(); 35 search("man"); 36 // insert(); 37 // delete("text5"); 38 // update(); 39 } catch (Exception e) { 40 e.printStackTrace(); 41 } 42 } 43 /** 44 * 更新索引 45 * 46 * @throws Exception 47 */ 48 public static void update() throws Exception { 49 String text1 = "update,hello,man!"; 50 Date date1 = new Date(); 51 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT); 52 directory = FSDirectory.open(new File(INDEX_DIR)); 53 54 IndexWriterConfig config = new IndexWriterConfig( 55 Version.LUCENE_CURRENT, analyzer); 56 indexWriter = new IndexWriter(directory, config); 57 58 Document doc1 = new Document(); 59 doc1.add(new TextField("filename", "text1", Store.YES)); 60 doc1.add(new TextField("content", text1, Store.YES)); 61 62 indexWriter.updateDocument(new Term("filename","text1"), doc1); 63 64 indexWriter.close(); 65 66 Date date2 = new Date(); 67 System.out.println("更新索引耗时:" + (date2.getTime() - date1.getTime()) + "ms\n"); 68 } 69 /** 70 * 删除索引 71 * 72 * @param str 删除的关键字 73 * @throws Exception 74 */ 75 public static void delete(String str) throws Exception { 76 Date date1 = new Date(); 77 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT); 78 directory = FSDirectory.open(new File(INDEX_DIR)); 79 80 IndexWriterConfig config = new IndexWriterConfig( 81 Version.LUCENE_CURRENT, analyzer); 82 indexWriter = new IndexWriter(directory, config); 83 84 indexWriter.deleteDocuments(new Term("filename",str)); 85 86 indexWriter.close(); 87 88 Date date2 = new Date(); 89 System.out.println("删除索引耗时:" + (date2.getTime() - date1.getTime()) + "ms\n"); 90 } 91 /** 92 * 增加索引 93 * 94 * @throws Exception 95 */ 96 public static void insert() throws Exception { 97 String text5 = "hello,goodbye,man,woman"; 98 Date date1 = new Date(); 99 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);100 directory = FSDirectory.open(new File(INDEX_DIR));101 102 IndexWriterConfig config = new IndexWriterConfig(103 Version.LUCENE_CURRENT, analyzer);104 indexWriter = new IndexWriter(directory, config);105 106 Document doc1 = new Document();107 doc1.add(new TextField("filename", "text5", Store.YES));108 doc1.add(new TextField("content", text5, Store.YES));109 indexWriter.addDocument(doc1);110 111 indexWriter.commit();112 indexWriter.close();113 114 Date date2 = new Date();115 System.out.println("增加索引耗时:" + (date2.getTime() - date1.getTime()) + "ms\n");116 }117 /**118 * 建立索引119 * 120 * @param args121 */122 public static void index() throws Exception {123 124 String text1 = "hello,man!";125 String text2 = "goodbye,man!";126 String text3 = "hello,woman!";127 String text4 = "goodbye,woman!";128 129 Date date1 = new Date();130 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);131 directory = FSDirectory.open(new File(INDEX_DIR));132 133 IndexWriterConfig config = new IndexWriterConfig(134 Version.LUCENE_CURRENT, analyzer);135 indexWriter = new IndexWriter(directory, config);136 137 Document doc1 = new Document();138 doc1.add(new TextField("filename", "text1", Store.YES));139 doc1.add(new TextField("content", text1, Store.YES));140 indexWriter.addDocument(doc1);141 142 Document doc2 = new Document();143 doc2.add(new TextField("filename", "text2", Store.YES));144 doc2.add(new TextField("content", text2, Store.YES));145 indexWriter.addDocument(doc2);146 147 Document doc3 = new Document();148 doc3.add(new TextField("filename", "text3", Store.YES));149 doc3.add(new TextField("content", text3, Store.YES));150 indexWriter.addDocument(doc3);151 152 Document doc4 = new Document();153 doc4.add(new TextField("filename", "text4", Store.YES));154 doc4.add(new TextField("content", text4, Store.YES));155 indexWriter.addDocument(doc4);156 157 indexWriter.commit();158 indexWriter.close();159 160 Date date2 = new Date();161 System.out.println("创建索引耗时:" + (date2.getTime() - date1.getTime()) + "ms\n");162 }163 164 /**165 * 关键字查询166 * 167 * @param str168 * @throws Exception169 */170 public static void search(String str) throws Exception {171 directory = FSDirectory.open(new File(INDEX_DIR));172 analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);173 DirectoryReader ireader = DirectoryReader.open(directory);174 IndexSearcher isearcher = new IndexSearcher(ireader);175 176 QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content",analyzer);177 Query query = parser.parse(str);178 179 ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;180 for (int i = 0; i < hits.length; i++) {181 Document hitDoc = isearcher.doc(hits[i].doc);182 System.out.println(hitDoc.get("filename"));183 System.out.println(hitDoc.get("content"));184 }185 ireader.close();186 directory.close();187 }188 }
参考资料
http://www.cnblogs.com/xing901022/p/3933675.html
【手把手教你全文检索】Lucene索引的【增、删、改、查】