
[Hands-On Full-Text Search] Lucene Index Operations: Add, Delete, Update, Query

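Preface

  Anyone working on search has probably come across Lucene. It is open source and easy to pick up, and the official API documentation is enough to write small demos. It achieves fast retrieval by using an inverted index. This post gives simple implementations of the basic index operations: incrementally adding to an index, deleting from it, querying by keyword, and updating it.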
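
  To make the inverted-index idea concrete, here is a minimal plain-Java sketch. It is not Lucene's actual data structure; the class name `ToyInvertedIndex` and its tokenizing rule are assumptions made for illustration. Each lowercase term maps to the set of document IDs that contain it, so a keyword lookup is one map access rather than a scan over every document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: illustrates the inverted-index idea, not Lucene's internals.
final class ToyInvertedIndex {
    // term -> sorted set of IDs of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split on anything that is not a letter or digit and lowercase the pieces
    // (a rough, assumed stand-in for what an analyzer does).
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up a single keyword; returns an empty set when the term is unknown.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "hello,man!");
        index.add(2, "goodbye,man!");
        index.add(3, "hello,woman!");
        index.add(4, "goodbye,woman!");
        System.out.println(index.search("man"));   // prints [1, 2]
        System.out.println(index.search("hello")); // prints [1, 3]
    }
}
```

  Note that "woman" is a different token from "man" in this model, so a search for "man" does not return documents 3 and 4.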

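  What I currently find inconvenient is that, to run full-text search over file contents, you have to write the file-reading code yourself (Solr provides this for free). Index creation is also fairly slow; there is a lot of room for optimization, which I still need to study carefully.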
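
  The file-reading step mentioned above can be sketched in plain Java as follows. This assumes the files are plain UTF-8 text; the class and method names (`FileText`, `readText`) are made up for illustration. The returned string is what would be passed as the value of a content field when indexing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: loads a plain-text file so its content can be indexed.
final class FileText {
    // Reads the whole file into memory and decodes it as UTF-8 (an assumption;
    // binary formats such as PDF or Word would need a real parser instead).
    static String readText(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "sample.txt" is a placeholder path used when no argument is given.
        Path file = Paths.get(args.length > 0 ? args[0] : "sample.txt");
        System.out.println(readText(file));
    }
}
```

  Reading the whole file at once keeps the sketch short; for very large files a streaming reader would be the more careful choice.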

  Creating the index

  The previous post already covered the overall flow of creating an index with Lucene; here is a brief recap:

    Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    iwriter.addDocument(doc);  // without this call the document is never indexed
    iwriter.close();

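  1 Create a Directory that points at the index directory

  2 Create the analyzer and the IndexWriter object

  3 Create a Document object to hold the data and add it to the IndexWriter

  4 Close the IndexWriter, which commits the changes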

    /**
     * Build the index.
     * analyzer, directory, indexWriter and INDEX_DIR are static fields of the
     * enclosing class (see the full code at the end).
     *
     * @throws Exception if the index cannot be opened or written
     */
    public static void index() throws Exception {

        String text1 = "hello,man!";
        String text2 = "goodbye,man!";
        String text3 = "hello,woman!";
        String text4 = "goodbye,woman!";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));
        indexWriter.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new TextField("filename", "text2", Store.YES));
        doc2.add(new TextField("content", text2, Store.YES));
        indexWriter.addDocument(doc2);

        Document doc3 = new Document();
        doc3.add(new TextField("filename", "text3", Store.YES));
        doc3.add(new TextField("content", text3, Store.YES));
        indexWriter.addDocument(doc3);

        Document doc4 = new Document();
        doc4.add(new TextField("filename", "text4", Store.YES));
        doc4.add(new TextField("content", text4, Store.YES));
        indexWriter.addDocument(doc4);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Index creation took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

 

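  Adding to the index incrementally

  Lucene supports incremental indexing: new documents can be added without affecting what is already indexed, and Lucene merges the index files automatically at an appropriate time.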
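
  The idea of "add without touching the old data, merge later" can be illustrated with a small plain-Java toy. This is not how Lucene decides when to merge; the class `ToySegments` and its rule (merge everything once the segment count exceeds a limit) are invented for the sketch. Each batch of documents becomes a new segment, and existing segments are left alone until a merge is triggered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of segment-based incremental indexing; the merge rule is invented.
final class ToySegments {
    private final List<List<String>> segments = new ArrayList<>();
    private final int maxSegments;

    ToySegments(int maxSegments) {
        this.maxSegments = maxSegments;
    }

    // Each batch is written as a new segment; existing segments are not modified.
    void addBatch(List<String> docs) {
        segments.add(new ArrayList<>(docs));
        if (segments.size() > maxSegments) {
            mergeAll();
        }
    }

    // Combine all segments into a single one, keeping every document.
    private void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        segments.clear();
        segments.add(merged);
    }

    int segmentCount() {
        return segments.size();
    }

    int docCount() {
        int total = 0;
        for (List<String> segment : segments) {
            total += segment.size();
        }
        return total;
    }

    public static void main(String[] args) {
        ToySegments index = new ToySegments(2);
        index.addBatch(Arrays.asList("text1", "text2", "text3", "text4"));
        index.addBatch(Arrays.asList("text5"));
        System.out.println(index.segmentCount() + " segments, " + index.docCount() + " docs");
        index.addBatch(Arrays.asList("text6")); // third segment exceeds the limit and triggers a merge
        System.out.println(index.segmentCount() + " segments, " + index.docCount() + " docs");
    }
}
```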

    /**
     * Add a document to the existing index.
     *
     * @throws Exception if the index cannot be opened or written
     */
    public static void insert() throws Exception {
        String text5 = "hello,goodbye,man,woman";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text5", Store.YES));
        doc1.add(new TextField("content", text5, Store.YES));
        indexWriter.addDocument(doc1);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Adding to the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

  

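  Deleting from the index

  Deletion also goes through IndexWriter, via its deleteDocuments method. Passing a keyword Term deletes every document that contains that term in the given field. If you only want to delete a single document, it is best to define a unique ID field and delete by that field.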
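
  The difference between deleting by keyword and deleting by a unique ID can be seen in a small plain-Java toy. It is a simplified model, not Lucene code; the class `ToyDeleteByTerm`, the field names "id" and "content", and the tokenizing rule are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of delete-by-term; not Lucene code.
final class ToyDeleteByTerm {
    // Each document is a map from field name to field text.
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(String id, String content) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", id);
        doc.put("content", content);
        docs.add(doc);
    }

    // Removes every document whose field contains the term as a whole token
    // and returns how many documents were removed.
    int delete(String field, String term) {
        int before = docs.size();
        docs.removeIf(doc -> {
            String text = doc.get(field);
            return text != null && tokens(text).contains(term);
        });
        return before - docs.size();
    }

    int size() {
        return docs.size();
    }

    // Lowercase and split on non-alphanumeric characters (assumed analyzer stand-in).
    private static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
    }

    public static void main(String[] args) {
        ToyDeleteByTerm index = new ToyDeleteByTerm();
        index.add("1", "hello,man!");
        index.add("2", "goodbye,man!");
        index.add("3", "hello,woman!");
        System.out.println(index.delete("id", "2"));          // prints 1: exactly one document
        System.out.println(index.delete("content", "hello")); // prints 2: every document containing "hello"
    }
}
```

  In real Lucene code the Term has to match an indexed token exactly, so an ID field is usually indexed as a single untokenized value (for example with StringField) rather than passed through an analyzer.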

    /**
     * Delete documents from the index.
     *
     * @param str the term to delete by (matched against the "filename" field)
     * @throws Exception if the index cannot be opened or written
     */
    public static void delete(String str) throws Exception {
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        indexWriter.deleteDocuments(new Term("filename", str));

        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Deleting from the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

  

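  Updating the index

  Lucene has no true in-place update. IndexWriter.updateDocument takes a Term (a field name plus a value) and the new document; under the hood it first deletes the documents matching that term and then adds the new document.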
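
  The delete-then-add behaviour can be modelled in a few lines of plain Java. This is a toy, not Lucene's implementation; the class `ToyUpdate` and its exact-match rule on the field value are assumptions for illustration. One consequence worth seeing is that if the term matches several documents, all of them are replaced by the single new document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of "update = delete matching documents, then add"; not Lucene code.
final class ToyUpdate {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(String filename, String content) {
        Map<String, String> doc = new HashMap<>();
        doc.put("filename", filename);
        doc.put("content", content);
        docs.add(doc);
    }

    // Step 1: delete every document whose field equals the given value.
    // Step 2: add the replacement document.
    void update(String field, String value, String newFilename, String newContent) {
        docs.removeIf(doc -> value.equals(doc.get(field)));
        add(newFilename, newContent);
    }

    // Contents of all documents stored under the given filename.
    List<String> contentsOf(String filename) {
        List<String> result = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (filename.equals(doc.get("filename"))) {
                result.add(doc.get("content"));
            }
        }
        return result;
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyUpdate index = new ToyUpdate();
        index.add("text1", "hello,man!");
        index.add("text2", "goodbye,man!");
        index.update("filename", "text1", "text1", "update,hello,man!");
        System.out.println(index.size());              // prints 2
        System.out.println(index.contentsOf("text1")); // prints [update,hello,man!]
    }
}
```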

    /**
     * Update a document in the index.
     *
     * @throws Exception if the index cannot be opened or written
     */
    public static void update() throws Exception {
        String text1 = "update,hello,man!";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));

        // Deletes the documents whose "filename" field contains "text1", then adds doc1.
        indexWriter.updateDocument(new Term("filename", "text1"), doc1);

        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Updating the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

  

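  Querying the index by keyword

  Lucene offers many kinds of queries, which are not covered in detail here. A search returns a collection of ScoreDoc hits, somewhat like a ResultSet; each hit's Document can be loaded and its stored values read by field name.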

    /**
     * Keyword query against the "content" field.
     *
     * @param str the query string
     * @throws Exception if the index cannot be read or the query cannot be parsed
     */
    public static void search(String str) throws Exception {
        directory = FSDirectory.open(new File(INDEX_DIR));
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);

        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
        Query query = parser.parse(str);

        // Fetch at most 1000 hits, with no filter.
        ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(hitDoc.get("filename"));
            System.out.println(hitDoc.get("content"));
        }
        ireader.close();
        directory.close();
    }

 

  Full code

    package test;

    import java.io.File;
    import java.util.Date;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TestLucene {
        // Directory where the index is stored
        private static String INDEX_DIR = "D:\\luceneIndex";
        private static Analyzer analyzer = null;
        private static Directory directory = null;
        private static IndexWriter indexWriter = null;

        public static void main(String[] args) {
            try {
    //            index();
                search("man");
    //            insert();
    //            delete("text5");
    //            update();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        /**
         * Update a document in the index.
         *
         * @throws Exception if the index cannot be opened or written
         */
        public static void update() throws Exception {
            String text1 = "update,hello,man!";
            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));

            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            Document doc1 = new Document();
            doc1.add(new TextField("filename", "text1", Store.YES));
            doc1.add(new TextField("content", text1, Store.YES));

            indexWriter.updateDocument(new Term("filename", "text1"), doc1);

            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Updating the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Delete documents from the index.
         *
         * @param str the term to delete by (matched against the "filename" field)
         * @throws Exception if the index cannot be opened or written
         */
        public static void delete(String str) throws Exception {
            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));

            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            indexWriter.deleteDocuments(new Term("filename", str));

            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Deleting from the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Add a document to the existing index.
         *
         * @throws Exception if the index cannot be opened or written
         */
        public static void insert() throws Exception {
            String text5 = "hello,goodbye,man,woman";
            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));

            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            Document doc1 = new Document();
            doc1.add(new TextField("filename", "text5", Store.YES));
            doc1.add(new TextField("content", text5, Store.YES));
            indexWriter.addDocument(doc1);

            indexWriter.commit();
            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Adding to the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Build the index.
         *
         * @throws Exception if the index cannot be opened or written
         */
        public static void index() throws Exception {

            String text1 = "hello,man!";
            String text2 = "goodbye,man!";
            String text3 = "hello,woman!";
            String text4 = "goodbye,woman!";

            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));

            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            Document doc1 = new Document();
            doc1.add(new TextField("filename", "text1", Store.YES));
            doc1.add(new TextField("content", text1, Store.YES));
            indexWriter.addDocument(doc1);

            Document doc2 = new Document();
            doc2.add(new TextField("filename", "text2", Store.YES));
            doc2.add(new TextField("content", text2, Store.YES));
            indexWriter.addDocument(doc2);

            Document doc3 = new Document();
            doc3.add(new TextField("filename", "text3", Store.YES));
            doc3.add(new TextField("content", text3, Store.YES));
            indexWriter.addDocument(doc3);

            Document doc4 = new Document();
            doc4.add(new TextField("filename", "text4", Store.YES));
            doc4.add(new TextField("content", text4, Store.YES));
            indexWriter.addDocument(doc4);

            indexWriter.commit();
            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Index creation took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Keyword query against the "content" field.
         *
         * @param str the query string
         * @throws Exception if the index cannot be read or the query cannot be parsed
         */
        public static void search(String str) throws Exception {
            directory = FSDirectory.open(new File(INDEX_DIR));
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            DirectoryReader ireader = DirectoryReader.open(directory);
            IndexSearcher isearcher = new IndexSearcher(ireader);

            QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
            Query query = parser.parse(str);

            ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
            for (int i = 0; i < hits.length; i++) {
                Document hitDoc = isearcher.doc(hits[i].doc);
                System.out.println(hitDoc.get("filename"));
                System.out.println(hitDoc.get("content"));
            }
            ireader.close();
            directory.close();
        }
    }

 

  References

  http://www.cnblogs.com/xing901022/p/3933675.html
