
Notes on the Lucene Indexing Process

The original documents (Document) to be indexed.

To make the index-building process easier to follow, two files are used here as examples:

File 1: Students should be allowed to go out with their friends, but not allowed to drink beer.

File 2: My friend Jerry went to school to see his students but found them drunk which is not allowed.

 

 

The resulting index file:

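  • Document Frequency: the number of documents that contain this term (Term).
  • Frequency: the term frequency, i.e. how many times this term occurs in a given document.
  • In the index layout, the dictionary is on the left and the posting lists (inverted lists) are on the right.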
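
The relationship between the dictionary, the posting lists, and the two frequencies can be illustrated with a small sketch over the two example files. This is not Lucene's API or its on-disk format, only a minimal in-memory model. The names `tokenize` and `build_index` are hypothetical, and the tokenizer is an assumption: it lowercases and splits on non-letters, with no stop-word removal or stemming.

```python
import re
from collections import Counter, defaultdict

# The two example files from the text, keyed by document ID.
docs = {
    1: "Students should be allowed to go out with their friends, "
       "but not allowed to drink beer.",
    2: "My friend Jerry went to school to see his students "
       "but found them drunk which is not allowed.",
}


def tokenize(text):
    """Assumed simplification: lowercase, then keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map each term to its posting list: [(doc_id, term_frequency), ...]."""
    index = defaultdict(list)
    for doc_id in sorted(documents):
        # Count how often each term occurs in this one document.
        counts = Counter(tokenize(documents[doc_id]))
        for term, freq in counts.items():
            index[term].append((doc_id, freq))
    # Sort the terms so the dictionary side is in alphabetical order.
    return dict(sorted(index.items()))


index = build_index(docs)

for term, postings in index.items():
    # Document frequency is simply the length of the posting list.
    print(f"{term:<10} df={len(postings)}  postings={postings}")
```

In this sketch the term "allowed" has a document frequency of 2, and its posting list records a frequency of 2 in file 1 and 1 in file 2. Because no stemming is applied, "friend" and "friends" (and "drink" and "drunk") stay separate terms, which a real analysis chain would usually merge.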