HTML Strip Char Filter
The html_strip character filter strips HTML elements from the text and replaces HTML entities with their decoded value (e.g. replacing &amp; with &).
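To make the stripping and decoding behaviour concrete, here is a minimal Python sketch built on the standard library's `html.parser`. It is an approximation, not the Lucene `HTMLStripCharFilter` that Elasticsearch actually runs. The `html_strip` function and `HtmlStripSketch` class are hypothetical names, and the `BLOCK_TAGS` set, along with the rule of emitting one newline per block-level start and end tag, are assumptions chosen to match the example output on this page.

```python
from html.parser import HTMLParser

# Assumption: a small, hypothetical subset of block-level tags.
# The real Lucene filter uses its own, larger list.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3"}


class HtmlStripSketch(HTMLParser):
    """Approximates html_strip: drops tags, decodes entities, turns block tags into newlines."""

    def __init__(self):
        # convert_charrefs=True makes HTMLParser decode entities such as &amp;
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Inline tags (e.g. <b>) vanish; block tags become a newline.
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Text content is kept, already entity-decoded.
        self.parts.append(data)


def html_strip(text):
    parser = HtmlStripSketch()
    parser.feed(text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(repr(html_strip("<p>I'm so <b>happy</b>!</p>")))  # "\nI'm so happy!\n"
print(html_strip("Tom &amp; Jerry"))                     # Tom & Jerry
```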
Example output
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "html_strip" ],
  "text": "<p>I'm so <b>happy</b>!</p>"
}
The keyword tokenizer emits the entire input as a single term, so the above example returns the term:
[ \nI'm so happy!\n ]
The same example with the standard tokenizer would return the following terms:
[ I'm, so, happy ]
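The difference between the two tokenizers can be approximated in a few lines of Python. This is only a sketch: the real standard tokenizer implements the Unicode Text Segmentation algorithm (UAX #29), and the regular expression `WORD_RE` below is a hypothetical stand-in that happens to agree with it on this input. The function names are likewise illustrative.

```python
import re

# Hypothetical stand-in for the standard tokenizer's word rules:
# runs of word characters, allowing internal apostrophes as in "I'm".
WORD_RE = re.compile(r"\w+(?:['\u2019]\w+)*")


def keyword_tokenize(text):
    # The keyword tokenizer emits the entire input as one term.
    return [text]


def standard_tokenize_approx(text):
    # Drops whitespace and punctuation such as "!" and the newlines.
    return WORD_RE.findall(text)


stripped = "\nI'm so happy!\n"  # html_strip output from the example above
print(keyword_tokenize(stripped))          # ["\nI'm so happy!\n"]
print(standard_tokenize_approx(stripped))  # ["I'm", 'so', 'happy']
```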
Configuration
The html_strip character filter accepts the following parameter:
escaped_tags: An array of HTML tag names (without angle brackets, e.g. "b") that should not be stripped from the original text.
Example configuration
In this example, we configure the html_strip character filter to leave <b> tags in place:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": [ "my_char_filter" ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "html_strip",
          "escaped_tags": [ "b" ]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<p>I'm so <b>happy</b>!</p>"
}
The above example produces the following term:
[ \nI'm so <b>happy</b>!\n ]
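The effect of escaped_tags can be sketched the same way in Python. In this approximation (not Elasticsearch's implementation), an escaped start tag is re-emitted verbatim via `HTMLParser.get_starttag_text()` and an escaped end tag is rebuilt as `</tag>`; every other tag is stripped. The class and function names and the `BLOCK_TAGS` set are hypothetical, and the newline rule for block tags is an assumption chosen to match the output above.

```python
from html.parser import HTMLParser

# Assumption: hypothetical subset of block-level tags that become newlines.
BLOCK_TAGS = {"p", "div", "li"}


class HtmlStripEscapedSketch(HTMLParser):
    """Approximates html_strip with an escaped_tags-like allow list."""

    def __init__(self, escaped_tags):
        super().__init__(convert_charrefs=True)  # decode entities in text
        # HTMLParser reports tag names in lowercase, so normalise the list.
        self.escaped = {t.lower() for t in escaped_tags}
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.escaped:
            # Keep the original start tag text, attributes included.
            self.parts.append(self.get_starttag_text())
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep verbatim only if escaped.
        if tag in self.escaped:
            self.parts.append(self.get_starttag_text())
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.escaped:
            self.parts.append(f"</{tag}>")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_strip(text, escaped_tags=()):
    parser = HtmlStripEscapedSketch(escaped_tags)
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(repr(html_strip("<p>I'm so <b>happy</b>!</p>", escaped_tags=["b"])))
# "\nI'm so <b>happy</b>!\n"
```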
Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html#analysis-htmlstrip-charfilter