python html parse


2024-10-29 22:41:02 · 211 views

bs4 (Beautiful Soup 4): converts incoming documents to Unicode. Homepage: http://www.crummy.com/software/BeautifulSoup/
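A minimal sketch of the Unicode conversion, assuming bs4 is installed. The byte string `raw` is a made-up example. `from_encoding` is passed only so the result does not depend on which encoding-detection library happens to be installed. Without it, Beautiful Soup guesses the encoding itself.

```python
from bs4 import BeautifulSoup

# UTF-8 encoded bytes, e.g. what a file opened in binary mode would return (hypothetical input)
raw = "<p>café</p>".encode("utf-8")

# Bytes go in; Beautiful Soup decodes them and builds a tree of Unicode strings
soup = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")

print(type(soup.p.string).__name__)  # NavigableString (a str subclass)
print(soup.p.string)                 # café
print(soup.original_encoding)        # the encoding Beautiful Soup decoded from
```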

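from bs4 import BeautifulSoup
with open("index.html", "rb") as fp:  # bytes in; Beautiful Soup detects the encoding, and the file is closed afterwards
    soup = BeautifulSoup(fp, "html.parser")
soup = BeautifulSoup("<html>data</html>", "html.parser")  # naming the parser avoids the "no parser specified" warning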
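The constructor accepts either a string or an open file. The following self-contained sketch writes the sample markup to a temporary `index.html` (a throwaway file created only for this example) and checks that both input styles yield the same tree.

```python
import os
import tempfile

from bs4 import BeautifulSoup

markup = "<html>data</html>"

# Create a throwaway index.html so the example does not need an existing file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(markup)
    # The constructor reads the whole file object
    with open(path, "rb") as fp:
        soup_from_file = BeautifulSoup(fp, "html.parser")

soup_from_string = BeautifulSoup(markup, "html.parser")

print(soup_from_file.html.string)                     # data
print(str(soup_from_file) == str(soup_from_string))   # True
```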

Beautiful Soup turns a complex HTML document into a complex tree in which every node is a Python object. Every object is one of four kinds: Tag, NavigableString, BeautifulSoup, and Comment.
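The four kinds of object can be seen in one small document. The following sketch uses a made-up one-paragraph snippet: the document itself is a `BeautifulSoup`, the `<p>` element is a `Tag`, its text is a `NavigableString`, and the HTML comment is a `Comment` (a subclass of `NavigableString`).

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

html = '<p class="note">Hello<!-- a comment --></p>'  # hypothetical sample markup
soup = BeautifulSoup(html, "html.parser")

p = soup.p               # Tag: an element with a name and attributes
text = p.contents[0]     # NavigableString: text inside a tag
comment = p.contents[1]  # Comment: a special kind of NavigableString

for obj in (soup, p, text, comment):
    print(type(obj).__name__)
# BeautifulSoup
# Tag
# NavigableString
# Comment

print(p.name, p["class"])   # p ['note']  (class is multi-valued, so it is a list)
print(repr(str(comment)))   # ' a comment '
```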

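from bs4 import BeautifulSoup, SoupStrainer
only_a_tags = SoupStrainer("a")  # keep only <a> tags
only_tags_with_id_link2 = SoupStrainer(id="link2")  # keep only tags with id="link2"
def is_short_string(string):
    return len(string) < 10
only_short_strings = SoupStrainer(text=is_short_string)  # keep only strings under 10 chars; newer bs4 prefers string=

# html_doc is an HTML string defined elsewhere; parse_only builds the tree from the matching parts only
soup = BeautifulSoup(html_doc, "html.parser", parse_only=only_a_tags)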
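To show what `parse_only` keeps, here is a small sketch with a made-up `html_doc` (the document, its hrefs and its ids are hypothetical). Parts that do not match the strainer are never added to the tree, which saves memory on large documents.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample document
html_doc = """
<html><head><title>Links</title></head>
<body>
<p>Some <a href="/one" id="link1">first</a> and
<a href="/two" id="link2">second</a> links.</p>
</body></html>
"""

# Only <a> tags make it into the tree
soup = BeautifulSoup(html_doc, "html.parser", parse_only=SoupStrainer("a"))
hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)        # ['/one', '/two']
print(soup.title)   # None: <title> was skipped during parsing

# Only the tag whose id is "link2"
soup2 = BeautifulSoup(html_doc, "html.parser", parse_only=SoupStrainer(id="link2"))
print(soup2.a.get_text())  # second
```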

lxml: a Python wrapper around the C libraries libxml2 and libxslt
html5lib: a pure-Python implementation
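The parsers mainly differ in speed and in how they repair invalid markup. This rough sketch feeds the same broken snippet to each one by name. lxml and html5lib are optional third-party installs, so the sketch reports them as missing instead of failing. Only the result for `html.parser`, which ships with Python, is checked, because the other outputs depend on what is installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

broken = "<a></p>"  # deliberately invalid: a stray closing </p>

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(broken, parser))
    except FeatureNotFound:
        # Raised when the named parser is not installed
        results[parser] = "(not installed)"

for parser, output in results.items():
    print(f"{parser:12} {output}")
```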
